Knowledge

SWAR


SIMD within a register (SWAR), also known by the name "packed SIMD", is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing".

Many modern general-purpose computer processors have some provisions for SIMD, in the form of a group of registers and instructions to make use of them. SWAR refers to the use of those registers and instructions, as opposed to using specialized processing engines designed to be better at SIMD operations. It also refers to the use of SIMD with general-purpose registers and instructions that were not meant to do it at the time, by way of various novel software tricks.

SWAR architectures

A SWAR architecture is one that includes instructions explicitly intended to perform parallel operations across data that is stored in the independent subwords or fields of a register. A SWAR-capable architecture is one that includes a set of instructions that is sufficient to allow data stored in these fields to be treated independently even though the architecture does not include instructions that are explicitly intended for that purpose.

An early example of a SWAR architecture was the Intel Pentium with MMX, which implemented the MMX extension set. The Intel Pentium, by contrast, did not include such instructions, but could still act as a SWAR architecture through careful hand-coding or compiler techniques.

Early SWAR architectures include DEC Alpha MVI, Hewlett-Packard's PA-RISC MAX, Silicon Graphics Incorporated's MIPS MDMX, and Sun's SPARC V9 VIS. Like MMX, many of the SWAR instruction sets are intended for faster video coding.

History of the SWAR programming model

Wesley A. Clark introduced partitioned subword data operations in the 1950s. This can be seen as a very early predecessor to SWAR.

Leslie Lamport presented SWAR techniques in his paper titled "Multiple byte processing with full-word instructions" in 1975.

With the introduction of Intel's MMX multimedia instruction set extensions in 1996, desktop processors with SIMD parallel processing capabilities became common. Early on, these instructions could only be used via hand-written assembly code.

In the fall of 1996, Professor Hank Dietz was the instructor for the undergraduate Compiler Construction course at Purdue University's School of Electrical and Computer Engineering. For this course, he assigned a series of projects in which the students would build a simple compiler targeting MMX. The input language was a subset dialect of MasPar's MPL called NEMPL (Not Exactly MPL).

During the course of the semester, it became clear to the course teaching assistant, Randall (Randy) Fisher, that there were a number of issues with MMX that would make it difficult to build the back-end of the NEMPL compiler. For example, MMX has an instruction for multiplying 16-bit data but not for multiplying 8-bit data. The NEMPL language did not account for this problem, allowing the programmer to write programs that required 8-bit multiplies.

Intel's x86 architecture was not the only architecture to include SIMD-like parallel instructions. Sun's VIS, SGI's MDMX, and other multimedia instruction sets had been added to other manufacturers' existing instruction set architectures to support so-called new media applications. These extensions had significant differences in the precision of data and types of instructions supported.

Dietz and Fisher began developing the idea of a well-defined parallel programming model that would allow the programmer to target the model without knowing the specifics of the target architecture. This model would become the basis of Fisher's dissertation. The acronym "SWAR" was coined by Dietz and Fisher one day in Hank's office in the MSEE building at Purdue University. It refers to this form of parallel processing, architectures that are designed to natively perform this type of processing, and the general-purpose programming model that is Fisher's dissertation.

The problem of compiling for these widely varying architectures was discussed in a paper presented at LCPC98.

Some applications of SWAR

SWAR processing has been used in image processing, cryptographic pairings, raster processing, computational fluid dynamics, and communications.

See also

SIMD engines: vector processor, array processor, digital signal processor, stream processor
SWAR on x86 processors: MMX, 3DNow!, SSE, SSE2, SSE3

References

Miyaoka, Y.; Choi, J.; Togawa, N.; Yanagisawa, M.; Ohtsuki, T. (2002). An algorithm of hardware unit generation for processor core synthesis with packed SIMD type instructions. Asia-Pacific Conference on Circuits and Systems. Vol. 1. pp. 171–176. doi:10.1109/APCCAS.2002.1114930. hdl:2065/10689.
Flynn, Michael J. (September 1972). "Some Computer Organizations and Their Effectiveness". IEEE Transactions on Computers. C-21 (9): 948–960. doi:10.1109/TC.1972.5009071.
Fisher, Randall J (2003). General-Purpose SIMD Within A Register: Parallel Processing on Consumer Microprocessors (Ph.D.). Purdue University.
Fisher, Randall J.; Henry G. Dietz (August 1998). S. Chatterjee; J. F. Prins; L. Carter; J. Ferrante; Z. Li; D. Sehr; P.-C. Yew (eds.). "Compiling for SIMD Within A Register". Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing.
Lamport, Leslie (August 1975). "Multiple byte processing with full-word instructions". Communications of the ACM. 18 (8): 471–475. doi:10.1145/360933.360994. S2CID 1593593.
Dietz, Hank. "The Aggregate Magic Algorithms".
Padua, Flavio L. C.; Pereira, Guilherme A. S.; Neto, Jose P. de Queiroz; Campos, Mario F. M.; Fernandes, Antonio O. (January 2001). Improving processing time of large images by instruction level parallelism. Chilean Computing Week, V Workshop on Parallel and Distributed Systems. Punta Arenas.
Grabher, Philipp; Johann Großschädl; Dan Page (2009). "On Software Parallel Implementation of Cryptographic Pairings". Selected Areas in Cryptography. Lecture Notes in Computer Science. Vol. 5381. pp. 35–50. doi:10.1007/978-3-642-04159-4_3. ISBN 978-3-642-04158-7.
Persada, Onil Nazra; Thierry Goubier (12–14 September 2004). "Accelerating Raster Processing with Fine and Coarse Grain Parallelism in GRASS". Proceedings of the FOSS/GRASS Users Conference 2004.
Hauser, Thomas; T. I. Mattox; R. P. LeBeau; H. G. Dietz; P. G. Huang (April 2003). "Code Optimizations for Complex Microprocessors Applied to CFD Software". SIAM Journal on Scientific Computing. 25 (4): 1461–1477. doi:10.1137/S1064827502410530. ISSN 1064-8275.
Spracklen, Lawrence A. (2001). SWAR Systems and Communications Applications (Ph.D.). University of Aberdeen.

External links

The Aggregate - SWAR: SIMD Within A Register

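The history above mentions that MMX can multiply 16-bit data but not 8-bit data. One way a compiler back-end might bridge such a gap is to widen the 8-bit fields into 16-bit lanes, multiply there, and keep the low byte of each product. The sketch below only illustrates that general idea; it is not claimed to be what the NEMPL course compilers or Fisher's dissertation actually did. The function `mul_u16x4` is a hypothetical stand-in for a packed 16-bit multiply that keeps the low 16 bits of each product, and does not model any specific MMX instruction.

```python
# A sketch of building an 8-bit packed multiply from a 16-bit packed multiply.
# Assumption: a 64-bit word holds eight unsigned 8-bit fields, field 0 lowest.
# All function names are illustrative only.

LOW_BYTES = 0x00FF00FF00FF00FF  # low byte of each of the four 16-bit lanes
WORD64 = 0xFFFFFFFFFFFFFFFF


def mul_u16x4(a, b):
    """Stand-in for a hardware packed multiply: four independent 16-bit
    products, each truncated to its low 16 bits."""
    out = 0
    for i in range(4):
        x = (a >> (16 * i)) & 0xFFFF
        y = (b >> (16 * i)) & 0xFFFF
        out |= ((x * y) & 0xFFFF) << (16 * i)
    return out


def mul_u8x8(a, b):
    """Multiply eight 8-bit fields, each modulo 256, using only mul_u16x4
    plus shifts and masks."""
    # Even-numbered bytes, zero-extended into 16-bit lanes.
    even = mul_u16x4(a & LOW_BYTES, b & LOW_BYTES) & LOW_BYTES
    # Odd-numbered bytes, shifted down into the same lane positions.
    odd = mul_u16x4((a >> 8) & LOW_BYTES, (b >> 8) & LOW_BYTES) & LOW_BYTES
    # Put the odd results back into the odd byte positions.
    return (even | (odd << 8)) & WORD64


def pack_u8x8(values):
    """Pack eight integers (taken modulo 256) into one 64-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_u8x8(word):
    """Split a 64-bit word into its eight 8-bit fields."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]
```

Splitting the word into even and odd bytes costs two packed multiplies instead of one, which is the kind of architecture-specific overhead a target-independent programming model has to hide from the programmer.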

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.