SIMD within a register

SIMD within a register (SWAR), also known as "packed SIMD", is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing".

Many modern general-purpose computer processors have some provisions for SIMD, in the form of a group of registers and instructions to make use of them. SWAR refers to the use of those registers and instructions, as opposed to using specialized processing engines designed to be better at SIMD operations. It also refers to the use of SIMD with general-purpose registers and instructions that were not meant to do it at the time, by way of various novel software tricks.

SWAR architectures

A SWAR architecture is one that includes instructions explicitly intended to perform parallel operations across data that is stored in the independent subwords or fields of a register. A SWAR-capable architecture is one that includes a set of instructions that is sufficient to allow data stored in these fields to be treated independently even though the architecture does not include instructions that are explicitly intended for that purpose.
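As an illustration of how ordinary instructions can treat the fields of a register independently, the following C sketch adds four 8-bit fields packed into one 32-bit word using only AND, XOR, and a single full-word addition. The function name `swar_add_u8x4` and the choice of four 8-bit fields are assumptions made for this example; it shows one commonly used masking trick, not a method attributed to any particular source cited in this article.

```c
#include <stdint.h>

/*
 * Hypothetical helper: adds four unsigned 8-bit fields packed in a and b,
 * field by field, with each field wrapping modulo 256.
 */
uint32_t swar_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t HIGH = 0x80808080u; /* top bit of every 8-bit field */
    const uint32_t LOW  = 0x7F7F7F7Fu; /* low seven bits of every field */

    /* Add the low seven bits of each field. The largest possible result
       per field is 0x7F + 0x7F = 0xFE, so no carry leaves a field. */
    uint32_t low_sum = (a & LOW) + (b & LOW);

    /* The top bit of each field is a XOR b XOR carry-in. The carry-in is
       already sitting in the top bit of low_sum, so XOR in the rest. */
    return low_sum ^ ((a ^ b) & HIGH);
}
```

For example, `swar_add_u8x4(0x01FF7F80u, 0x01010180u)` returns `0x02008000u`: the fields 0xFF + 0x01 and 0x80 + 0x80 each wrap to 0x00 without disturbing their neighbours. Masking off the top bit before the addition is what keeps carries from crossing field boundaries; the same idea extends to other field widths by changing the two mask constants.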
An early example of a SWAR architecture was the Intel Pentium with MMX, which implemented the MMX extension set. The Intel Pentium, by contrast, did not include such instructions, but could still act as a SWAR-capable architecture through careful hand-coding or compiler techniques.

Early SWAR architectures include DEC Alpha MVI, Hewlett-Packard's PA-RISC MAX, Silicon Graphics Incorporated's MIPS MDMX, and Sun's SPARC V9 VIS. Like MMX, many of the SWAR instruction sets are intended for faster video coding.

History of the SWAR programming model

Wesley A. Clark introduced partitioned subword data operations in the 1950s. This can be seen as a very early predecessor to SWAR. Leslie Lamport presented SWAR techniques in his paper titled "Multiple byte processing with full-word instructions" in 1975.

With the introduction of Intel's MMX multimedia instruction set extensions in 1996, desktop processors with SIMD parallel processing capabilities became common. Early on, these instructions could only be used via hand-written assembly code.

In the fall of 1996, Professor Hank Dietz was the instructor for the undergraduate Compiler Construction course at Purdue University's School of Electrical and Computer Engineering. For this course, he assigned a series of projects in which the students would build a simple compiler targeting MMX. The input language was a subset dialect of MasPar's MPL called NEMPL (Not Exactly MPL).

During the course of the semester, it became clear to the course teaching assistant, Randall (Randy) Fisher, that there were a number of issues with MMX that would make it difficult to build the back-end of the NEMPL compiler. For example, MMX has an instruction for multiplying 16-bit data but not multiplying 8-bit data. The NEMPL language did not account for this problem, allowing the programmer to write programs that required 8-bit multiplies.

Intel's x86 architecture was not the only architecture to include SIMD-like parallel instructions. Sun's VIS, SGI's MDMX, and other multimedia instruction sets had been added to other manufacturers' existing instruction set architectures to support so-called new media applications. These extensions had significant differences in the precision of data and types of instructions supported.

Dietz and Fisher began developing the idea of a well-defined parallel programming model that would allow the programmer to target the model without knowing the specifics of the target architecture. This model would become the basis of Fisher's dissertation. The acronym "SWAR" was coined by Dietz and Fisher one day in Hank's office in the MSEE building at Purdue University. It refers to this form of parallel processing, architectures that are designed to natively perform this type of processing, and the general-purpose programming model that is Fisher's dissertation.

The problem of compiling for these widely varying architectures was discussed in a paper presented at LCPC98.

Some applications of SWAR

SWAR processing has been used in image processing, cryptographic pairings, raster processing, computational fluid dynamics, and communications.

See also

SIMD engines: vector processor, array processor, digital signal processor, stream processor.
SWAR on x86 processors: MMX, 3DNow!, SSE, SSE2, SSE3

References

Miyaoka, Y.; Choi, J.; Togawa, N.; Yanagisawa, M.; Ohtsuki, T. (2002). An algorithm of hardware unit generation for processor core synthesis with packed SIMD type instructions. Asia-Pacific Conference on Circuits and Systems. Vol. 1. pp. 171–176. doi:10.1109/APCCAS.2002.1114930. hdl:2065/10689.
Flynn, Michael J. (September 1972). "Some Computer Organizations and Their Effectiveness" (PDF). IEEE Transactions on Computers. C-21 (9): 948–960. doi:10.1109/TC.1972.5009071.
Fisher, Randall J (2003). General-Purpose SIMD Within A Register: Parallel Processing on Consumer Microprocessors (PDF) (Ph.D.). Purdue University.
Fisher, Randall J.; Henry G. Dietz (August 1998). S. Chatterjee; J. F. Prins; L. Carter; J. Ferrante; Z. Li; D. Sehr; P.-C. Yew (eds.). "Compiling for SIMD Within A Register". Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing.
Lamport, Leslie (August 1975). "Multiple byte processing with full-word instructions". Communications of the ACM. 18 (8): 471–475. doi:10.1145/360933.360994. S2CID 1593593.
Dietz, Hank. "The Aggregate Magic Algorithms".
Padua, Flavio L. C.; Pereira, Guilherme A. S.; Neto, Jose P. de Queiroz; Campos, Mario F. M.; Fernandes, Antonio O. (January 2001). Improving processing time of large images by instruction level parallelism (PDF). Chilean Computing Week, V Workshop on Parallel and Distributed Systems. Punta Arenas. Archived from the original (PDF) on 2007-02-25.
Grabher, Philipp; Johann Großschädl; Dan Page (2009). "On Software Parallel Implementation of Cryptographic Pairings". Selected Areas in Cryptography. Lecture Notes in Computer Science. Vol. 5381. pp. 35–50. doi:10.1007/978-3-642-04159-4_3. ISBN 978-3-642-04158-7.
Persada, Onil Nazra; Thierry Goubier (12–14 September 2004). "Accelerating Raster Processing with Fine and Coarse Grain Parallelism in GRASS". Proceedings of the FOSS/GRASS Users Conference 2004.
Hauser, Thomas; T. I. Mattox; R. P. LeBeau; H. G. Dietz; P. G. Huang (April 2003). "Code Optimizations for Complex Microprocessors Applied to CFD Software". SIAM Journal on Scientific Computing. 25 (4): 1461–1477. doi:10.1137/S1064827502410530. ISSN 1064-8275.
Spracklen, Lawrence A. (2001). SWAR Systems and Communications Applications (PDF) (Ph.D.). University of Aberdeen.

External links

The Aggregate - SWAR: SIMD Within A Register
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.