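Instruction-level parallelism

Ability of computer instructions to be executed simultaneously with correct results

[Figure: Atanasoff–Berry computer, the first computer with parallel processing]

Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically, ILP refers to the average number of instructions run per step of this parallel execution.

Discussion

ILP must not be confused with concurrency. In ILP, there is a single specific thread of execution of a process. Concurrency, on the other hand, involves the assignment of multiple threads to a CPU's core in a strict alternation, or in true parallelism if there are enough CPU cores, ideally one core for each runnable thread.

There are two approaches to instruction-level parallelism: hardware and software.

The hardware approach works on dynamic parallelism, whereas the software approach works on static parallelism. Dynamic parallelism means the processor decides at run time which instructions to execute in parallel, whereas static parallelism means the compiler decides which instructions to execute in parallel. The Pentium processor works on dynamic parallelism, whereas the Itanium processor works on static parallelism.

Consider the following program:

    e = a + b
    f = c + d
    m = e * f

Operation 3 (m = e * f) depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously. If we assume that each operation can be completed in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.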
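The 3/2 figure can be reproduced with a small as-soon-as-possible schedule. This is a minimal sketch that makes several assumptions. It assumes unlimited functional units and a latency of one time unit per operation. It assumes straight-line code in which each variable is written once, so only true (read-after-write) dependencies are tracked. The `Op`, `asap_schedule` and `ilp` names are hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A three-address operation: dest = srcs[0] <op> srcs[1]."""
    dest: str
    srcs: tuple[str, ...]


def asap_schedule(ops: list[Op]) -> list[int]:
    """Earliest (0-based) time step of each operation, assuming unlimited
    functional units and a latency of one time unit per operation."""
    ready_at: dict[str, int] = {}  # variable -> step at which its value exists
    steps: list[int] = []
    for op in ops:
        # Start once every operand is available; program inputs such as
        # a, b, c, d are never written here, so they count as ready at step 0.
        step = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        steps.append(step)
        ready_at[op.dest] = step + 1  # the result is usable one unit later
    return steps


def ilp(ops: list[Op]) -> float:
    """Number of operations divided by the number of time steps needed."""
    if not ops:
        return 0.0
    return len(ops) / (max(asap_schedule(ops)) + 1)


program = [
    Op("e", ("a", "b")),  # operation 1
    Op("f", ("c", "d")),  # operation 2
    Op("m", ("e", "f")),  # operation 3: needs e and f
]

print(asap_schedule(program))  # [0, 0, 1]
print(ilp(program))            # 1.5
```

Operations 1 and 2 both land in step 0 and operation 3 in step 1, so three operations finish in two steps. Real hardware has a finite number of units and varying latencies, which would lower the achieved ILP.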
A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, where instructions execute one after the other and in the order specified by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.

How much ILP exists in programs is very application specific. In certain fields, such as graphics and scientific computing, the amount can be very large. However, workloads such as cryptography may exhibit much less parallelism.

Micro-architectural techniques that are used to exploit ILP include:

- Instruction pipelining, where the execution of multiple instructions can be partially overlapped.
- Superscalar execution, VLIW, and the closely related explicitly parallel instruction computing (EPIC) concepts, in which multiple execution units are used to execute multiple instructions in parallel.
- Out-of-order execution, where instructions execute in any order that does not violate data dependencies. Note that this technique is independent of both pipelining and superscalar execution. Current implementations of out-of-order execution extract ILP from ordinary programs dynamically (i.e., while the program is executing and without any help from the compiler). An alternative is to extract this parallelism at compile time and convey this information to the hardware. Due to the complexity of scaling the out-of-order execution technique, the industry has re-examined instruction sets which explicitly encode multiple independent operations per instruction.
- Register renaming, a technique used to avoid unnecessary serialization of program operations imposed by the reuse of registers by those operations; it is used to enable out-of-order execution.
- Speculative execution, which allows the execution of complete instructions or parts of instructions before it is certain whether this execution should take place. A commonly used form is control flow speculation, where instructions past a control flow instruction (e.g., a branch) are executed before the target of the control flow instruction is determined. Several other forms of speculative execution have been proposed and are in use, including speculative execution driven by value prediction, memory dependence prediction and cache latency prediction.
- Branch prediction, which is used to avoid stalling while control dependencies are resolved. Branch prediction is used together with speculative execution.
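Branch prediction can be illustrated with the textbook 2-bit saturating-counter scheme. The sketch below is one classic predictor, not the design of any particular processor. The table size, the `pc % table_size` indexing, the initial counter value and the `TwoBitPredictor`/`accuracy` names are all illustrative assumptions.

```python
class TwoBitPredictor:
    """A table of 2-bit saturating counters indexed by branch address.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    Each outcome moves the counter one step toward that outcome, so a single
    misprediction (such as a loop exit) does not flip a strongly biased entry.
    """

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        self.counters = [1] * table_size  # every entry starts "weakly not taken"

    def _index(self, pc: int) -> int:
        return pc % self.table_size  # simple direct-mapped indexing (assumption)

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


def accuracy(predictor: TwoBitPredictor, trace: list[tuple[int, bool]]) -> float:
    """Fraction of (pc, taken) outcomes in the trace predicted correctly."""
    if not trace:
        return 0.0
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)  # train on the actual outcome
    return correct / len(trace)


# A loop branch at a hypothetical address 0x40: taken 9 times, then falls through.
loop = [(0x40, True)] * 9 + [(0x40, False)]
print(accuracy(TwoBitPredictor(), loop * 2))  # 0.85 (17 of 20 correct)
```

The three mispredictions are the cold first iteration and the two loop exits. After each exit the counter only drops from 3 to 2, so the next run of the loop is still predicted "taken". In a speculative processor, a correct prediction lets instructions past the branch proceed instead of stalling.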
ILP is exploited by both the compiler and hardware support, but the compiler also exposes the inherent and implicit ILP in programs to the hardware through compile-time optimizations. Some optimization techniques for extracting available ILP in programs include instruction scheduling, register allocation/renaming, and memory access optimization.
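Register renaming, mentioned above both as a hardware technique and as a compile-time optimization, can be sketched as follows. This minimal version makes two assumptions. It assumes an unbounded supply of physical registers, whereas real hardware uses a finite register file with a free list. It uses a hypothetical `(destination, [sources])` instruction format invented for this illustration. Unmapped source registers simply keep their architectural names.

```python
from itertools import count


def rename(instrs: list[tuple[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Give every destination a fresh physical register.

    Sources are read through the current mapping, which preserves true
    (read-after-write) dependencies; fresh destinations remove the
    write-after-read and write-after-write conflicts caused by reusing a name.
    """
    fresh = (f"p{i}" for i in count())  # unbounded pool of physical names
    mapping: dict[str, str] = {}        # architectural -> current physical name
    renamed = []
    for dest, srcs in instrs:
        new_srcs = [mapping.get(s, s) for s in srcs]  # read before redefining
        new_dest = next(fresh)
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


code = [
    ("r1", ["r2", "r3"]),  # r1 = r2 op r3
    ("r4", ["r1", "r5"]),  # r4 = r1 op r5  (true dependency on the first r1)
    ("r1", ["r6", "r7"]),  # r1 = r6 op r7  (reuses r1: WAR with line 2, WAW with line 1)
    ("r8", ["r1", "r1"]),  # r8 = r1 op r1  (reads the second r1)
]
for dest, srcs in rename(code):
    print(dest, "<-", ", ".join(srcs))
```

After renaming, the third instruction writes `p2` and no longer has to wait for the second instruction to read the old `r1`. It can therefore execute in parallel with the first two. Only the genuine dependencies (`p0` feeding `p1`, `p2` feeding `p3`) remain.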
Dataflow architectures are another class of architectures where ILP is explicitly specified; for a recent example, see the TRIPS architecture.

In recent years, ILP techniques have been used to provide performance improvements in spite of the growing disparity between processor operating frequencies and memory access times (early ILP designs such as the IBM System/360 Model 91 used ILP techniques to overcome the limitations imposed by a relatively small register file). Presently, a cache miss that must be serviced from main memory costs several hundred CPU cycles. While in principle it is possible to use ILP to tolerate even such memory latencies, the associated resource and power dissipation costs are disproportionate. Moreover, the complexity and often the latency of the underlying hardware structures result in a reduced operating frequency, further reducing any benefits. Hence, the aforementioned techniques prove inadequate to keep the CPU from stalling for off-chip data. Instead, the industry is turning to higher levels of parallelism, which can be exploited through techniques such as multiprocessing and multithreading.

See also

- Data dependency
- Memory-level parallelism (MLP)

Further reading

Aiken, Alex; Banerjee, Utpal; Kejariwal, Arun; Nicolau, Alexandru (2016-11-30). Instruction Level Parallelism. Professional Computing (1 ed.). Springer. ISBN 978-1-4899-7795-3. (276 pages)

External links

- Approaches to addressing the Memory Wall
- Wired magazine article that refers to the above paper
- https://www.scribd.com/doc/33700101/Instruction-Level-Parallelism#scribd
- http://www.hpl.hp.com/techreports/92/HPL-92-132.pdf (archived 2016-03-04 at the Wayback Machine)