Instruction-level parallelism

dissipation costs are disproportionate. Moreover, the complexity and often the latency of the underlying hardware structures result in reduced operating frequency, further reducing any benefits. Hence, the aforementioned techniques prove inadequate to keep the CPU from stalling for off-chip data. Instead, the industry is heading towards exploiting higher levels of parallelism through techniques such as multiprocessing and multithreading.

Speculative execution, which allows the execution of complete instructions or parts of instructions before it is certain whether this execution should take place. A commonly used form of speculative execution is control flow speculation, where instructions past a control flow instruction (e.g., a branch) are executed before

Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously. If we assume that each operation can be completed in one unit of time

used ILP techniques to overcome the limitations imposed by a relatively small register file). Presently, a cache miss penalty to main memory costs several hundred CPU cycles. While in principle it is possible to use ILP to tolerate even such memory latencies, the associated resource and power

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, where instructions execute one after the other and in the order specified by the programmer. ILP allows the compiler and the processor to overlap the

ILP is exploited by both the compiler and hardware support, but the compiler also exposes the inherent and implicit ILP in programs to the hardware through compile-time optimizations. Optimization techniques for extracting available ILP in programs include instruction scheduling, register allocation/renaming, and memory access optimization.

The hardware level works on dynamic parallelism, whereas the software level works on static parallelism. Dynamic parallelism means the processor decides at run time which instructions to execute in parallel, whereas static parallelism means the compiler

Out-of-order execution, where instructions execute in any order that does not violate data dependencies. Note that this technique is independent of both pipelining and superscalar execution. Current implementations of out-of-order execution

In recent years, ILP techniques have been used to provide performance improvements in spite of the growing disparity between processor operating frequencies and memory access times (early ILP designs such as the IBM System/360 Model 91

Register renaming, which refers to a technique used to avoid the unnecessary serialization of program operations that is imposed when those operations reuse the same registers; it is used to enable out-of-order execution.
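
The effect of register renaming can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding (destination, source, source), the function name rename and the physical register names p0, p1, ... are hypothetical and do not describe any particular processor.

```python
from itertools import count


def rename(program):
    """Give every write a fresh physical register (hypothetical names p0, p1, ...).

    program is a list of (dest, src1, src2) tuples in program order.
    Reads are redirected to the most recent mapping of each source name.
    """
    fresh = (f"p{i}" for i in count())
    mapping = {}          # architectural name -> current physical register
    renamed = []
    for dest, *srcs in program:
        # Look up sources before remapping dest, so "r1 = r1 + x" reads the old r1.
        new_srcs = [mapping.get(s, s) for s in srcs]
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], *new_srcs))
    return renamed


program = [
    ("r1", "a", "b"),    # writes r1
    ("c", "r1", "d"),    # reads r1 (true dependency on the first operation)
    ("r1", "e", "f"),    # reuses r1: name conflict with the two operations above
    ("g", "r1", "h"),    # reads the second r1
]

for op in rename(program):
    print(op)
```

After renaming, the third operation writes p2 instead of r1, so it no longer has to wait for the first two operations; only the true data dependencies (p0 into the second operation, p2 into the fourth) remain.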

the target of the control flow instruction is determined. Several other forms of speculative execution have been proposed and are in use, including speculative execution driven by value prediction, memory dependence prediction and cache latency prediction.

dynamically (i.e., while the program is executing and without any help from the compiler) extract ILP from ordinary programs. An alternative is to extract this parallelism at compile time

and somehow convey this information to the hardware. Due to the complexity of scaling the out-of-order execution technique, the industry has re-examined instruction sets

CPU's core in a strict alternation, or in true parallelism if there are enough CPU cores, ideally one core for each runnable thread.

Branch prediction, which is used to avoid stalling while control dependencies are resolved. Branch prediction is used with speculative execution.
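
As a rough illustration of how a predictor can guess a branch outcome before it is resolved, the sketch below models a two-bit saturating counter. The text above does not name a specific prediction scheme, so the choice of a two-bit counter, the class name TwoBitPredictor and the initial state are assumptions made for the example.

```python
class TwoBitPredictor:
    """Two-bit saturating counter (an assumed, textbook-style scheme).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self):
        self.state = 2  # assumption: start in the weakly-taken state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes):
    """Return how many branch outcomes the predictor guessed correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop branch that is taken nine times and then falls through, run three times.
loop_branch = ([True] * 9 + [False]) * 3
print(count_correct(loop_branch), "of", len(loop_branch))  # 27 of 30
```

With this pattern the predictor is wrong only at each loop exit, so the processor can keep fetching speculatively along the predicted path for most iterations.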

More specifically, ILP refers to the average number of instructions run per step of this parallel execution.

How much ILP exists in programs is very application specific. In certain fields, such as graphics and scientific computing,

then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.
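
The 3/2 figure can be reproduced with a short Python sketch for the three-operation program discussed in the article (e = a + b, f = c + d, m = e * f). It assumes, as the text does, that every operation takes one unit of time, and additionally assumes unlimited execution units and that only true data dependencies matter; the function name ilp and the tuple encoding are hypothetical.

```python
from fractions import Fraction


def ilp(program):
    """Return (ILP, time steps) for a non-empty straight-line program.

    program is a list of (dest, src1, src2) tuples in program order.
    Assumptions: unit latency, unlimited execution units, and only
    read-after-write dependencies constrain the schedule.
    """
    ready = {}   # variable -> time step at which its value becomes available
    for dest, *srcs in program:
        # An operation can start once all of its inputs are available;
        # program inputs such as a, b, c, d are available at time 0.
        start = max(ready.get(s, 0) for s in srcs)
        ready[dest] = start + 1
    steps = max(ready.values())
    return Fraction(len(program), steps), steps


program = [
    ("e", "a", "b"),   # e = a + b
    ("f", "c", "d"),   # f = c + d
    ("m", "e", "f"),   # m = e * f
]
ratio, steps = ilp(program)
print(steps, ratio)  # 2 3/2
```

A fully serial chain of three operations would instead need three time steps, giving an ILP of 1.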

Dataflow architectures are another class of architectures where ILP is explicitly specified; for a recent example see the TRIPS architecture.

execution of multiple instructions or even to change the order in which instructions are executed.

Ability of computer instructions to be executed simultaneously with correct results

On the other hand, concurrency involves the assignment of multiple threads to a

Instruction pipelining, where the execution of multiple instructions can be partially overlapped.
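
The benefit of overlapping can be made concrete with a small Python sketch. It assumes an idealized pipeline with no stalls and one instruction entering per cycle; the five stage names (IF, ID, EX, MEM, WB) are an assumption borrowed from the classic textbook pipeline, and the function names are hypothetical.

```python
def cycles_unpipelined(n, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n * stages


def cycles_pipelined(n, stages):
    """Idealized pipeline: the first instruction fills the pipeline,
    then one instruction completes per cycle (no stalls assumed)."""
    return stages + n - 1 if n > 0 else 0


def timeline(n, stage_names):
    """Return one text row per instruction, shifted by one cycle each."""
    rows = []
    for i in range(n):
        cells = ["  ."] * i + [f"{name:>3}" for name in stage_names]
        rows.append(" ".join(cells))
    return rows


STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed stage names
for row in timeline(3, STAGES):
    print(row)
print(cycles_unpipelined(3, len(STAGES)), cycles_pipelined(3, len(STAGES)))  # 15 7
```

Each printed row starts one cycle later than the previous one, which is the partial overlap described above: three instructions finish in 7 cycles instead of 15.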

which explicitly encode multiple independent operations per instruction.

processor works on the dynamic sequence of parallel execution, but the Itanium

Micro-architectural techniques that are used to exploit ILP include:

There are two approaches to instruction-level parallelism: hardware and software.

the amount can be very large. However, workloads such as cryptography may exhibit much less parallelism.

decides which instructions to execute in parallel. The Pentium

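Superscalar execution, VLIW, and the closely related explicitly parallel instruction computing concepts, in which multiple execution units are used to execute multiple instructions in parallel.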

Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program.

Figure: Atanasoff–Berry computer, the first computer with parallel processing

Discussion

ILP must not be confused with concurrency. In ILP there is a single specific thread of execution of a process.

processor works on static-level parallelism.

Consider the following program:

e = a + b
f = c + d
m = e * f

See also

Data dependency
Memory-level parallelism (MLP)
