Flynn's taxonomy - Knowledge

Pipelined processor: These receive the one (same) instruction but then read data from a central resource; each processes fragments of that data, then writes the results back to the same central resource. In Figure 5 of Flynn's 1972 paper that resource is main memory; for modern CPUs that resource is now more typically the register file.

SPMD is also termed "single process, multiple data", but the use of this terminology for SPMD is technically incorrect, as SPMD is a parallel execution model and assumes multiple cooperating processors executing a program. SPMD is the most common style of explicit parallel programming. The SPMD model and the term were proposed by Frederica Darema of the RP3 team.

A sequential computer which exploits no parallelism in either the instruction or data streams. A single control unit (CU) fetches a single instruction stream (IS) from memory. The CU then generates appropriate control signals to direct a single processing element (PE) to operate on a single data stream (DS), i.e., one operation at a time.

Both forms of MPMD (parallel builds and shell pipelines) are distinct from the explicit parallel programming used in HPC in that the individual programs are generic building blocks rather than implementing part of a specific parallel algorithm. In the pipelining approach, the amount of available parallelism does not increase with the size of the data set.

Multiple autonomous processors simultaneously operating at least two independent programs. In HPC contexts, such systems often pick one node to be the "host" (the "explicit host/node programming model") or "manager" (the "Manager/Worker" strategy), which runs one program that farms out data to all the other nodes.

At the time that Flynn wrote his 1972 paper, many systems were using main memory as the resource that pipelines read from and wrote to. When the resource that all "pipelines" read from and write to is the register file rather than main memory, modern variants of SIMD result. Examples include Altivec, NEON, and AVX.

The modern term for an array processor is "single instruction, multiple threads" (SIMT). This is a distinct classification in Flynn's 1972 taxonomy, as a subcategory of SIMD. It is identifiable by the parallel subelements having their own independent register file and memory (cache and data memory). Flynn's original papers cite two historic examples of SIMT processors: SOLOMON and ILLIAC IV.

The ASP's bit-level ALUs and bit-level predication correspond to associative processing in Flynn's taxonomy, and each of its 4096 processors had its own registers and memory (Flynn's taxonomy: array processing). The Linedancer, released in 2010, contained 4096 2-bit predicated SIMD ALUs, each with its own content-addressable memory.

A single instruction is simultaneously applied to multiple different data streams. Instructions can be executed sequentially, such as by pipelining, or in parallel by multiple functional units. Flynn's 1972 paper subdivided SIMD into three further categories: array processors, pipelined processors, and associative processors.

The four initial classifications defined by Flynn are based upon the number of concurrent instruction (or control) streams and data streams available in the architecture. Flynn defined three additional sub-categories of SIMD in 1972.

Multiple instructions operate on one data stream. This is an uncommon architecture which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.
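The requirement that redundant units "must agree on the result" can be illustrated with a minimal sketch. It assumes three hypothetical, independently written implementations of one computation and a simple majority vote; all function names are invented for illustration, and the sketch models the idea in software only, not the design of any real flight computer.

```python
from collections import Counter

# Three hypothetical, deliberately different implementations of the same
# task (summing a list). In a MISD-style redundant system each would run
# on its own unit, all fed the same data stream.
def sum_loop(values):
    total = 0
    for v in values:
        total += v
    return total

def sum_builtin(values):
    return sum(values)

def sum_pairwise(values):
    # Recursive pairwise summation: a structurally different algorithm.
    if not values:
        return 0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])

def vote(results):
    """Return the strict-majority result, or raise if there is none."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("units disagree: no majority result")
    return value

def redundant_sum(values, units=(sum_loop, sum_builtin, sum_pairwise)):
    # Every unit applies its own instruction stream to the same data.
    return vote([unit(values) for unit in units])

agreed = redundant_sum([3, 1, 4, 1, 5, 9])
```

With a strict-majority vote, a single faulty unit out of three is outvoted, while three mutually different answers are rejected rather than guessed at.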

The make build system can build multiple dependencies in parallel, using target-dependent programs in addition to the make executable itself. MPMD also often takes the form of pipelines.

The other nodes all run a second program and then return their results directly to the manager. An example of this would be the Sony PlayStation 3 game console, with its SPU/PPU processor.

Nvidia commonly uses the term SIMT in its marketing materials and technical documents, where it argues for the novelty of its architecture. SOLOMON predates Nvidia by several decades.

Some modern designs (GPUs in particular) take features of more than one of these subcategories: GPUs of today are SIMT but are also associative, i.e. each processing element in the SIMT array is also predicated.

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in the design of modern processors and their functionalities.

Although these are not part of Flynn's work, some further divide the MIMD category into the two categories below, and even further subdivisions are sometimes considered.

With a content-addressable memory for each ALU, the Linedancer was capable of 800 billion instructions per second. Aspex's ASP associative array SIMT processor predates NVIDIA by 20 years.

Associative processor: These receive the one (same) instruction, but in each parallel processing unit an independent decision is made, based on data local to the unit, as to whether to perform the execution or whether to skip it. In modern terminology this is known as "predicated" (masked) SIMD.
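The per-unit decision described above can be sketched in a few lines. This is a minimal software model, assuming a list of lanes and a boolean mask; the helper name predicated_apply is hypothetical and does not correspond to any particular instruction set.

```python
def predicated_apply(op, data, mask):
    """Apply `op` to every lane whose mask entry is true; skip the others.

    Models one instruction broadcast to all lanes, with a per-lane
    decision (the predicate) on whether to execute it.
    """
    if len(data) != len(mask):
        raise ValueError("data and mask must have the same number of lanes")
    return [op(x) if keep else x for x, keep in zip(data, mask)]

lanes = [5, -3, 8, -1]
# Each lane computes its own predicate from its local data.
mask = [x < 0 for x in lanes]
# One "negate" instruction is issued; only lanes with a true predicate run it.
result = predicated_apply(lambda x: -x, lanes, mask)
```

Here the masked negate yields the absolute value of each lane: lanes holding non-negative values skip the instruction and keep their contents.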

The Aspex Microelectronics Associative String Processor (ASP) categorised itself in its marketing material as "massive wide SIMD" but had bit-level ALUs and bit-level predication.

Array processor: These receive the one (same) instruction, but each parallel processing unit has its own separate and distinct memory and register file.

These four architectures are shown below visually. Each processing unit (PU) is shown for a uni-core or multi-core computer:

Multiple autonomous processors simultaneously execute different instructions on different data. MIMD architectures include multi-core superscalar processors, and distributed systems, using either one shared memory space or a distributed memory space.

Multiple autonomous processors simultaneously executing the same program (but at independent points, rather than in the lockstep that SIMD imposes) on different data.
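A minimal sketch of the SPMD idea follows. It assumes threads stand in for the cooperating processors purely for illustration, and the names program, rank, and size are hypothetical labels chosen here rather than part of any specific library.

```python
from concurrent.futures import ThreadPoolExecutor

def program(rank, size, data):
    """The single program that every worker runs.

    Only `rank` differs between workers; each one uses it to pick its own
    share of the data and then proceeds at its own pace.
    """
    chunk = data[rank::size]          # this worker's slice of the data
    return sum(x * x for x in chunk)  # partial sum of squares

def run_spmd(data, size=4):
    # Launch `size` copies of the same program, one per rank.
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(program, range(size), [size] * size, [data] * size))
    # Combine the independent partial results.
    return partials, sum(partials)

partials, total = run_spmd(list(range(10)))
```

Unlike SIMD, nothing forces the workers to execute the same instruction at the same moment; they only share the program text and cooperate through the final combination step.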

A simple Unix shell command like ls | grep "A" | more launches three processes running separate programs in parallel, with the output of one used as the input to the next.
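The three-process pipeline can be sketched with the standard subprocess module. To keep the sketch portable, three small Python one-liners stand in for ls, grep, and more; that substitution, and the helper name run_pipeline, are assumptions made here for illustration.

```python
import subprocess
import sys

# Stand-in programs (assumed for portability): a producer, a filter that
# keeps lines containing "A", and a pass-through in place of a pager.
PRODUCER_SRC = "import sys; sys.stdout.write(sys.argv[1])"
FILTER_SRC = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'A' in line:\n"
    "        sys.stdout.write(line)\n"
)
PAGER_SRC = "import sys; sys.stdout.write(sys.stdin.read())"

def run_pipeline(lines):
    """Run three separate programs concurrently, chained by pipes."""
    py = sys.executable
    text = "".join(line + "\n" for line in lines)
    producer = subprocess.Popen([py, "-c", PRODUCER_SRC, text],
                                stdout=subprocess.PIPE)
    filt = subprocess.Popen([py, "-c", FILTER_SRC],
                            stdin=producer.stdout, stdout=subprocess.PIPE)
    producer.stdout.close()  # the filter now owns the read end
    pager = subprocess.Popen([py, "-c", PAGER_SRC],
                             stdin=filt.stdout, stdout=subprocess.PIPE,
                             text=True)
    filt.stdout.close()      # the pager now owns the read end
    output, _ = pager.communicate()
    producer.wait()
    filt.wait()
    return output.splitlines()

kept = run_pipeline(["Alpha", "beta", "GAMMA", "delta"])
```

All three processes are started before any of them finishes, so they run in parallel, but the number of stages is fixed by the command rather than by the amount of data.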

An alternative name for this type of register-based SIMD is "packed SIMD" and another is SIMD within a register (SWAR).
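As a minimal sketch of the SWAR idea, the following treats one 32-bit integer as four packed 8-bit lanes and adds all lanes with a handful of ordinary integer operations. The lane width, the helper names, and the choice of wrap-around (modulo 256) arithmetic are assumptions made for illustration.

```python
LANES = 4           # four 8-bit lanes packed into one 32-bit word
HIGH = 0x80808080   # top bit of every lane
LOW = 0x7F7F7F7F    # low seven bits of every lane

def pack(lanes):
    """Pack four integers in 0..255 into one 32-bit word (lane 0 lowest)."""
    if len(lanes) != LANES or any(not 0 <= v <= 0xFF for v in lanes):
        raise ValueError("expected four values in 0..255")
    word = 0
    for i, v in enumerate(lanes):
        word |= v << (8 * i)
    return word

def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]

def packed_add_u8(a, b):
    """Add all four lanes at once, wrapping modulo 256 within each lane."""
    # Adding only the low seven bits cannot carry into a neighbouring lane.
    low_sum = (a & LOW) + (b & LOW)
    # The top bit of each lane is then fixed up with a carry-free XOR.
    return (low_sum ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF

packed = packed_add_u8(pack([1, 2, 3, 250]), pack([10, 20, 30, 10]))
lanes_out = unpack(packed)
```

Masking off each lane's top bit before the addition is what keeps a carry in one lane from spilling into the next, which a plain 32-bit add would allow.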

When predication is applied to packed SIMD, it becomes associative processing.

Vector processing, covered by Duncan's taxonomy, is missing from Flynn's work because the Cray-1 was released in 1977; Flynn's second paper was published in 1972.

Since the rise of multiprocessing central processing units (CPUs), a multiprogramming context has evolved as an extension of the classification system.

Multiple instruction streams, multiple data streams (MIMD)

Multiple instruction streams, single data stream (MISD)

Single instruction stream, multiple data streams (SIMD)

MPMD is common in non-HPC contexts, for example in build systems and shell pipelines.

Single instruction stream, single data stream (SISD)

Examples of SISD architectures are the traditional uniprocessor machines like older personal computers (PCs) (by 2010, many PCs had multiple cores) and mainframe computers.

The modern term for associative processor is "predicated" (or masked) SIMD. Examples include AVX-512.

As of 2006, all of the top 10 and most of the TOP500 supercomputers are based on a MIMD architecture.

See also

Feng's classification
Duncan's taxonomy
Händler's Erlangen Classification System (ECS)

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
