
SSE4


With SSE4a, AMD also introduced the misaligned SSE feature. It made unaligned load instructions as fast as their aligned counterparts whenever the address happened to be aligned. It also allowed the alignment check to be disabled for non-load SSE operations that access memory. Intel later brought similar speed improvements for unaligned SSE accesses to its Nehalem processors. However, it did not allow misaligned memory access by non-load SSE instructions until AVX.
SIMD operations, such as element-wise vector addition/multiplication and vector-scalar addition/multiplication, process multiple data elements in a single CPU instruction, which can noticeably increase performance. SSE4.2 introduced new SIMD string operations, including an instruction that compares two string fragments of up to 16 bytes each. SSE4.2 is a subset of SSE4. It first shipped with Nehalem, about a year after SSE4.1 debuted with Penryn.
What is now known as SSSE3 (Supplemental Streaming SIMD Extensions 3), introduced in the Intel Core 2 processor line, was referred to as SSE4 by some media until Intel came up with the SSSE3 moniker. Intel internally dubbed these instructions Merom New Instructions and originally did not plan to give them a special name, which some journalists criticized. Intel eventually cleared up the confusion.
Like earlier SSE generations, SSE4 works with the 128-bit XMM registers: 16 of them in 64-bit mode and 8 in 32-bit mode. Each register can hold, for example, four 32-bit integers, four 32-bit single-precision floating-point numbers, or two 64-bit double-precision floating-point numbers.
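The following pure-Python sketch roughly illustrates how one 128-bit register holds several independent values. It is a model, not real SIMD code. It packs four 32-bit lanes into a single integer and adds two such "registers" lane by lane. The helper names `pack_u32x4`, `unpack_u32x4` and `add_u32x4` are hypothetical. Per-lane wraparound is assumed to mirror a packed 32-bit integer add.

```python
MASK32 = 0xFFFFFFFF

def pack_u32x4(lanes):
    """Pack four 32-bit lanes (lane 0 = least significant) into one 128-bit int."""
    assert len(lanes) == 4
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & MASK32) << (32 * i)
    return value

def unpack_u32x4(value):
    """Split a 128-bit int back into its four 32-bit lanes."""
    return [(value >> (32 * i)) & MASK32 for i in range(4)]

def add_u32x4(a, b):
    """Element-wise add; each lane wraps modulo 2**32 on its own,
    so a carry never spills into the neighbouring lane."""
    sums = [(x + y) & MASK32 for x, y in zip(unpack_u32x4(a), unpack_u32x4(b))]
    return pack_u32x4(sums)

a = pack_u32x4([1, 2, 3, 0xFFFFFFFF])
b = pack_u32x4([10, 20, 30, 1])
print(unpack_u32x4(add_u32x4(a, b)))  # [11, 22, 33, 0]
```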
SSE4 extends the SSE3 instruction set, which was released in early 2004. Software that uses earlier Intel SIMD instructions (e.g., SSE3) is compatible with modern microprocessors that support SSE4. Existing software continues to run correctly without modification on microprocessors that incorporate SSE4.
DPPS and DPPD compute a dot product for AOS (array of structs) data. They take an immediate operand in which four bits (two for DPPD) select which input elements are multiplied and accumulated. Another four bits (two for DPPD) select whether each field of the output receives 0 or the dot product.
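The immediate-operand encoding is easier to follow in a small model. The sketch below is a pure-Python approximation of DPPS. It assumes the high nibble of the immediate selects the lanes to multiply and the low nibble selects the output lanes. It uses Python's double-precision floats, so it does not reproduce single-precision rounding. The function name `dpps` is only illustrative.

```python
def dpps(a, b, imm8):
    """Model of DPPS on four float lanes (lane 0 first).

    High nibble of imm8: lanes that are multiplied and summed.
    Low nibble of imm8: output lanes that receive the sum (others get 0.0).
    """
    products = [a[i] * b[i] if (imm8 >> (4 + i)) & 1 else 0.0 for i in range(4)]
    total = (products[0] + products[1]) + (products[2] + products[3])
    return [total if (imm8 >> i) & 1 else 0.0 for i in range(4)]

# AOS-style vec4 (x, y, z, w): dot product of x, y, z only, written to lane 0.
print(dpps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 0x71))  # [38.0, 0.0, 0.0, 0.0]
```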
POPCNT and LZCNT operate on integer registers rather than SSE registers, because they are not SIMD instructions. They appeared at the same time as SSE4. Although AMD introduced them together with the SSE4a instruction set, they are counted as separate extensions, each with its own dedicated CPUID bit to indicate support.
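POPCNT works on an ordinary integer register, so its result is easy to state in scalar code. This Python sketch is only a reference model of the count and does not emit the instruction. The name `popcnt` and its `width` parameter are hypothetical; `width` stands in for the operand size.

```python
def popcnt(x: int, width: int = 64) -> int:
    """Count the bits set to 1 in the low `width` bits of x."""
    # Masking first makes negative Python ints behave like two's-complement registers.
    return bin(x & ((1 << width) - 1)).count("1")

print(popcnt(0b1011))        # 3
print(popcnt(-1))            # 64: every bit of a 64-bit register is set
print(popcnt(-1, width=32))  # 32
```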
INSERTPS reads a 32-bit value from another XMM register or from memory. PINSRB, PINSRD and PINSRQ read 8, 32 or 64 bits from a general-purpose register or memory location. Each inserts the value into the field of the destination register selected by an immediate operand. EXTRACTPS and the PEXTR instructions (PEXTRB, PEXTRD, PEXTRQ) read a field from the source register and write it to a general-purpose register or memory location.
Unlike all previous iterations of SSE, SSE4 contains instructions whose operations are not specific to multimedia applications. It includes a number of instructions whose behavior is controlled by a constant (immediate) field. It also includes a set of instructions that take XMM0 as an implicit third operand.
Starting with Barcelona-based processors, AMD introduced the SSE4a instruction set, which has four SSE4 instructions and four new SSE instructions. These instructions are not found in Intel processors that support SSE4.1. AMD processors only started supporting Intel's SSE4.1 and SSE4.2 (the full SSE4 instruction set) with the Bulldozer-based FX processors.
SSE4.2 added STTNI (String and Text New Instructions), several new instructions that perform character searches and comparisons on two operands of 16 bytes at a time. These were designed, among other things, to speed up the parsing of XML documents.
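The following sketch shows what one STTNI operation computes. It is a simplified pure-Python model of PCMPISTRI for a single immediate setting only: unsigned bytes, "equal any" aggregation, positive polarity, least-significant index. The function name is hypothetical, and the other aggregation modes and flag outputs are not modelled. Both operands are implicitly terminated by their first zero byte, as in the implicit-length forms.

```python
def pcmpistri_equal_any(char_set: bytes, data: bytes) -> int:
    """Index of the first byte of `data` that appears in `char_set`,
    or 16 when there is none (the value ECX receives for byte operands)."""
    def implicit(s: bytes) -> bytes:
        s = s[:16]                      # one 16-byte operand
        end = s.find(b"\x00")           # implicit length: stop at the first zero byte
        return s if end == -1 else s[:end]

    allowed = set(implicit(char_set))
    for i, byte in enumerate(implicit(data)):
        if byte in allowed:
            return i
    return 16

# Locate the first XML markup character in a 16-byte chunk.
print(pcmpistri_equal_any(b"<>&", b"plain text <tag>"))  # 11
```

A parser would apply this to successive 16-byte chunks of its input and skip ahead whenever the result is 16.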
PHMINPOSUW: Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next-from-bottom word to the index of that word in the source.
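Here is a small pure-Python model of PHMINPOSUW. It assumes, as the instruction reference describes, that ties resolve to the lowest index and that the remaining destination lanes are cleared. The function name is illustrative.

```python
def phminposuw(words):
    """Model of PHMINPOSUW on eight unsigned 16-bit lanes (lane 0 first).

    Returns the eight destination lanes: lane 0 holds the minimum value,
    lane 1 holds its index, and the remaining lanes are zero.
    """
    assert len(words) == 8
    lanes = [w & 0xFFFF for w in words]
    minimum = min(lanes)
    index = lanes.index(minimum)  # first occurrence, i.e. lowest index on ties
    return [minimum, index, 0, 0, 0, 0, 0, 0]

print(phminposuw([900, 42, 7, 65535, 7, 300, 12, 8]))  # [7, 2, 0, 0, 0, 0, 0, 0]
```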
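BLENDPS, BLENDPD, PBLENDW (immediate forms) and BLENDVPS, BLENDVPD, PBLENDVB (variable forms): Conditionally copy elements from one location into another. The immediate forms select elements by the bits of an immediate operand. The V forms select them by the most significant bit of each corresponding element of register XMM0.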
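The difference between the two selection mechanisms can be sketched in Python. Lists stand for register lanes (lane 0 first), and the function names are hypothetical. In `pblendvb` the `mask` list plays the role of the implicit XMM0 operand of the non-VEX encoding.

```python
def blendps(dst, src, imm8):
    """Immediate form: lane i is taken from src when bit i of imm8 is set."""
    return [s if (imm8 >> i) & 1 else d for i, (d, s) in enumerate(zip(dst, src))]

def pblendvb(dst, src, mask):
    """Variable form: byte i is taken from src when the top bit of mask byte i is set."""
    return [s if m & 0x80 else d for d, s, m in zip(dst, src, mask)]

print(blendps([1, 2, 3, 4], [10, 20, 30, 40], 0b0101))  # [10, 2, 30, 4]
print(pblendvb([1, 2, 3], [9, 8, 7], [0x80, 0x7F, 0xFF]))  # [9, 2, 7]
```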
PTEST: Similar to the TEST instruction, in that it sets the Z flag according to the result of an AND between its operands: ZF is set if DEST AND SRC equals 0. Additionally, it sets the C flag if (NOT DEST) AND SRC equals 0.
MOVNTDQA: Efficient read from a write-combining memory area into an SSE register. This is useful for retrieving results from peripherals attached to the memory bus.
Several of these instructions are enabled by the single-cycle shuffle engine in Penryn. (Shuffle operations reorder bytes within a register.)
For PTEST, this is equivalent to setting ZF if none of the DEST bits selected by the SRC mask are set, and CF if all of the DEST bits selected by the SRC mask are set.
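The two PTEST flags can be modelled directly on 128-bit Python integers. The sketch below is a reference model only, and the function name is hypothetical. The first call has no masked bits set (ZF=1, CF=0); the second has all masked bits set (ZF=0, CF=1).

```python
MASK128 = (1 << 128) - 1

def ptest(dest, src):
    """Return (ZF, CF) as PTEST would set them for two 128-bit values."""
    dest &= MASK128
    src &= MASK128
    zf = (dest & src) == 0                 # no DEST bit selected by SRC is set
    cf = ((~dest & MASK128) & src) == 0    # every DEST bit selected by SRC is set
    return zf, cf

print(ptest(0b1010, 0b0101))  # (True, False)
print(ptest(0b1111, 0b0101))  # (False, True)
```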
MPSADBW allows an 8×8 block difference to be computed in fewer than seven cycles. One bit of its three-bit immediate operand selects whether $y_0 \ldots y_{10}$ or $y_4 \ldots y_{14}$ is used from the destination operand.
PMULDQ: Packed signed 32-bit "long" multiplication. Two of the four packed integers (the 1st and 3rd) are multiplied, giving two packed 64-bit results.
For example, PEXTRD eax, xmm0, 1 followed by EXTRACTPS [eax], xmm1, 1 stores field 1 (the second 32-bit field, counting from 0) of xmm1 at the address held in field 1 of xmm0.
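The two-instruction example can be traced with a small Python model. Registers are lists of four 32-bit fields, and memory is a hypothetical dictionary from address to dword. The helper names are illustrative. EXTRACTPS is modelled with the same function as PEXTRD, because it copies the raw 32-bit pattern of a field.

```python
def pextrd(xmm, index):
    """PEXTRD / EXTRACTPS: copy 32-bit field `index` (imm8 bits 1:0) of a register."""
    return xmm[index & 3]

def pinsrd(xmm, value, index):
    """PINSRD: return a copy of the register with field `index` replaced by value."""
    out = list(xmm)
    out[index & 3] = value & 0xFFFFFFFF
    return out

memory = {}                              # hypothetical flat memory: address -> dword
xmm0 = [0x1000, 0x2000, 0x3000, 0x4000]  # field 1 holds an address
xmm1 = [111, 222, 333, 444]

eax = pextrd(xmm0, 1)                    # PEXTRD eax, xmm0, 1
memory[eax] = pextrd(xmm1, 1)            # EXTRACTPS [eax], xmm1, 1
print(hex(eax), memory[eax])             # 0x2000 222
```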
These SSE4a instructions are not available in Intel processors. Support is indicated via the CPUID.80000001H:ECX.SSE4A flag.
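ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD: Round values in a floating-point register to integral values, which stay in floating-point format. The rounding mode is one of four modes specified by an immediate operand, or the current MXCSR rounding mode.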
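The four immediate rounding modes map onto familiar Python operations. This sketch assumes the standard rounding-control encoding in immediate bits 1:0: nearest-even, down, up, truncate. It ignores immediate bit 2 (use the MXCSR mode) and bit 3 (suppress the precision exception). It also does not handle NaN, infinities or signed zero. The names are hypothetical.

```python
import math

# Rounding-control values in imm8 bits 1:0 (applies when imm8 bit 2 is clear).
ROUNDING_MODES = {
    0b00: round,       # to nearest, ties to even (Python's round on floats)
    0b01: math.floor,  # toward negative infinity
    0b10: math.ceil,   # toward positive infinity
    0b11: math.trunc,  # toward zero
}

def roundps(values, imm8):
    """Round each lane to an integral value; results stay floating-point."""
    mode = ROUNDING_MODES[imm8 & 0b11]
    return [float(mode(v)) for v in values]

lanes = [2.5, -2.5, 1.7, -1.7]
for imm8 in range(4):
    print(imm8, roundps(lanes, imm8))
# 0 [2.0, -2.0, 2.0, -2.0]
# 1 [2.0, -3.0, 1.0, -2.0]
# 2 [3.0, -2.0, 2.0, -1.0]
# 3 [2.0, -2.0, 1.0, -1.0]
```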
PMULLD: Packed signed 32-bit "low" multiplication. Four packed pairs of integers are multiplied, and the low 32 bits of each product form the four packed 32-bit results.
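A Python model makes the contrast between PMULDQ and PMULLD concrete. PMULDQ keeps full 64-bit products of lanes 0 and 2. PMULLD keeps only the low 32 bits of all four products. The helper names are hypothetical. Lanes are 32-bit values, and PMULLD results are shown as unsigned bit patterns.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def pmuldq(a, b):
    """PMULDQ: signed multiply of lanes 0 and 2, giving two full 64-bit products."""
    return [to_signed32(a[i]) * to_signed32(b[i]) for i in (0, 2)]

def pmulld(a, b):
    """PMULLD: multiply all four lane pairs, keeping the low 32 bits of each product."""
    return [(to_signed32(x) * to_signed32(y)) & 0xFFFFFFFF for x, y in zip(a, b)]

a = [0x10000, 5, -3, 7]
b = [0x10000, 6, 4, 8]
print(pmuldq(a, b))                    # [4294967296, -12]
print([hex(v) for v in pmulld(a, b)])  # ['0x0', '0x1e', '0xfffffff4', '0x38']
```

The first lane shows why both forms exist: 0x10000 × 0x10000 overflows 32 bits, so PMULLD returns 0 where PMULDQ returns the full product.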
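These instructions were first implemented in the Nehalem-based Intel Core i7 product line and complete the SSE4 instruction set. AMD, on the other hand, first added support starting with the Bulldozer microarchitecture.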
Intel credits feedback from developers as playing an important role in the development of the instruction set.
LZCNT executed on some CPUs that do not support it, such as Intel CPUs prior to Haswell, may incorrectly perform the BSR operation instead of raising an invalid-instruction exception.
More precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in a presentation.
Intel SSE4 consists of 54 instructions. A subset of 47 instructions, referred to as SSE4.1 in some Intel documentation, is available in Penryn.
SSE4.2 also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols.
Additionally, SSE4.2, a second subset consisting of the seven remaining instructions, first became available in Nehalem-based Core i7 processors.
CRC32: Accumulate a CRC32C value using the polynomial 0x11EDC6F41 (or, without the high-order bit, 0x1EDC6F41).
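The CRC32 instruction updates a running CRC-32C value one operand at a time. The sketch below is a bitwise Python model of the byte-sized form. It uses 0x82F63B78, the bit-reversed form of 0x1EDC6F41, because the instruction processes bits in reflected order. It assumes the usual software convention of seeding the accumulator with 0xFFFFFFFF and inverting the final value, steps the instruction itself does not perform. The function names are hypothetical.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # 0x1EDC6F41 with its 32 bits reversed

def crc32_instruction(crc: int, data: bytes) -> int:
    """Apply the byte form of CRC32 to each byte in turn (no pre/post inversion)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
    return crc & 0xFFFFFFFF

def crc32c(data: bytes) -> int:
    """Standard CRC-32C: seed with all ones, invert the result."""
    return crc32_instruction(0xFFFFFFFF, data) ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

Because the accumulator is carried between calls, a long buffer can be processed in pieces, which is how hardware-accelerated implementations chain the instruction.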
Windows 11 24H2 requires the CPU to support SSE4.2; otherwise the Windows kernel cannot boot.
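MPSADBW: Compute eight offset sums of absolute differences, four at a time, $\sum_{i=0}^{3} |x_i - y_{i+k}|$ for $k = 0, 1, \ldots, 7$. That is, $|x_0-y_0|+|x_1-y_1|+|x_2-y_2|+|x_3-y_3|$, then $|x_0-y_1|+|x_1-y_2|+|x_2-y_3|+|x_3-y_4|$, and so on up to $|x_0-y_7|+|x_1-y_8|+|x_2-y_9|+|x_3-y_{10}|$. This operation is important for some HD codecs.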
This compatibility also holds in the presence of existing and new applications that incorporate SSE4.
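The other two bits of the MPSADBW immediate operand select whether $x_0 \ldots x_3$, $x_4 \ldots x_7$, $x_8 \ldots x_{11}$ or $x_{12} \ldots x_{15}$ is used from the source operand.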
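The pure-Python model below follows the MPSADBW description. The destination supplies an 11-byte window of $y$ values starting at byte 0 or 4. The selected 4-byte source block supplies the $x$ values. Output $k$ is $\sum_{i=0}^{3}|x_i - y_{i+k}|$, counted from the start of the window. The function name is illustrative, and only the 128-bit non-VEX form is modelled.

```python
def mpsadbw(dest: bytes, src: bytes, imm8: int) -> list:
    """Model of MPSADBW on 16-byte operands; returns eight 16-bit sums.

    imm8 bit 2 selects the start of the destination window (byte 0 or 4).
    imm8 bits 1:0 select the source block (bytes 0, 4, 8 or 12).
    """
    assert len(dest) == 16 and len(src) == 16
    y0 = 4 * ((imm8 >> 2) & 1)
    x = src[4 * (imm8 & 3): 4 * (imm8 & 3) + 4]
    return [sum(abs(dest[y0 + k + i] - x[i]) for i in range(4)) for k in range(8)]

block = bytes([0, 1, 2, 3] + [0] * 12)
print(mpsadbw(bytes(range(16)), block, 0))  # [0, 4, 8, 12, 16, 20, 24, 28]
```

In motion estimation the smallest of the eight sums marks the best-matching offset, which pairs naturally with PHMINPOSUW.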
VIA Nano QuadCore C4000-series processors support SSE4.1 and SSE4.2.
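Intel implements POPCNT beginning with the Nehalem microarchitecture and LZCNT beginning with the Haswell microarchitecture. AMD implements both, beginning with the Barcelona microarchitecture.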
PACKUSDW: Convert signed DWORDs into unsigned WORDs with saturation.
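Here is a minimal Python model of PACKUSDW's saturation rule. Each signed doubleword is clamped to the unsigned 16-bit range [0, 65535]. The four lanes of the first operand come first in the result. The function name is hypothetical, and inputs are given as ordinary signed integers.

```python
def packusdw(a, b):
    """Narrow four signed dwords from each operand to unsigned words with saturation."""
    def saturate(v):
        return max(0, min(0xFFFF, v))
    return [saturate(v) for v in list(a) + list(b)]

print(packusdw([-5, 0, 1000, 70000], [65535, 65536, -1, 42]))
# [0, 0, 1000, 65535, 65535, 65535, 0, 42]
```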
PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD: Packed minimum/maximum for signed bytes, unsigned words, and unsigned and signed doublewords.
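SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper. SSE4.1 support is indicated via the CPUID.01H:ECX.SSE41 flag, and SSE4.2 support via the CPUID.01H:ECX.SSE42 flag.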
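Software normally tests these flags before using the instructions. The following Python sketch only decodes bit positions from an ECX value obtained from CPUID by other means; Python's standard library has no CPUID call. The ECX value shown is made up, and the dictionary and function names are hypothetical. It assumes the standard bit positions: ECX bits 19, 20 and 23 of leaf 01H for SSE4.1, SSE4.2 and POPCNT, and ECX bits 5 and 6 of leaf 80000001H for ABM (LZCNT) and SSE4A.

```python
# Feature-flag bit positions in the ECX register returned by CPUID.
LEAF_1_ECX = {"SSE4.1": 19, "SSE4.2": 20, "POPCNT": 23}
LEAF_80000001_ECX = {"ABM (LZCNT)": 5, "SSE4A": 6}

def decode_flags(ecx: int, bits: dict) -> dict:
    """Map each feature name to whether its bit is set in `ecx`."""
    return {name: bool((ecx >> bit) & 1) for name, bit in bits.items()}

# Made-up ECX value for illustration, not read from a real CPU.
example_ecx = (1 << 19) | (1 << 20)
print(decode_flags(example_ecx, LEAF_1_ECX))
# {'SSE4.1': True, 'SSE4.2': True, 'POPCNT': False}
```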
Intel reserved the SSE4 name for its next instruction set extension.
AMD "Heavy Equipment" processors (Bulldozer-, Piledriver-, Steamroller- and Excavator-based) support SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT.
PCMPISTRI: Packed Compare Implicit Length Strings, Return Index
PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
Zhaoxin ZX-C processors and newer support SSE4.1 and SSE4.2.
The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture.
This is a problem because LZCNT and BSR return different results for the same input.
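The difference is easy to see in a scalar model. The sketch below uses hypothetical helper names and assumes 32-bit operands. For a nonzero input, LZCNT returns 31 minus the BSR result. For zero, LZCNT returns 32, while BSR leaves its destination undefined, modelled here as `None`. A program that expects LZCNT but runs on a CPU that executes BSR would therefore get 0 instead of 31 for the input 1.

```python
def lzcnt32(x: int) -> int:
    """LZCNT on a 32-bit operand: number of leading zero bits (32 when x == 0)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def bsr32(x: int):
    """BSR on a 32-bit operand: index of the highest set bit.
    For x == 0 the destination is undefined (ZF is set); modelled as None."""
    x &= 0xFFFFFFFF
    return x.bit_length() - 1 if x else None

for value in (1, 0x80000000, 0x00F0):
    print(hex(value), lzcnt32(value), bsr32(value))
# 0x1 31 0
# 0x80000000 0 31
# 0xf0 24 7
```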
LZCNT uses the same encoding path as the BSR (bit scan reverse) instruction: it is the BSR opcode with an F3 prefix, which older processors ignore.
PCMPISTRM: Packed Compare Implicit Length Strings, Return Mask
PCMPESTRM: Packed Compare Explicit Length Strings, Return Mask
PCMPGTQ: Compare packed signed 64-bit integers for greater than.
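As with other SSE compares, PCMPGTQ produces a mask rather than a boolean. The Python model below shows the all-ones/all-zeros result per 64-bit lane and the signed interpretation of the inputs. Under that interpretation, 0xFFFFFFFFFFFFFFFF means -1, so it is not greater than 1. The helper names are hypothetical.

```python
ALL_ONES_64 = (1 << 64) - 1

def to_signed64(x: int) -> int:
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= ALL_ONES_64
    return x - (1 << 64) if x >> 63 else x

def pcmpgtq(a, b):
    """Per 64-bit lane: all ones if a > b as signed integers, otherwise zero."""
    return [ALL_ONES_64 if to_signed64(x) > to_signed64(y) else 0 for x, y in zip(a, b)]

print([hex(v) for v in pcmpgtq([0xFFFFFFFFFFFFFFFF, 5], [1, 3])])
# ['0x0', '0xffffffffffffffff']
```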
Intel uses the marketing term HD Boost to refer to SSE4.

SSE4.1 was introduced with the Penryn microarchitecture, the 45 nm shrink of Intel's Core microarchitecture. It also includes the following instructions:

PCMPEQQ: Quadword (64-bit) compare for equality.
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ: Packed sign/zero extension to wider types.

POPCNT and LZCNT

AMD calls the POPCNT and LZCNT pair Advanced Bit Manipulation (ABM).

POPCNT: Population count (number of bits set to 1). Support is indicated via the CPUID.01H:ECX.POPCNT flag.
LZCNT: Leading zero count. Support is indicated via the CPUID.80000001H:ECX.ABM flag.

Trailing zeros can be counted using the BSF (bit scan forward) or TZCNT instructions.

Windows 11 24H2 requires the CPU to support POPCNT; otherwise the Windows kernel cannot boot.

SSE4a

EXTRQ/INSERTQ: Combined mask-shift instructions.
MOVNTSD/MOVNTSS: Scalar streaming store instructions.

Supporting CPUs

Intel:
Silvermont, Goldmont, Goldmont Plus and Tremont processors (SSE4.1, SSE4.2 and POPCNT supported)
Penryn processors (SSE4.1 supported, except Pentium Dual-Core and Celeron)
Nehalem and Westmere processors (SSE4.1, SSE4.2 and POPCNT supported, except Pentium and Celeron)
Sandy Bridge processors and newer (SSE4.1, SSE4.2 and POPCNT supported, including Pentium and Celeron)
Haswell processors and newer (SSE4.1, SSE4.2, POPCNT and LZCNT supported)

AMD:
K10-based processors (SSE4a, POPCNT and LZCNT supported)
Bobcat-based "Cat" low-power processors (SSE4a, POPCNT and LZCNT supported)
Jaguar-based and Puma-based "Cat" low-power processors and newer (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
Zen-, Zen+-, Zen 2-, Zen 3-, Zen 4- and Zen 5-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)

VIA:
Nano 3000, X2 and QuadCore processors (SSE4.1 supported)
Eden X4 processors (SSE4.1 and SSE4.2 supported)


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.