CLMUL instruction set

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD. It was proposed by Intel in March 2008 and made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2), where the bitstring $a_0 a_1 \ldots a_{63}$ represents the polynomial $a_0 + a_1 X + a_2 X^2 + \cdots + a_{63} X^{63}$. The CLMUL instruction also allows a more efficient implementation of the closely related multiplication in larger finite fields GF(2^k) than the traditional instruction set.

One use of these instructions is to improve the speed of applications doing block cipher encryption in Galois/Counter Mode, which depends on finite field GF(2^k) multiplication. Another application is the fast calculation of CRC values, including those used to implement the LZ77 sliding window DEFLATE algorithm in zlib and pngcrush.

ARMv8 also has a version of CLMUL. SPARC calls their version XMULX, for "XOR multiplication".

New instructions

The instruction computes the 128-bit carry-less product of two 64-bit values. The destination is a 128-bit XMM register. The source may be another XMM register or memory. An immediate operand specifies which halves of the 128-bit operands are multiplied. Mnemonics specifying specific values of the immediate operand are also defined:

| Instruction | Opcode | Description |
|---|---|---|
| PCLMULQDQ xmmreg,xmmrm,imm | 66 0F 3A 44 /r ib | Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2). |
| PCLMULLQLQDQ xmmreg,xmmrm | 66 0F 3A 44 /r 00 | Multiply the low halves of the two registers. |
| PCLMULHQLQDQ xmmreg,xmmrm | 66 0F 3A 44 /r 01 | Multiply the high half of the destination register by the low half of the source register. |
| PCLMULLQHQDQ xmmreg,xmmrm | 66 0F 3A 44 /r 10 | Multiply the low half of the destination register by the high half of the source register. |
| PCLMULHQHQDQ xmmreg,xmmrm | 66 0F 3A 44 /r 11 | Multiply the high halves of the two registers. |

An EVEX vectorized version (VPCLMULQDQ) is seen in AVX-512.

CPUs with CLMUL instruction set

Intel:
- Westmere processor (March 2010)
- Sandy Bridge processor
- Ivy Bridge processor
- Haswell processor
- Broadwell processor (with increased throughput and lower latency)
- Skylake (and later) processor
- Goldmont processor

AMD:
- Jaguar-based processors and newer
- Puma-based processors and newer
- "Heavy Equipment" processors:
  - Bulldozer-based processors
  - Piledriver-based processors
  - Steamroller-based processors
  - Excavator-based processors and newer
- Zen processors
- Zen+ processors
- Zen2 (and later) processors

The presence of the CLMUL instruction set can be checked by testing one of the CPU feature bits.

See also

- Finite field arithmetic
- AES instruction set
- FMA3 instruction set
- FMA4 instruction set
- AVX instruction set
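As an illustration of the carry-less product and of how the immediate operand picks the operand halves, the following is a minimal software model in Python, not the hardware implementation. The function names `clmul64` and `pclmulqdq` are hypothetical helpers chosen for this sketch. 128-bit registers are modelled as plain integers, with bit i holding the coefficient of X^i. The sketch assumes the usual encoding in which bit 0 of the immediate selects the half of the destination and bit 4 selects the half of the source.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (hypothetical helper).

    This is shift-and-add multiplication with XOR in place of addition,
    i.e. polynomial multiplication over GF(2). The result has at most 127 bits.
    """
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of ADD, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(dest: int, src: int, imm8: int) -> int:
    """Software model of PCLMULQDQ on 128-bit integers (illustrative only).

    Assumed encoding: imm8 bit 0 selects the low (0) or high (1) half of
    dest, and imm8 bit 4 selects the low (0) or high (1) half of src.
    """
    a = (dest >> 64) & MASK64 if imm8 & 0x01 else dest & MASK64
    b = (src >> 64) & MASK64 if imm8 & 0x10 else src & MASK64
    return clmul64(a, b)


if __name__ == "__main__":
    # (X + 1) * (X + 1) = X^2 + 1 over GF(2), whereas ordinary 3 * 3 = 9.
    print(bin(clmul64(0b11, 0b11)))
```

With `dest = (7 << 64) | 3` and `src = (2 << 64) | 5`, the immediates 0x00, 0x01, 0x10 and 0x11 correspond to the PCLMULLQLQDQ, PCLMULHQLQDQ, PCLMULLQHQDQ and PCLMULHQHQDQ mnemonics respectively.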

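The feature-bit check mentioned in the article can be sketched as follows. The article does not say which bit is tested, so this sketch assumes that PCLMULQDQ support is reported in bit 1 of ECX for CPUID leaf 1, and that Linux exposes the same flag as `pclmulqdq` in the `flags` line of /proc/cpuinfo. Python cannot execute CPUID directly, so the first helper takes an ECX value obtained elsewhere. Both function names are hypothetical.

```python
PCLMULQDQ_ECX_BIT = 1  # assumed position: CPUID leaf 1, register ECX, bit 1


def ecx_has_pclmulqdq(ecx: int) -> bool:
    """Test the PCLMULQDQ bit in an ECX value returned by CPUID leaf 1."""
    return bool((ecx >> PCLMULQDQ_ECX_BIT) & 1)


def cpuinfo_has_pclmulqdq(cpuinfo_text: str) -> bool:
    """Look for the 'pclmulqdq' flag in the text of a Linux /proc/cpuinfo.

    Only the first 'flags' line is inspected, which assumes that all
    logical CPUs report the same feature set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "pclmulqdq" in value.split()
    return False
```

On a Linux x86 system, the second helper could be fed the contents of /proc/cpuinfo, for example `cpuinfo_has_pclmulqdq(open("/proc/cpuinfo").read())`. Other operating systems need a different source for the flag.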

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.