Knowledge

Non-blocking algorithm

Source đź“ť

313:
But in practice these lower bounds do not present a real barrier as spending a cache line or exclusive reservation granule (up to 2 KB on ARM) of store per thread in the shared memory is not considered too costly for practical systems (typically the amount of store logically required is a word,
394:
Some obstruction-free algorithms use a pair of "consistency markers" in the data structure. Processes reading the data structure first read one consistency marker, then read the relevant data into an internal buffer, then read the other marker, and then compare the markers. The data is consistent
362:
In general, a lock-free algorithm can run in four phases: completing one's own operation, assisting an obstructing operation, aborting an obstructing operation, and waiting. Completing one's own operation is complicated by the possibility of concurrent assistance and abortion, but is invariably the
325:
primitive, generally available on common hardware. Their construction expanded the lock-free queue of Michael and Scott, which is an efficient queue often used in practice. A follow-up paper by Kogan and Petrank provided a method for making wait-free algorithms fast and used this method to make the
382:
Obstruction-freedom is the weakest natural non-blocking progress guarantee. An algorithm is obstruction-free if at any point, a single thread executed in isolation (i.e., with all obstructing threads suspended) for a bounded number of steps will complete its operation. All lock-free algorithms are
338:
Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are
150:
thread cannot be resumed, progress is still possible without it. In contrast, global data structures protected by mutual exclusion cannot safely be accessed in an interrupt handler, as the preempted thread may be the one holding the lock. While this can be rectified by masking interrupt requests
182:
are almost always implemented using standard interfaces over these primitives (in the general case, critical sections will be blocking, even when implemented with these primitives). In the 1990s all non-blocking algorithms had to be written "natively" with the underlying primitives to achieve
358:
processes will succeed in finishing the operation in a finite number of steps and others might fail and retry on failure. The difference between wait-free and lock-free is that wait-free operation by each process is guaranteed to succeed in a finite number of steps, regardless of the other
111:
are all mechanisms by which a programmer can ensure that certain sections of code do not execute concurrently, if doing so would corrupt shared memory structures. If one thread attempts to acquire a lock that is already held by another thread, the thread will block until the lock is free.
373:
Correct concurrent assistance is typically the most complex part of a lock-free algorithm, and often very costly to execute: not only does the assisting thread slow down, but thanks to the mechanics of shared memory, the thread being assisted will be slowed, too, if it is still running.
247:
Non-blocking algorithms generally involve a series of read, read-modify-write, and write instructions in a carefully designed order. Optimizing compilers can aggressively re-arrange operations. Even when they don't, many modern CPUs often re-arrange such operations (they have a "weak
326:
wait-free queue practically as fast as its lock-free counterpart. A subsequent paper by Timnat and Petrank provided an automatic mechanism for generating wait-free data structures from lock-free ones. Thus, wait-free implementations are now available for many data-structures.
288:-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes. This property is critical for real-time systems and is always nice to have as long as the performance cost is not too high. 386:
Obstruction-freedom demands only that any partially completed operation can be aborted and the changes made rolled back. Dropping concurrent assistance can often result in much simpler algorithms that are easier to validate. Preventing the system from continually
329:
Under reasonable assumptions, Alistarh, Censor-Hillel, and Shavit showed that lock-free algorithms are practically wait-free. So the added algorithmic complexity of a wait-free algorithm might not be worth the effort, if there are not hard deadlines to be met.
295:, have been demonstrated. However, the resulting performance does not in general match even naĂŻve blocking designs. Several papers have since improved the performance of universal constructions, but still, their performance is far below blocking designs. 395:
if the two markers are identical. Markers may be non-identical when the read is interrupted by another process updating the data structure. In such a case, the process discards the data in the internal buffer and tries again.
342:
In particular, if one thread is suspended, then a lock-free algorithm guarantees that the remaining threads can still make progress. Hence, if two threads can contend for the same mutex lock or spinlock, then the algorithm is
154:
A lock-free data structure can be used to improve performance. A lock-free data structure increases the amount of time spent in parallel execution rather than serial execution, improving performance on a
115:
Blocking a thread can be undesirable for many reasons. An obvious reason is that while the thread is blocked, it cannot accomplish anything: if the blocked thread had been performing a high-priority or
314:
but physically CAS operations on the same cache line will collide, and LL/SC operations in the same exclusive reservation granule will collide, so the amount of store physically required is greater).
370:. This may be very simple (assist higher priority operations, abort lower priority ones), or may be more optimized to achieve better throughput, or lower the latency of prioritized operations. 151:
during the critical section, this requires the code in the critical section to have bounded (and preferably short) running time, or excessive interrupt latency may be observed.
62:
if there is also guaranteed per-thread progress. "Non-blocking" was used as a synonym for "lock-free" in the literature until the introduction of obstruction-freedom in 2003.
240:
Read-copy-update with multiple writers and any number of readers. (The readers are wait-free; multiple writers generally serialize with a lock and are not obstruction-free).
496: 712: 276:, both of which supply types and functions that tell the compiler not to re-arrange such instructions, and to insert the appropriate memory barriers. 514: 298:
Several papers have investigated the difficulty of creating wait-free algorithms. For example, it has been shown that the widely available atomic
566: 310:, cannot provide starvation-free implementations of many common data structures without memory costs growing linearly in the number of threads. 237:
with a single writer and any number of readers. (The readers are wait-free; the writer is usually lock-free, until it needs to reclaim memory).
213:
Additionally, some non-blocking data structures are weak enough to be implemented without special atomic primitives. These exceptions include:
225: 46:
cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional
350:
An algorithm is lock-free if infinitely often operation by some processors will succeed in a finite number of steps. For instance, if
636:
Proceedings of the 2005 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '05 : Chicago, Illinois
1089: 291:
It was shown in the 1980s that all algorithms can be implemented wait-free, and many transformations from serial code, called
1041: 998: 965: 897: 651: 142:
Unlike blocking algorithms, non-blocking algorithms do not suffer from these downsides, and in addition are safe for use in
135:. Using locks also involves a trade-off between coarse-grained locking, which can significantly reduce opportunities for 74: 224:, with a size which evenly divides the overflow of one of the available unsigned integer types, can unconditionally be 123:
Other problems are less obvious. For example, certain interactions between locks can lead to error conditions such as
139:, and fine-grained locking, which requires more careful design, increases locking overhead and is more prone to bugs. 932: 859: 823: 470: 244:
Several libraries internally use lock-free techniques, but it is difficult to write lock-free code that is correct.
716: 284:
Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with
221: 591: 86: 775: 731: 985:. Proc. 17th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). pp. 357–368. 952:. Proc. 17th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). pp. 141–150. 884:. Proc. 16th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). pp. 223–234. 634:
Harris, Tim; Marlow, S.; Peyton-Jones, S.; Herlihy, M. (June 15–17, 2005). "Composable memory transactions".
184: 1064: 1084: 409: 1059: 518: 404: 307: 124: 96: 69:
that could route a connection through a set of relays "without having to re-arrange existing calls" (see
317:
Wait-free algorithms were rare until 2011, both in research and in practice. However, in 2011 Kogan and
366:
The decision about when to assist, abort or wait when an obstruction is met is the responsibility of a
66: 269: 535: 1079: 611: 147: 419: 104: 92: 47: 791: 917:. Proc. 15th Annual ACM Symp. on Principles of Distributed Computing (PODC). pp. 267–275. 606: 530: 203: 199: 195: 159:, because access to the shared data structure does not need to be serialized to stay coherent. 39: 876: 347:
lock-free. (If we suspend one thread that holds the lock, then the second thread will block.)
73:). Also, if the telephone exchange "is not defective, it can always make the connection" (see 462: 455: 453:
Göetz, Brian; Peierls, Tim; Bloch, Joshua; Bowbeer, Joseph; Holmes, David; Lea, Doug (2006).
846:. Proc. 23rd Annual ACM Symp.on Principles of Distributed Computing (PODC). pp. 80–87. 156: 487: 8: 434: 285: 117: 55: 1019: 808:. Proc. 7th Annual ACM Symp. on Principles of Distributed Computing. pp. 276–290. 657: 548: 429: 136: 132: 43: 1018:. Proc. 46th Annual ACM Symposium on Theory of Computing (STOC’14). pp. 714–723. 1037: 994: 961: 928: 893: 855: 819: 647: 567:"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers" 510: 466: 249: 171: 143: 661: 574: 1029: 986: 953: 918: 885: 847: 809: 639: 616: 570: 552: 540: 424: 322: 303: 234: 179: 175: 108: 100: 27: 20: 915:
Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
218: 168: 210:. These allow programs to easily exchange data between threads asynchronously. 253: 229: 191: 1073: 1033: 990: 957: 889: 851: 643: 577: 318: 70: 923: 620: 544: 839: 814: 677:- C++ library of lock-free containers and safe memory reclamation schema 187:
promises standard abstractions for writing efficient non-blocking code.
174:
primitives that the hardware must provide, the most notable of which is
207: 16:
Algorithm in a thread whose failure cannot cause another thread to fail
792:"Safety: off: How not to shoot yourself in the foot with C++ atomics" 31: 806:
Impossibility and universality results for wait-free synchronization
762: 749: 735: 489:
Obstruction-Free Synchronization: Double-Ended Queues as an Example
414: 388: 128: 1024: 844:
On the inherent weakness of conditional synchronization primitives
674: 257: 91:
The traditional approach to multi-threaded programming is to use
509: 701:- A C library for non-blocking system design and implementation 633: 983:
A Practical Wait-Free Simulation for Lock-Free Data Structures
627: 698: 1013: 686: 354:
processors are trying to execute an operation, some of the
120:
task, it would be highly undesirable to halt its progress.
65:
The word "non-blocking" was traditionally used to describe
1016:
Are Lock-Free Concurrent Algorithms Practically Wait-Free?
1014:
Alistarh, Dan; Censor-Hillel, Keren; Shavit, Nir (2014).
704: 497:
International Conference on Distributed Computing Systems
452: 583: 183:
acceptable performance. However, the emerging field of
878:
Wait-free queues with multiple enqueuers and dequeuers
689:- A library of lock-free data structures, written in C 950:
A method for creating fast wait-free data structures
485: 190:
Much research has also been done in providing basic
461:. Upper Saddle River, NJ: Addison-Wesley. p.  454: 838: 723: 167:With few exceptions, non-blocking algorithms use 1071: 519:"Experience with Processes and Monitors in Mesa" 592:"Language support for lightweight transactions" 590:Harris, Tim; Fraser, Keir (26 November 2003). 980: 974: 486:Herlihy, M.; Luchangco, V.; Moir, M. (2003). 947: 912: 874: 786: 784: 589: 321:presented a wait-free queue building on the 732:"Writing Lock-Free Code: A Corrected Queue" 713:"Lock-Free Code: A False Sense of Security" 638:. New York, NY: ACM Press. pp. 48–60. 1023: 922: 832: 813: 781: 610: 534: 1060:An Introduction to Lock-Free Programming 906: 868: 750:"Writing a Generalized Concurrent Queue" 569:. Bioinformatics (2011) 27(6): 764-770. 256:is used to tell the CPU not to reorder. 941: 913:Michael, Maged; Scott, Michael (1996). 803: 565:Guillaume Marçais, and Carl Kingsford. 479: 1072: 981:Timnat, Shahar; Petrank, Erez (2014). 842:; Hendler, Danny; Shavit, Nir (2004). 410:Java ConcurrentMap#Lock-free atomicity 377: 1007: 391:is the task of a contention manager. 99:. Synchronization primitives such as 503: 948:Kogan, Alex; Petrank, Erez (2012). 875:Kogan, Alex; Petrank, Erez (2011). 797: 75:nonblocking minimal spanning switch 54:if there is guaranteed system-wide 13: 14: 1101: 1053: 162: 95:to synchronize access to shared 776:"ARM and Lock-Free Programming" 768: 755: 742: 333: 279: 1090:Concurrency control algorithms 692: 680: 668: 559: 446: 217:a single-reader single-writer 50:. A non-blocking algorithm is 1: 575:10.1093/bioinformatics/btr011 440: 185:software transactional memory 80: 804:Herlihy, Maurice P. (1988). 457:Java concurrency in practice 363:fastest path to completion. 7: 398: 67:telecommunications networks 10: 1106: 84: 18: 523:Communications of the ACM 763:"The Trouble With Locks" 48:blocking implementations 19:Not to be confused with 1065:Non-blocking Algorithms 1034:10.1145/2591796.2591836 991:10.1145/2692916.2555261 958:10.1145/2145816.2145835 890:10.1145/1941553.1941585 852:10.1145/1011767.1011780 644:10.1145/1065944.1065952 578:"Jellyfish mer counter" 420:Lock (computer science) 293:universal constructions 176:compare and swap (CAS) 87:Disadvantages of locks 924:10.1145/248052.248106 621:10.1145/949343.949340 545:10.1145/358818.358824 272:programmers can use 260:programmers can use 157:multi-core processor 1085:Concurrency control 815:10.1145/62546.62593 599:ACM SIGPLAN Notices 435:Resource starvation 378:Obstruction-freedom 274:<stdatomic.h> 790:Anthony Williams. 430:Priority inversion 383:obstruction-free. 368:contention manager 226:implemented safely 146:: even though the 144:interrupt handlers 133:priority inversion 1043:978-1-4503-2710-7 1000:978-1-4503-2656-8 967:978-1-4503-1160-1 899:978-1-4503-0119-0 653:978-1-59593-080-4 517:(February 1980). 511:Butler W. Lampson 250:consistency model 180:Critical sections 172:read-modify-write 109:critical sections 1097: 1048: 1047: 1027: 1011: 1005: 1004: 978: 972: 971: 945: 939: 938: 926: 910: 904: 903: 883: 872: 866: 865: 836: 830: 829: 817: 801: 795: 788: 779: 772: 766: 759: 753: 746: 740: 739: 734:. Archived from 727: 721: 720: 715:. Archived from 708: 702: 696: 690: 684: 678: 672: 666: 665: 631: 625: 624: 614: 596: 587: 581: 563: 557: 556: 538: 507: 501: 500: 494: 483: 477: 476: 460: 450: 425:Mutual exclusion 275: 267: 263: 235:Read-copy-update 28:computer science 21:non-blocking I/O 1105: 1104: 1100: 1099: 1098: 1096: 1095: 1094: 1080:Synchronization 1070: 1069: 1056: 1051: 1044: 1012: 1008: 1001: 979: 975: 968: 946: 942: 935: 911: 907: 900: 881: 873: 869: 862: 837: 833: 826: 802: 798: 789: 782: 773: 769: 760: 756: 747: 743: 730: 728: 724: 711: 709: 705: 699:Concurrency Kit 697: 693: 685: 681: 673: 669: 654: 632: 628: 594: 588: 584: 564: 560: 536:10.1.1.142.5765 515:David D. Redell 508: 504: 492: 484: 480: 473: 451: 447: 443: 401: 380: 357: 353: 336: 282: 273: 265: 261: 192:data structures 165: 89: 83: 24: 17: 12: 11: 5: 1103: 1093: 1092: 1087: 1082: 1068: 1067: 1062: 1055: 1054:External links 1052: 1050: 1049: 1042: 1006: 999: 973: 966: 940: 933: 905: 898: 867: 860: 831: 824: 796: 794:. 2015. p. 20. 780: 774:Bruce Dawson. 767: 754: 741: 738:on 2008-12-05. 722: 719:on 2015-09-01. 703: 691: 679: 667: 652: 626: 612:10.1.1.58.8466 582: 558: 529:(2): 105–117. 502: 499:. p. 522. 478: 471: 444: 442: 439: 438: 437: 432: 427: 422: 417: 412: 407: 400: 397: 379: 376: 355: 351: 335: 332: 281: 278: 266:<atomic> 254:memory barrier 242: 241: 238: 232: 230:memory barrier 164: 163:Implementation 161: 85:Main article: 82: 79: 38:if failure or 15: 9: 6: 4: 3: 2: 1102: 1091: 1088: 1086: 1083: 1081: 1078: 1077: 1075: 1066: 1063: 1061: 1058: 1057: 1045: 1039: 1035: 1031: 1026: 1021: 1017: 1010: 1002: 996: 992: 988: 984: 977: 969: 963: 959: 955: 951: 944: 936: 934:0-89791-800-2 930: 925: 920: 916: 909: 901: 895: 891: 887: 880: 879: 871: 863: 861:1-58113-802-4 857: 853: 849: 845: 841: 835: 827: 825:0-89791-277-2 821: 816: 811: 807: 800: 793: 787: 785: 777: 771: 764: 761:Herb Sutter. 758: 751: 748:Herb Sutter. 745: 737: 733: 729:Herb Sutter. 726: 718: 714: 710:Herb Sutter. 707: 700: 695: 688: 683: 676: 671: 663: 659: 655: 649: 645: 641: 637: 630: 622: 618: 613: 608: 604: 600: 593: 586: 579: 576: 572: 568: 562: 554: 550: 546: 542: 537: 532: 528: 524: 520: 516: 512: 506: 498: 491: 490: 482: 474: 472:9780321349606 468: 464: 459: 458: 449: 445: 436: 433: 431: 428: 426: 423: 421: 418: 416: 413: 411: 408: 406: 403: 402: 396: 392: 390: 384: 375: 371: 369: 364: 360: 348: 346: 340: 331: 327: 324: 320: 315: 311: 309: 305: 301: 296: 294: 289: 287: 277: 271: 259: 255: 252:"), unless a 251: 245: 239: 236: 233: 231: 228:using only a 227: 223: 220: 216: 215: 214: 211: 209: 205: 201: 197: 193: 188: 186: 181: 177: 173: 170: 160: 158: 152: 149: 145: 140: 138: 134: 130: 126: 121: 119: 113: 110: 106: 102: 98: 94: 88: 78: 76: 72: 68: 63: 61: 57: 53: 49: 45: 41: 37: 33: 29: 22: 1015: 1009: 982: 976: 949: 943: 914: 908: 877: 870: 843: 834: 805: 799: 770: 757: 744: 736:the original 725: 717:the original 706: 694: 682: 670: 635: 629: 602: 598: 585: 561: 526: 522: 505: 488: 481: 456: 448: 393: 389:live-locking 385: 381: 372: 367: 365: 361: 359:processors. 349: 344: 341: 337: 334:Lock-freedom 328: 316: 312: 302:primitives, 299: 297: 292: 290: 283: 280:Wait-freedom 246: 243: 212: 189: 166: 153: 141: 122: 114: 90: 71:Clos network 64: 59: 51: 36:non-blocking 35: 25: 840:Fich, Faith 605:(11): 388. 339:lock-free. 300:conditional 262:std::atomic 219:ring buffer 208:hash tables 137:parallelism 1074:Categories 441:References 286:starvation 105:semaphores 81:Motivation 40:suspension 34:is called 1025:1311.3200 607:CiteSeerX 531:CiteSeerX 148:preempted 118:real-time 97:resources 60:wait-free 52:lock-free 32:algorithm 662:53245159 415:Liveness 405:Deadlock 399:See also 194:such as 129:livelock 125:deadlock 56:progress 687:liblfds 553:1594544 495:. 23rd 319:Petrank 101:mutexes 42:of any 1040:  997:  964:  931:  896:  858:  822:  675:libcds 660:  650:  609:  551:  533:  469:  268:, and 206:, and 200:queues 196:stacks 169:atomic 131:, and 107:, and 58:, and 44:thread 1020:arXiv 882:(PDF) 658:S2CID 595:(PDF) 549:S2CID 493:(PDF) 308:LL/SC 258:C++11 93:locks 30:, an 1038:ISBN 995:ISBN 962:ISBN 929:ISBN 894:ISBN 856:ISBN 820:ISBN 648:ISBN 467:ISBN 306:and 222:FIFO 204:sets 1030:doi 987:doi 954:doi 919:doi 886:doi 848:doi 810:doi 640:doi 617:doi 571:doi 541:doi 345:not 323:CAS 304:CAS 270:C11 264:in 77:). 26:In 1076:: 1036:. 1028:. 993:. 960:. 927:. 892:. 854:. 818:. 783:^ 656:. 646:. 615:. 603:38 601:. 597:. 547:. 539:. 527:23 525:. 521:. 513:; 465:. 463:41 202:, 198:, 178:. 127:, 103:, 1046:. 1032:: 1022:: 1003:. 989:: 970:. 956:: 937:. 921:: 902:. 888:: 864:. 850:: 828:. 812:: 778:. 765:. 752:. 664:. 642:: 623:. 619:: 580:. 573:: 555:. 543:: 475:. 356:N 352:N 23:.

Index

non-blocking I/O
computer science
algorithm
suspension
thread
blocking implementations
progress
telecommunications networks
Clos network
nonblocking minimal spanning switch
Disadvantages of locks
locks
resources
mutexes
semaphores
critical sections
real-time
deadlock
livelock
priority inversion
parallelism
interrupt handlers
preempted
multi-core processor
atomic
read-modify-write
compare and swap (CAS)
Critical sections
software transactional memory
data structures

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑