(PDF). University of California, Berkeley. Technical Report No. UCB/EECS-2006-183. "Old: Increasing clock frequency is the primary method of improving processor performance. New: Increasing parallelism is the primary method of improving processor performance… Even representatives from Intel, a company generally associated with the 'higher clock-speed is better' position, warned that traditional approaches to maximizing performance through maximizing clock speed have been pushed to their limits."
, which claims that "mind is formed from many little agents, each mindless by itself". The theory attempts to explain how what we call intelligence could be a product of the interaction of non-intelligent parts. Minsky says that the biggest source of ideas about the theory came from his work in trying to create a machine that uses a robotic arm, a video camera, and a computer to build with children's blocks.
= 0.9), we can get no more than a 10 times speedup, regardless of how many processors are added. This puts an upper limit on the usefulness of adding more parallel execution units. "When a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. The bearing of a child takes nine months, no matter how many women are assigned."
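To make this bound concrete, here is a minimal Python sketch of Amdahl's law (the helper name amdahl_speedup is ours, for illustration only):

    def amdahl_speedup(p, s):
        # p: parallelizable fraction of the runtime
        # s: speedup of the parallelizable part (e.g., number of processors)
        return 1.0 / ((1 - p) + p / s)

    # With p = 0.9, the speedup saturates near 10x no matter how large s grows.
    for s in (2, 8, 64, 1024):
        print(s, round(amdahl_speedup(0.9, s), 2))

Running this prints speedups that approach, but never exceed, 10.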
, it shows that a small part of the program which cannot be parallelized will limit the overall speedup available from parallelization. A program solving a large mathematical or engineering problem will typically consist of several parallelizable parts and several non-parallelizable (serial) parts. If the non-parallelizable part of a program accounts for 10% of the runtime (
; this information can be used to restore the program if the computer should fail. Application checkpointing means that the program has to restart from only its last checkpoint rather than the beginning. While checkpointing provides benefits in a variety of situations, it is especially useful in highly parallel systems with a large number of processors used in
) have been created for programming parallel computers. These can generally be divided into classes based on the assumptions they make about the underlying memory architecture—shared memory, distributed memory, or shared distributed memory. Shared memory programming languages communicate by manipulating shared memory variables. Distributed memory uses
. Bibliothèque Universelle de Genève. Retrieved on November 7, 2007. Quote: "when a long series of identical computations is to be performed, such as those required for the formation of numerical tables, the machine can be brought into play so as to give several results at the same time, which will greatly abridge the whole amount of the processes."
—small and fast memories located close to the processor which store temporary copies of memory values (nearby in both the physical and logical sense). Parallel computer systems have difficulties with caches that may store the same value in more than one location, with the possibility of incorrect program execution. These computers require a
. However, ILLIAC IV was called "the most infamous of supercomputers", because the project was only one-fourth completed, but took 11 years and cost almost four times the original estimate. When it was finally ready to run its first real application in 1976, it was outperformed by existing commercial supercomputers such as the
." University of California, San Diego. "Future design for manufacturing (DFM) technology must reduce design cost and directly address manufacturing—the cost of a mask set and probe card—which is well over $1 million at the 90 nm technology node and creates a significant damper on semiconductor-based innovation."
can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above. Historically parallel computing was used for scientific computing and the simulation of scientific problems, particularly in the natural and
is one of the most common methods for keeping track of which values are being accessed (and thus should be purged). Designing large, high-performance cache coherence systems is a very difficult problem in computer architecture. As a result, shared memory computer architectures do not scale as well as
Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. The processing elements
Because grid computing systems (described below) can easily handle embarrassingly parallel problems, modern clusters are typically designed to handle more difficult problems—problems that require nodes to share intermediate results with each other more often. This requires a high bandwidth and, more
locks multiple variables all at once. If it cannot lock all of them, it does not lock any of them. If two threads each need to lock the same two variables using non-atomic locks, it is possible that one thread will lock one of them and the second thread will lock the second variable. In such a case,
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism. This classification is broadly analogous to the distance between basic computing nodes. These are not mutually exclusive; for example, clusters of symmetric multiprocessors are relatively
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many
Violation of the first condition introduces a flow dependency, corresponding to the first segment producing a result used by the second segment. The second condition represents an anti-dependency, when the second segment produces a variable needed by the first segment. The third and final condition
can ensure that different tasks and user programs are run in parallel on the available cores. However, for a serial software program to take full advantage of the multi-core architecture, the programmer needs to restructure and parallelize the code. A speed-up of application software runtime will no
) tend to wipe out these gains in only one or two chip generations. High initial cost, and the tendency to be overtaken by Moore's-law-driven general-purpose computing, has rendered ASICs unfeasible for most parallel computing applications. However, some have been built. One example is the PFLOPS
prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors. Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely
Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data. Task parallelism
Amdahl's law only applies to cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this
However, the holy grail of such research—automated parallelization of serial programs—has yet to materialize. While automated parallelization of certain classes of algorithms has been demonstrated, such success has largely been limited to scientific and numeric applications with predictable flow
Patterson and Hennessy, pp. 749–50: "Although successful in pushing several technologies useful in later projects, the ILLIAC IV failed as a computer. Costs escalated from the $8 million estimated in 1966 to $31 million by 1972, despite the construction of only a quarter of the
. This process requires a mask set, which can be extremely expensive, costing over a million US dollars. (The smaller the transistors required for the chip, the more expensive the mask will be.) Meanwhile, performance increases in general-purpose computing over time (as described by
A massively parallel processor (MPP) is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks (whereas clusters use commodity hardware for networking). MPPs also tend to be larger than clusters,
Not all parallelization results in speed-up. Generally, as a task is split up into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other or waiting on each other for access to resources. Once the overhead from resource contention or
it can be predicted that the number of cores per processor will double every 18–24 months. This could mean that after 2020 a typical processor will have dozens or hundreds of cores; in reality, however, the standard is somewhere in the region of 4 to 16 cores, with some designs having a mix of
from parallelization would be linear—doubling the number of processing elements should halve the runtime, and doubling it a second time should again halve the runtime. However, very few parallel algorithms achieve optimal speedup. Most of them have a near-linear speedup for small numbers of
if the results differ. These methods can be used to help prevent single-event upsets caused by transient errors. Although additional measures may be required in embedded or specialized systems, this method can provide a cost-effective approach to achieve n-modular redundancy in commercial
interconnection network. Many historic and current supercomputers use customized high-performance network hardware specifically designed for cluster computing, such as the Cray Gemini network. As of 2014, most current supercomputers use some off-the-shelf standard network hardware, often
of a program is equal to the number of instructions multiplied by the average time per instruction. Maintaining everything else constant, increasing the clock frequency decreases the average time it takes to execute an instruction. An increase in frequency thus decreases runtime for all
microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which remained the standard in general-purpose computing for two decades. Not until the early 2000s, with the advent of
A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer. Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric,
. The speedup of a program from parallelization is limited by how much of the program can be parallelized. For example, if 90% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 10 times no matter how many processors are used.
", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them. The same system may be characterized both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.
is the best known) was an early form of pseudo-multi-coreism. A processor capable of concurrent multithreading includes multiple execution units in the same processing unit—that is, it has a superscalar architecture—and can issue multiple instructions per clock cycle from
involves the decomposition of a task into sub-tasks and then allocating each sub-task to a processor for execution. The processors would then execute these sub-tasks concurrently and often cooperatively. Task parallelism does not usually scale with the size of a problem.
Both Amdahl's law and Gustafson's law assume that the running time of the serial part of the program is independent of the number of processors. Amdahl's law assumes that the entire problem is of fixed size so that the total amount of work to be done in parallel is also
and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism. Advances in instruction-level parallelism dominated computer architecture from the mid-1980s until the mid-1990s.
), since calculations that depend upon prior calculations in the chain must be executed in order. However, most algorithms do not consist of just a long chain of dependent calculations; there are usually opportunities to execute independent calculations in parallel.
—the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word. For example, where an
The single-instruction-single-data (SISD) classification is equivalent to an entirely sequential program. The single-instruction-multiple-data (SIMD) classification is analogous to doing the same operation repeatedly over a large data set. This is commonly done in
1093:—unable to proceed until V is unlocked again. This guarantees correct execution of the program. Locks may be necessary to ensure correct program execution when threads must serialize access to resources, but their use can greatly slow a program and may affect its
1297:, "Some machines are hybrids of these categories, of course, but this classic model has survived because it is simple, easy to understand, and gives a good first approximation. It is also—perhaps because of its understandability—the most widely used scheme."
takes roughly 25% of the time of the whole computation. By working very hard, one may be able to make this part 5 times faster, but this only reduces the time for the whole computation by a little. In contrast, one may need to perform less work to make part
technology to third-party vendors has become the enabling technology for high-performance reconfigurable computing. According to Michael R. D'Amour, Chief Operating Officer of DRC Computer Corporation, "when we first walked into AMD, they called us 'the
(CPU or processor) manufacturers started to produce power efficient processors with multiple cores. The core is the computing unit of the processor and in multi-core processors each core is independent and can access the same memory concurrently.
(PDF). Parallel@Illinois, University of Illinois at Urbana-Champaign. "The main techniques for these performance benefits—increased clock frequency and smarter but increasingly complex architectures—are now hitting the so-called power wall. The
Because an ASIC is (by definition) specific to a given application, it can be fully optimized for that application. As a result, for a given application, an ASIC tends to outperform a general-purpose computer. However, ASICs are created by
communication dominates the time spent on other computation, further parallelization (that is, splitting the workload over even more threads) increases rather than decreases the amount of time required to finish. This problem, known as
1030:. A lock is a programming language construct that allows one thread to take control of a variable and prevent other threads from reading or writing it, until that variable is unlocked. The thread holding the lock is free to execute its
A vector processor is a CPU or computer system that can execute the same instruction on large sets of data. Vector processors have high-level operations that work on linear arrays of numbers or vectors. An example vector operation is
for hybrid multi-core parallel programming. The OpenHMPP directive-based programming model offers a syntax to efficiently offload computations on hardware accelerators and to optimize data movement to/from the hardware memory using
A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable. The terms
. This model allows processes on one compute node to transparently access the remote memory of another compute node. All compute nodes are also connected to an external shared memory system via high-speed interconnect, such as
typically having "far more" than 100 processors. In an MPP, "each CPU contains its own memory and copy of the operating system and application. Each subsystem communicates with the others via a high-speed interconnect."
combine the two approaches, where the processing element has its own local memory and access to the memory on non-local processors. Accesses to local memory are typically faster than accesses to non-local memory. On the
(the section of a program that requires exclusive access to some variable), and to unlock the data when it is finished. Therefore, to guarantee correct program execution, the above program can be rewritten to use locks:
In this example, instruction 3 cannot be executed before (or even in parallel with) instruction 2, because instruction 3 uses a result from instruction 2. It violates condition 1, and thus introduces a flow dependency.
1174:. Flynn classified programs and computers by whether they were operating using a single set or multiple sets of instructions, and whether or not those instructions were using a single set or multiple sets of data.
is the "holy grail" of parallel computing, especially with the aforementioned limit of processor frequency. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.
In the early days, GPGPU programs used the normal graphics APIs for executing programs. However, several new programming languages and platforms have been built to do general purpose computation on GPUs with both
and can issue multiple instructions per clock cycle from one instruction stream (thread); in contrast, a multi-core processor can issue multiple instructions per clock cycle from multiple instruction streams.
(in which each processing element has its own local address space). Distributed memory refers to the fact that the memory is logically distributed, but often implies that it is physically distributed as well.
from the lower order addition; thus, an 8-bit processor requires two instructions to complete a single operation, where a 16-bit processor would be able to complete the operation with a single instruction.
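A minimal Python sketch of this two-instruction sequence (the function and its names are illustrative, not from the article):

    def add16_on_8bit(a, b):
        # Split each 16-bit operand into low and high bytes.
        a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
        b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF
        lo = a_lo + b_lo                    # standard addition instruction
        carry = lo >> 8                     # carry bit out of the low byte
        hi = (a_hi + b_hi + carry) & 0xFF   # add-with-carry instruction
        return (hi << 8) | (lo & 0xFF)

    assert add16_on_8bit(300, 500) == 800

A 16-bit processor would perform the same addition in a single instruction.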
1760:, is a prominent multi-core processor. Each core in a multi-core processor can potentially be superscalar as well—that is, on every clock cycle, each core can issue multiple instructions from one thread.
control (e.g., nested loop structures with statically determined iteration counts) and statically analyzable memory access patterns (e.g., walks over large multidimensional arrays of floating-point data).
to enable the passing of messages between nodes that are not directly connected. The medium used for communication between the processors is likely to be hierarchical in large multiprocessor machines.
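In practice such message passing is usually programmed through MPI (discussed below). A minimal sketch of point-to-point communication, assuming the third-party Python binding mpi4py is installed and the script is launched under an MPI runner such as mpiexec -n 2:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        comm.send({"step": 1}, dest=1, tag=0)  # send to the other node
    elif rank == 1:
        msg = comm.recv(source=0, tag=0)       # matching receive
        print("rank 1 received", msg)

The runtime, not the program, decides how the message is routed between nodes that are not directly connected.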
Similar models (which also view the biological brain as a massively parallel computer, i.e., the brain is made up of a constellation of independent or semi-independent agents) were also described by:
1351:, the processor must first add the 8 lower-order bits from each integer using the standard addition instruction, then add the 8 higher-order bits using an add-with-carry instruction and the
. In 1967, Amdahl and Slotnick published a debate about the feasibility of parallel processing at the American Federation of Information Processing Societies Conference. It was during this debate that
All simulated circuits were described in very high speed integrated circuit (VHSIC) hardware description language (VHDL). Hardware modeling was performed on Xilinx FPGA Artix 7 xc7a200tfbg484-2.
"Systematic Generation of Executing Programs for Processor Elements in Parallel ASIC or FPGA-Based Systems and Their Transformation into VHDL-Descriptions of Processor Element Control Units".
longer be achieved through frequency scaling; instead, programmers will need to parallelize their software code to take advantage of the increasing computing power of multicore architectures.
computers became famous for their vector-processing computers in the 1970s and 1980s. However, vector processors—both as CPUs and as full computer systems—have generally disappeared. Modern
1987:(software that sits between the operating system and the application to manage network resources and standardize the software interface). The most common grid computing middleware is the
119:, each core performing a task independently. On the other hand, concurrency enables a program to deal with multiple tasks even on a single CPU core; the core switches between tasks (i.e.
are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing:
use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors, for accelerating specific tasks.
is the processor frequency (cycles per second). Increases in frequency increase the amount of power used in a processor. Increasing processor power consumption led ultimately to
processor with two execution units. In the best case scenario, it takes one clock cycle to complete two instructions and thus the processor can achieve superscalar performance (
D'Amour, Michael R., Chief Operating Officer, DRC Computer Corporation. "Standard Reconfigurable Computing". Invited speaker at the University of Delaware, February 28, 2007.
that is shared between them. Without synchronization, the instructions between the two threads may be interleaved in any order. For example, consider the following program:
in the 1970s, was among the first multiprocessors with more than a few processors. The first bus-connected multiprocessor with snooping caches was the Synapse N+1 in 1984.
has accepted that future performance increases must largely come from increasing the number of processors (or cores) on a die, rather than making a single core go faster."
2779:. The key to its design was a fairly high parallelism, with up to 256 processors, which allowed the machine to work on large datasets in what would later be known as
Bernstein's conditions do not allow memory to be shared between different processes. For that, some means of enforcing an ordering between accesses is necessary, such as
Shimokawa, Y.; Fuwa, Y.; Aramaki, N. (18–21 November 1991). "A parallel ASIC VLSI neurocomputer for a large number of neurons and billion connections per second speed".
A computer program is, in essence, a stream of instructions executed by a processor. Without instruction-level parallelism, a processor can issue less than one
88:. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in
(NUMA) architecture. Processors in one directory can access that directory's memory with less latency than they can access memory in another directory.
If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a
to work on a given problem. Because of the low bandwidth and extremely high latency available on the Internet, distributed computing typically deals only with
1286:), few applications that fit this class materialized. Multiple-instruction-multiple-data (MIMD) programs are by far the most common type of parallel programs.
1132:, altogether avoids the use of locks and barriers. However, this approach is generally difficult to implement and requires correctly designed data structures.
is a technique whereby the computer system takes a "snapshot" of the application—a record of all current resource allocations and variable states, akin to a
applications. Multiple-instruction-single-data (MISD) is a rarely used classification. While computer architectures to deal with this were devised (such as
. 1988. p. 8. Quote: "The earliest reference to parallelism in computer design is thought to be in General L. F. Menabrea's publication in… 1842, entitled
planned machine . It was perhaps the most infamous of supercomputers. The project started in 1965 and ran its first real application in 1976."
Processor–processor and processor–memory communication can be implemented in hardware in several ways, including via shared (either multiported or
processor, with five stages: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and register write back (WB). The
As parallel computers become larger and faster, we are now able to solve problems that had previously taken too long to run. Fields as varied as
in that the several execution units are not entire processors (i.e. processing units). Instructions can be grouped together only if there is no
processor. In the best case scenario, it takes one clock cycle to complete one instruction and thus the processor can achieve scalar performance (
. Each stage in the pipeline corresponds to a different action the processor performs on that instruction in that stage; a processor with an
) without necessarily completing each one. A program can have both, neither, or a combination of parallelism and concurrency characteristics.
In April 1958, Stanley Gill (Ferranti) discussed parallel programming and the need for branching and waiting. Also in 1958, IBM researchers
Valueva, Maria; Valuev, Georgii; Semyonova, Nataliya; Lyakhov, Pavel; Chervyakov, Nikolay; Kaplun, Dmitry; Bogaevskiy, Danil (2019-06-20).
represents an output dependency: when two segments write to the same location, the result comes from the logically last executed segment.
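Because the conditions are set intersections over each segment's inputs I and outputs O, they are easy to check mechanically. A minimal Python sketch (names are illustrative):

    def bernstein_independent(I_i, O_i, I_j, O_j):
        # Segments are independent when all three intersections are empty:
        flow   = I_j & O_i   # condition 1: flow dependency
        anti   = I_i & O_j   # condition 2: anti-dependency
        output = O_i & O_j   # condition 3: output dependency
        return not (flow or anti or output)

    # c := a * b followed by d := 3 * c violates condition 1 (flow dependency).
    print(bernstein_independent({"a", "b"}, {"c"}, {"c"}, {"d"}))  # False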
(FPGA) as a co-processor to a general-purpose computer. An FPGA is, in essence, a computer chip that can rewire itself for a given task.
. Thus parallelization of serial programs has become a mainstream programming task. In 2012 quad-core processors became standard for
Acken, Kevin P.; Irwin, Mary Jane; Owens, Robert M. (July 1998). "A Parallel ASIC Architecture for Efficient Fractal Image Coding".
(VLSI) computer-chip fabrication technology in the 1970s until about 1986, speed-up in computer architecture was driven by doubling
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multi-core and
In some cases parallelism is transparent to the programmer, such as in bit-level or instruction-level parallelism, but explicitly
and a "Program Distributor" to dispatch and collect data to and from independent processing units connected to a central memory.
) and economics have taken advantage of parallel computing. Common types of problems in parallel computing applications include:
(GPGPU) is a fairly recent trend in computer engineering research. GPUs are co-processors that have been heavily optimized for
A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a
if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.
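A minimal Python sketch of an embarrassingly parallel workload, using the standard multiprocessing module (the worker function is illustrative):

    from multiprocessing import Pool

    def square(x):
        return x * x  # each subtask is independent of all the others

    if __name__ == "__main__":
        with Pool(4) as pool:                   # four worker processes
            print(pool.map(square, range(10)))  # no communication needed

Because the subtasks never exchange intermediate results, adding workers scales almost perfectly.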
between the different subtasks are typically some of the greatest obstacles to getting optimal parallel program performance.
on the other hand includes a single execution unit in the same processing unit and can issue one instruction at a time from
be two program segments. Bernstein's conditions describe when the two are independent and can be executed in parallel. For
on one computer. Only one instruction may execute at a time—after that instruction is finished, the next one is executed.
SIMD parallel computers can be traced back to the 1970s. The motivation behind early SIMD computers was to amortize the
created one of the earliest classification systems for parallel (and sequential) computers and programs, now known as
, where one part of a program promises to deliver a required datum to another part of a program at some future time.
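Python's standard concurrent.futures module exposes this concept directly; a minimal sketch:

    from concurrent.futures import ThreadPoolExecutor

    def produce_datum():
        return 42  # computed concurrently with the consumer's other work

    with ThreadPoolExecutor() as pool:
        fut = pool.submit(produce_datum)  # a promise of a future value
        # ... unrelated work can proceed here ...
        print(fut.result())               # blocks only if not yet delivered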
processors, which is generally cited as the end of frequency scaling as the dominant computer architecture paradigm.
Rodriguez, C.; Villagra, M.; Baran, B. (29 August 2008). "Asynchronous team algorithms for Boolean Satisfiability".
specification, which is a framework for writing programs that execute across platforms consisting of CPUs and GPUs.
Grid computing is the most distributed form of parallel computing. It makes use of computers communicating over the
(MPI) is the most widely used message-passing system API. One concept used in programming parallel programs is the
system, which keeps track of cached values and strategically purges them, thus ensuring correct program execution.
system, in which the memory is not physically distributed. A system that does not have this property is known as a
) are two of the most common techniques for implementing out-of-order execution and instruction-level parallelism.
. They usually combine this feature with pipelining and thus can issue more than one instruction per clock cycle (
Within parallel computing, there are specialized parallel devices that remain niche areas of interest. While not
. It takes five clock cycles to complete one instruction and thus the processor can achieve subscalar performance (
different instructions at different stages of completion and thus can issue one instruction per clock cycle (
(PDF). University of California, Berkeley. Technical Report No. UCB/EECS-2006-183. See table on pages 17–19.
over multiple instructions. In 1964, Slotnick had proposed building a massively parallel computer for the
. Virginia Tech/Norfolk State University, Interactive Learning with a Digital Library in Computer Science
In this example, there are no dependencies between the instructions, so they can all be run in parallel.
is the percentage of the execution time of the whole task spent on the parallelizable part of the task
188:, which states that it is limited by the fraction of time for which the parallelization can be utilised.
and basic block vectorization. It is distinct from loop vectorization algorithms in that it can exploit
processing elements, which flattens out into a constant value for large numbers of processing elements.
is constructed and implemented as a serial stream of instructions. These instructions are executed on a
Döbel, B.; Härtig, H.; Engel, M. (2012). "Operating system support for redundant multithreading".
introduced the D825 in 1962, a four-processor computer that accessed up to 16 memory modules through a
processing. Computer graphics processing is a field dominated by data parallel operations—particularly
. However, "threads" is generally accepted as a generic term for subtasks. Threads will often need
software makes use of "spare cycles", performing computations at times when a computer is idling.
system, a symmetric multiprocessor system capable of running up to eight processors in parallel.
1: function NoDep(a, b)
2:    c := a * b
3:    d := 3 * b
4:    e := a + b
5: end function
677:. No program can run more quickly than the longest chain of dependent calculations (known as the
being switched per clock cycle (proportional to the number of transistors whose inputs change),
supercomputers are clusters. The remaining are Massively Parallel Processors, explained below.
Sechin, A. "Parallel Computing in Photogrammetry". GIM International, no. 1, 2016, pp. 21–23.
Proceedings of the April 18–20, 1967, Spring Joint Computer Conference - AFIPS '67 (Spring)
. Some parallel computer architectures use smaller, lightweight versions of threads known as
Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007. 2nd
"Validity of the single processor approach to achieving large scale computing capabilities"
501:{\displaystyle S_{\text{latency}}(s)={\frac {1}{1-p+{\frac {p}{s}}}}={\frac {s}{s+p(1-s)}}}
, distributed shared memory space can be implemented using programming models such as
Divided consciousness: multiple controls in human thought and action (expanded edition)
2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS)
Bernstein, Arthur J. (1 October 1966). "Analysis of Programs for Parallel Processing".
, whereas Gustafson's law assumes that the total amount of work to be done in parallel
announced the first computer architecture specifically designed for parallelism, the
The Microprocessor Ten Years From Now: What Are The Challenges, How Do We Meet Them?
(Anniversary ed., repr. with corr., 5th ed.). Reading, Mass.: Addison-Wesley.
3006:
6545:
6477:
6341:
6082:
5895:
5694:
5629:
5476:
5292:
5287:
5282:
5251:
5063:
4860:
4592:
4501:
4464:
4456:
4309:
4297:
4244:
4149:
4109:
4062:
3742:
3721:
3428:
3393:
3292:
3069:
2855:
2780:
2654:
2607:
2593:
2234:
2210:
1903:
1834:
1679:
1528:
1369:
1345:
1337:
1294:
1167:
1101:
1090:
1031:
1027:
915:
Consider the following functions, which demonstrate several kinds of dependencies:
The potential speedup of an algorithm on a parallel computing platform is given by
4098:"Construction of Residue Number System Using Hardware Efficient Diagonal Function"
2059:, with which most programmers are familiar. The best known C to HDL languages are
is the speedup in latency of the execution of the parallelizable part of the task;
twice as fast. This will make the computation much faster than by optimizing part
4241:[Proceedings] 1991 IEEE International Joint Conference on Neural Networks
The ubiquity of Internet brought the possibility of large-scale cloud computing.
(NUMA) architecture. Distributed memory systems have non-uniform memory access.
gives a less pessimistic and more realistic assessment of parallel performance:
discussed the use of parallelism in numerical calculations for the first time.
for parallelization. A few fully implicit parallel programming languages exist—
cost-effective, provided that a sufficient amount of memory bandwidth exists.
Parallel computers based on interconnected networks need to have some kind of
One thread will successfully lock variable V, while the other thread will be
84:, but has gained broader interest due to the physical constraints preventing
Efforts to standardize parallel programming include an open standard called
2205:(ASIC) approaches have been devised for dealing with parallel applications.
1: function Dep(a, b)
2:    c := a * b
3:    d := 3 * c
4: end function
153:, particularly those that use concurrency, are more difficult to write than
4142:"Hardware Design of Approximate Matrix Multiplier based on FPGA in Verilog"
Lawrence Livermore National Laboratory: Introduction to Parallel Computing
Proceedings of the Tenth ACM International Conference on Embedded Software
(3rd ed.). San Francisco, Calif.: International Thomson. p. 43.
is more difficult if they are not. The most common type of cluster is the
4209:(PDF). ARL-SR-154, U.S. Army Research Lab. Retrieved on November 7, 2007.
"Exploiting Superword Level Parallelism with Multimedia Instruction Sets"
Parallel Processing and Parallel Algorithms: Theory and Computation
2021:, they tend to be applicable to only a few classes of parallel problems.
1698:, fat hypercube (a hypercube with more than one processor at a node), or
Race conditions, mutual exclusion, synchronization, and parallel slowdown
Parallel Computers 2: Architecture, Programming and Algorithms, Volume 2
To deal with the problem of power consumption and overheating, the major
Programming paradigm in which many processes are executed simultaneously
27:"Parallelization" redirects here. For parallelization of manifolds, see
are expensive. New: power is expensive, but transistors are "free".
. This led to the design of parallel hardware and software, as well as
. Nvidia has also released specific products for computation in their
ones, because concurrency introduces several new classes of potential
"Something old: the Gamma 60, the computer that was ahead of its time"
Modern Processor Design: Fundamentals of Superscalar Processors
Divided consciousness: multiple controls in human thought and action
"A Randomized Parallel Sorting Algorithm with an Experimental Study"
"The Landscape of Parallel Computing Research: A View from Berkeley"
"The Landscape of Parallel Computing Research: A View from Berkeley"
Structured Parallel Programming: Patterns for Efficient Computation
, can be improved in some cases by software analysis and redesign.
Kirkpatrick, Scott (2003). "COMPUTER SCIENCE: Rough Times Ahead".
numbers. They are closely related to Flynn's SIMD classification.
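A sketch of the style of whole-array operation a vector processor performs, expressed with the third-party NumPy library (a software analogue, not vector hardware):

    import numpy as np

    B = np.arange(64, dtype=np.float64)  # 64-element vectors, as in the Cray-1
    C = np.full(64, 2.0)
    A = B * C   # one array operation replaces a 64-iteration scalar loop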
languages that attempt to emulate the syntax and semantics of the
and concurrency are two different things: a parallel program uses
Computer Organization and Design: The Hardware/Software Interface
systems performing the same operation in parallel. This provides
1736:(called "cores") on the same chip. This processor differs from a
processors. The canonical example of a pipelined processor is a
was coined to define the limit of speed-up due to parallelism.
of a single program as a result of parallelization is given by
San Francisco: Morgan Kaufmann Publ. p. 15.
General-purpose computing on graphics processing units (GPGPU)
In Search of the Miraculous. Fragments of an Unknown Teaching
Webopedia computer dictionary. Retrieved on November 7, 2007.
do include some vector processing instructions, such as with
A multi-core processor is a processor that includes multiple
Beyond the Conscious Mind. Unlocking the Secrets of the Self
refers to MPI as "the dominant HPC communications interface"
are two of the most widely used shared memory APIs, whereas
Reconfigurable computing with field-programmable gate arrays
Evolution of Consciousness: The Origins of the Way We Think
Sketch of the Analytical Engine Invented by Charles Babbage
"Parallel Computing Research at Illinois: The UPCRC Agenda"
MIT Computer Science and Artificial Intelligence Laboratory
. The technology consortium Khronos Group has released the
GPUs: An Emerging Platform for General-Purpose Computation
Fine-grained, coarse-grained, and embarrassing parallelism
"Architecture Sketch of Bull Gamma 60 – Mark Smotherman"
Sketch of the Analytic Engine Invented by Charles Babbage
(1st ed.). Dubuque, Iowa: McGraw-Hill. p. 561.
, which was the earliest SIMD parallel-computing effort,
(UMA) systems. Typically, that can be achieved only by a
Vol. 1, no. 1, pp. 2–10, British Computer Society, April 1958.
Helman, David R.; Bader, David A.; JaJa, Joseph (1998).
Scoping the Problem of DFM in the Semiconductor Industry
. Upper Saddle River, N.J.: Prentice-Hall. p. 235.
(2nd ed., 3rd printing). San Francisco: Kaufmann.
Parallel computing can also be applied to the design of
Mainstream parallel programming languages remain either
s speedup is greater by ratio (5 times versus 2 times).
Designing and Building Parallel Programs, by Ian Foster
Parallel Programming: for Multicore and Cluster Systems
Parallel Programming: for Multicore and Cluster Systems
Digital Integrated Circuits: A Design Perspective
Parallel Programming: for Multicore and Cluster Systems
in case one component fails, and also allows automatic
, which is a cluster implemented on multiple identical
. Barriers are typically implemented using a lock or a
The Social Brain. Discovering the Networks of the Mind
"The History of the Development of Parallel Computing"
Sidney Fernbach Award given to MPI inventor Bill Gropp
respectively. Other GPU programming languages include
Boggan, Sha'Kia and Daniel M. Pressel (August 2007).
"Some Computer Organizations and Their Effectiveness"
Michael McCool; James Reinders; Arch Robison (2013).
The Mythical Man-Month: Essays on Software Engineering
List of concurrent and parallel programming languages
. Department of Computer Science, Clemson University
. Springer Science & Business Media. p. 3.
. Springer Science & Business Media. p. 2.
. Springer Science & Business Media. p. 1.
(shared between all processing elements in a single
(which is similar to scoreboarding but makes use of
"What's the opposite of 'embarrassingly parallel'?"
Many parallel programs require that their subtasks
neither thread can complete, and deadlock results.
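A common remedy is to make every thread acquire the locks in one global order, which removes the cycle that produces deadlock. A minimal Python sketch (variable names are illustrative):

    import threading

    lock_v = threading.Lock()
    lock_w = threading.Lock()

    def worker():
        # Both threads take V before W, so neither can hold one lock
        # while waiting forever on the other.
        with lock_v:
            with lock_w:
                pass  # critical section touching both variables

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()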
"Intel Halts Development Of 2 New Microprocessors"
The origins of true (MIMD) parallelism go back to
Berkeley Open Infrastructure for Network Computing
physically distributed across multiple I/O nodes.
"Synchronization internals – the semaphore"
The rise of consumer GPUs has led to support for
. Beowulf technology was originally developed by
, this external shared memory system is known as
based on C++ can also be used for this purpose.
Subtasks in a parallel program are often called
Computer Architecture: A Quantitative Approach
Hennessy, John L.; Patterson, David A. (2002).
Biological brain as massively parallel computer
processors. Superscalar processors differ from
, a parallel supercomputing device that joined
Associative processing (predicated/masked SIMD)
design) due to thermal and design constraints.
"List Statistics | TOP500 Supercomputer Sites"
Parallel and Concurrent Programming in Haskell
As a computer system grows in complexity, the
Assume that a task has two independent parts,
"An Evaluation of the Design of the Gamma 60"
Journal of Parallel and Distributed Computing
"Why a simple test can get parallel slowdown"
stealers.' Now they call us their partners."
Main memory in a parallel computer is either
$I_j \cap O_i = \varnothing,$
$I_i \cap O_j = \varnothing,$
$O_i \cap O_j = \varnothing.$
varies linearly with the number of processors
$S_{\text{latency}}(s) = 1 - p + sp.$
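As a quick comparison with the Amdahl sketch above, Gustafson's scaled speedup in Python (helper name ours):

    def gustafson_speedup(p, s):
        # p: parallelizable fraction; s: number of processors.
        # The parallel workload is assumed to grow with the machine.
        return 1 - p + s * p

    print(gustafson_speedup(0.9, 100))  # 90.1: near-linear growth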
are designed to heavily exploit parallelism.
Encyclopedia of Parallel Computing, Volume 4
theory, which views the biological brain as
locks introduces the possibility of program
, while others use bigger versions known as
was the dominant reason for improvements in
. New York: Simon & Schuster. pp.
. New York: Simon & Schuster. pp.
. Redwood City, Calif.: Benjamin/Cummings.
Gottlieb, Allan; Almasi, George S. (1989).
, in which a programmer gives the compiler
, either in graphics APIs (referred to as
1566:
5958:
5944:
5189:
5175:
5015:
4375:Shen, John Paul; Mikko H. Lipasti (2004).
4148:. Madurai, India: IEEE. pp. 496–498.
"An Introduction to Lock-Free Programming"
Introduction to Operating System Deadlocks
or an interconnect network of a myriad of
, which is typically built from arrays of
Most modern processors also have multiple
performance and efficiency cores (such as
IEEE Transactions on Electronic Computers
List of distributed computing conferences
"Official Neurocluster Brain Model site"
Gupta, Ankit; Suneja, Kriti (May 2020).
. New York, NY: Springer. p. 114.
. Lawrence Livermore National Laboratory
For broader coverage of this topic, see
Application-specific integrated circuits
releasing programming environments with
in the world according to the June 2009
. Parallelism has long been employed in
, "the most infamous of supercomputers"
application-specific integrated circuit
Application-specific integrated circuit
All modern processors have multi-stage
the output variables, and likewise for
independent of the number of processors
ACM SIGARCH Computer Architecture News
(wmv). Distinguished Lecturer talk at
Content Addressable Parallel Processor
Lawrence Livermore National Laboratory
are each 64-element vectors of 64-bit
processors. These instructions can be
, for example when they must update a
The Journal of VLSI Signal Processing
Thomas Rauber; Gudula Rünger (2013).
Thomas Rauber; Gudula Rünger (2013).
Thomas Rauber; Gudula Rünger (2013).
Most grid computing applications use
programs. However, power consumption
Patterson and Hennessy, p. 749.
Patterson and Hennessy, p. 753.
Patterson and Hennessy, p. 751.
Hennessy and Patterson, p. 537.
Patterson and Hennessy, p. 714.
Hennessy and Patterson, p. 549.
Patterson and Hennessy, p. 713.
Patterson and Hennessy, p. 748.
"Introduction to Parallel Computing"
brute-force cryptographic techniques
), or in other language extensions.
machine which uses custom ASICs for
. One class of algorithms, known as
Internet Parallel Computing Archive
. Vol. 3. pp. 2162–2167.
Tanenbaum, Andrew S. (2002-02-01).
Cooley–Tukey fast Fourier transform
processor, which includes multiple
processor had a 35-stage pipeline.
of the execution of the whole task;
have brought parallel computing to
's May 8, 2004 cancellation of its
by a chip is given by the equation
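$P = C \times V^{2} \times F$

where C is capacitance, V is voltage, and F is frequency.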
from the mid-1980s until 2004. The
Kahng, Andrew B. (June 21, 2004) "
Samuel Larsen; Saman Amarasinghe.
Culler, David; Singh, J. P. (1997).
Pipelined processing (packed SIMD)
times per second, and it exhibits
lock-free and wait-free algorithms
be all of the input variables and
Anthes, Gary (November 19, 2001).
"Parallel Programming", S. Gill,
Tumlin, Smotherman (2023-08-14).
Lecture Notes in Computer Science
, a concise message-passing model
Superword level parallelism is a
). These processors are known as
). These processors are known as
). These processors are known as
Locking multiple variables using
10.1109/ICICCS48265.2020.9121004
. Retrieved on November 7, 2007.
. Retrieved on November 7, 2007.
. Retrieved on November 7, 2007.
Concurrent programming languages
are independent if they satisfy
Amdahl's law and Gustafson's law
"Thread Safety for Performance"
Gustafson, John L. (May 1988).
Flynn, Laurie J. (8 May 2004).
. His design was funded by the
, a multi-processor project at
fault-tolerant computer systems
. Several vendors have created
distributed memory systems do.
-stage pipeline can have up to
processors become commonplace.
is fundamental in implementing
have 10+ core processors. From
within a single machine, while
ACONIT Computer History Museum
, et al. (December 18, 2006).
Parallel computer architecture
IEEE Transactions on Computers
Kukanov, Alexey (2008-03-04).
Schwartz, David (2011-08-15).
Concurrency (computer science)
Fault-tolerant computer system
), in dedicated APIs (such as
Parallel programming languages
hardware description languages
Specialized parallel computers
Massively parallel (computing)
A canonical processor without
A graphical representation of
A graphical representation of
in which many calculations or
R.W. Hockney, C.R. Jesshope.
Preshing, Jeff (2012-06-08).
. Pearson Education, Informit
Brooks, Frederick P. (1996).
of a sequential program by a
General-purpose computing on
FPGAs can be programmed with
field-programmable gate array
Classes of parallel computers
Computer systems make use of
Instruction-level parallelism
Instruction-level parallelism
. This requires the use of a
3A: Write back to variable V
3B: Write back to variable V
4A: Write back to variable V
4B: Write back to variable V
Parallelization (mathematics)
10.1109/BIMNICS.2007.4610083
"What is thread contention?"
. In 1986, Minsky published
Massively parallel computing
very-large-scale integration
. The programmer must use a
4698:Wilson, Gregory V. (1994).
4579:Bataille, M. (1972-04-01).
3693:"Threading for Performance"
3570:Cecil, David (2015-11-03).
3476:Microsoft Developer Network
3372:"Reevaluating Amdahl's law"
2898:Manchester dataflow machine
2865:
2815:massively parallel computer
2795:In the early 1970s, at the
2704:Compagnie des Machines Bull
2556:problems (such as found in
2338:parallel programming models
2309:
2078:AMD's decision to open its
1863:computers connected with a
1764:Simultaneous multithreading
1561:
1538:Superword level parallelism
1404:instruction per clock cycle
10:
6815:
6654:Metalinguistic abstraction
6521:Automatic mutual exclusion
5232:High-performance computing
4823:Blakeslee, Thomas (1996).
4722:"The Power of Parallelism"
4218:Maslennikov, Oleg (2002).
4115:10.3390/electronics8060694
3852:Culler et al. p. 125.
3843:Culler et al. p. 124.
3832:Carnegie Mellon University
2908:Parallel programming model
2856:George Ivanovich Gurdjieff
2754:Carnegie Mellon University
2665:
2661:
2630:
2523:Spectral methods (such as
2488:high performance computing
2476:mean time between failures
2467:
2398:
2318:
2288:processor instruction sets
2232:
2194:
2183:and others are supporting
2094:
2002:
1965:
1913:
1832:
1813:
1790:
1753:, designed for use in the
1725:
1526:
1379:
1309:
1160:
226:high performance computing
130:computers having multiple
103:Parallelism vs concurrency
82:high-performance computing
26:
6709:
6594:
6526:Choreographic programming
6496:
6312:
6254:
6211:
6114:
6105:
6045:
5987:
5978:
5914:
5866:Automatic parallelization
5858:
5720:
5560:
5510:
5502:Application checkpointing
5464:
5428:
5372:
5316:
5265:
5204:
4249:10.1109/IJCNN.1991.170708
3807:Culler et al. p. 15.
3446:Roosta, Seyed H. (2000).
3376:Communications of the ACM
3012:Old : Power is free, but
2955:Highly parallel computing
2861:Neurocluster Brain Model.
2548:Lattice Boltzmann methods
2480:Application checkpointing
2470:Application checkpointing
2464:Application checkpointing
2406:Automatic parallelization
2401:Automatic parallelization
2395:Automatic parallelization
2358:Message Passing Interface
2304:Streaming SIMD Extensions
2114:graphics processing units
1793:Symmetric multiprocessing
1787:Symmetric multiprocessing
1645:non-uniform memory access
1621:non-uniform memory access
1585:Distributed shared memory
204:. To solve a problem, an
6576:Relativistic programming
4889:. Basic Books. pp.
3433:10.1109/PGEC.1966.264565
3283:Amdahl, Gene M. (1967).
3048:. O'Reilly Media. 2013.
2913:Parallelization contract
2030:Reconfigurable computing
1861:commercial off-the-shelf
1567:Memory and communication
1151:embarrassing parallelism
1067:3B: Add 1 to variable V
1064:3A: Add 1 to variable V
1004:2B: Add 1 to variable V
1001:2A: Add 1 to variable V
92:, mainly in the form of
5881:Embarrassingly parallel
5876:Deterministic algorithm
4792:Minsky, Marvin (1986).
4761:Minsky, Marvin (1986).
4506:10.1145/2380356.2380375
4302:10.1023/A:1008005616596
4067:10.1126/science.1081623
3747:10.1109/TC.1972.5009071
3697:Develop for Performance
3691:Krauss, Kirk J (2018).
3602:Preshing on Programming
3502:Develop for Performance
3496:Krauss, Kirk J (2018).
3472:"Processes and Threads"
3359:. Elsevier. p. 61.
3297:10.1145/1465482.1465560
3182:Rabaey, Jan M. (1996).
2923:Synchronous programming
2807:started developing the
2686:Luigi Federico Menabrea
2658:off-the-shelf systems.
2558:finite element analysis
2292:Freescale Semiconductor
1978:embarrassingly parallel
1777:Temporal multithreading
1473:A canonical five-stage
1426:A canonical five-stage
1344:processor must add two
1237:Array processing (SIMT)
300:central processing unit
210:central processing unit
6586:Structured concurrency
5971:Comparison by language
5596:Associative processing
5552:Non-blocking algorithm
5358:Clustered multi-thread
5113:
5093:Listen to this article
5002:The Future of the Mind
4631:www.feb-patrimoine.com
4461:10.1006/jpdc.1998.1462
3478:. Microsoft Corp. 2018
3427:. EC-15 (5): 757–763.
2893:Loop-level parallelism
2681:
2374:remote procedure calls
2248:
2247:is a vector processor.
2109:
2071:. Specific subsets of
2057:C programming language
1936:
1847:
1624:
1482:
1435:
1399:
1329:
1083:5B: Unlock variable V
1080:5A: Unlock variable V
941:synchronization method
902:
852:
802:
643:
578:
543:before parallelization
502:
385:
352:
104:
50:
6799:Distributed computing
6551:Multitier programming
6367:Interface description
5967:Programming paradigms
5712:Hardware acceleration
5625:Superscalar processor
5615:Dataflow architecture
5212:Distributed computing
5112:
5019:(1992). "Chapter 3".
4831:. Springer. pp.
4597:10.1145/641276.641278
3031:(December 18, 2006).
2873:Computer multitasking
2744:introduced its first
2727:Burroughs Corporation
2676:
2631:Further information:
2538:Barnes–Hut simulation
2520:Sparse linear algebra
2342:algorithmic skeletons
2242:
2104:
1923:
1842:
1816:Distributed computing
1810:Distributed computing
1637:uniform memory access
1618:
1589:memory virtualization
1501:multi-core processors
1472:
1440:instruction pipelines
1425:
1389:
1319:
1312:Bit-level parallelism
1306:Bit-level parallelism
1208:Multiple data streams
903:
853:
803:
644:
573:
503:
358:
346:
304:Multi-core processors
200:has been written for
165:are the most common.
102:
94:multi-core processors
90:computer architecture
40:
6794:Concurrent computing
5591:Pipelined processing
5540:Explicit parallelism
5535:Implicit parallelism
5525:Dataflow programming
5144:More spoken articles
4677:The Computer Journal
4406:by David Padua 2011
3291:. pp. 483–485.
2832:Michael S. Gazzaniga
2828:Thomas R. Blakeslee,
2668:History of computing
2620:Finite-state machine
2604:hidden Markov models
1958:ranking, is an MPP.
1950:, the fifth fastest
1823:concurrent computing
1728:Multi-core processor
1722:Multi-core computing
1619:A logical view of a
Race conditions, mutual exclusion, synchronization, and parallel slowdown

Consider the following program, in which threads A and B each increment a shared variable V:

Thread A                        Thread B
1A: Read variable V             1B: Read variable V
2A: Add 1 to variable V         2B: Add 1 to variable V
3A: Write back to variable V    3B: Write back to variable V

If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a race condition. The programmer must use a lock to provide mutual exclusion:

Thread A                        Thread B
1A: Lock variable V             1B: Lock variable V
2A: Read variable V             2B: Read variable V
3A: Add 1 to variable V         3B: Add 1 to variable V
4A: Write back to variable V    4B: Write back to variable V
5A: Unlock variable V           5B: Unlock variable V

One thread will successfully lock variable V, while the other thread is locked out, unable to proceed until V is unlocked again. Locking guarantees correct execution, but it can greatly slow a program and may affect its reliability.
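The same pattern can be written concretely. Below is a minimal sketch (our illustration, not part of the original text) using POSIX threads; the mutex plays the role of the lock on variable V above.

    /* Two threads increment a shared counter; the mutex provides the
     * mutual exclusion described above. Compile with: cc demo.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    static int V = 0;                           /* shared variable V */
    static pthread_mutex_t v_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *increment(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&v_lock);        /* 1: lock variable V */
            V = V + 1;                          /* 2-4: read, add, write back */
            pthread_mutex_unlock(&v_lock);      /* 5: unlock variable V */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, increment, NULL);
        pthread_create(&b, NULL, increment, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("V = %d\n", V);                  /* always 200000 with the lock */
        return 0;
    }

Without the lock and unlock calls, the interleaving of reads and writes makes the final value of V unpredictable.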
Flynn's taxonomy

Michael J. Flynn classified programs and computers by whether they operate using a single or multiple sets of instructions, and whether those instructions use a single or multiple sets of data:

SISD (single instruction, single data): equivalent to an entirely sequential program.
SIMD (single instruction, multiple data): the same operation performed repeatedly over a large data set; subcategories include array processing (SIMT) and pipelined processing (packed SIMD).
MISD (multiple instruction, single data): a rarely used classification.
MIMD (multiple instruction, multiple data): by far the most common type of parallel program.

Programs are analogously classified as SPMD (single program, multiple data) and MPMD (multiple program, multiple data).
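As a minimal illustration of the SIMD/SPMD style (an example of ours, assuming a C compiler with OpenMP support, e.g. cc -fopenmp), the loop below applies the same operation across a data set, with each thread handling a different slice:

    #include <stdio.h>

    int main(void)
    {
        double a[1000];
        /* Same program text, multiple data: threads split the index range. */
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            a[i] = i * 2.0;
        printf("a[999] = %f\n", a[999]);
        return 0;
    }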
Dependencies

Bernstein's conditions describe when two program segments $P_i$ and $P_j$ are independent and can be executed in parallel. For $P_i$, let $I_i$ be all of the input variables and $O_i$ the output variables, and likewise for $P_j$. $P_i$ and $P_j$ are independent if they satisfy

$I_j \cap O_i = \varnothing$,
$I_i \cap O_j = \varnothing$,
$O_i \cap O_j = \varnothing$.

Violation of the first condition introduces a flow dependency: the first segment produces a result used by the second. The second condition represents an anti-dependency, where the second segment would overwrite a variable needed by the first. The third condition represents an output dependency: when two segments write to the same location, the result comes from the logically last executed segment.
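For instance, consider the following pair of segments (an illustrative example, not from the original text):

P1: c := a * b        with $I_1 = \{a, b\}$, $O_1 = \{c\}$
P2: d := a + b        with $I_2 = \{a, b\}$, $O_2 = \{d\}$

Here $I_2 \cap O_1 = \varnothing$, $I_1 \cap O_2 = \varnothing$, and $O_1 \cap O_2 = \varnothing$, so all three conditions hold and P1 and P2 may execute in parallel. Replacing P2 with d := 3 * c would put c in $I_2$, violating $I_2 \cap O_1 = \varnothing$ and creating a flow dependency that forces sequential execution.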
Amdahl's law and Gustafson's law

Amdahl's law gives the potential speedup in latency of the execution of a task:

$S_\text{latency}(s) = \frac{1}{1 - p + \frac{p}{s}} = \frac{s}{s + p(1 - s)}$

where $S_\text{latency}$ is the potential speedup in latency of the execution of the whole task, $s$ is the speedup in latency of the execution of the parallelizable part of the task, and $p$ is the percentage of the execution time of the whole task spent on the parallelizable part before parallelization, so that $S_\text{latency}(s) < \frac{1}{1 - p}$. Gustafson's law, which assumes that the total amount of work to be done in parallel varies linearly with the number of processors, instead gives

$S_\text{latency}(s) = 1 - p + sp.$
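As a worked example (with illustrative numbers of our choosing): if $p = 0.95$ and the parallelizable part is sped up $s = 8$ times, then

$S_\text{latency}(8) = \frac{1}{1 - 0.95 + \frac{0.95}{8}} = \frac{1}{0.16875} \approx 5.9,$

and even as $s \to \infty$ the speedup approaches $\frac{1}{1 - 0.95} = 20$.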
Background

The runtime of a program is equal to the number of instructions multiplied by the average time per instruction. The power consumed by a chip is given by

$P = C \times V^2 \times F$

where $C$ is the capacitance being switched per clock cycle (proportional to the number of transistors whose inputs change), $V$ is voltage, and $F$ is the processor frequency (cycles per second). Increasing the frequency therefore increases the amount of power used in a processor; rising power consumption led ultimately to Intel's May 8, 2004 cancellation of its Tejas and Jayhawk processors, generally cited as the end of frequency scaling as the dominant computer architecture paradigm.
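As an illustrative calculation (our numbers, not from the original text): scaling both voltage and frequency down by 15% gives

$P' = C \times (0.85\,V)^2 \times (0.85\,F) \approx 0.61 \times C V^2 F,$

a roughly 39% reduction in dynamic power in exchange for a 15% drop in clock speed.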