used in Thumb mode). Any GPR can propagate and store multiple instructions independently, in a code size small enough to fit in one register; its architectural registers act as a table shared by all decoders and instructions, with simple bank switching between decoders. The major difference between ARM and other designs is that ARM can run on the same general-purpose registers with quick bank switching, without requiring an additional register file for superscalar execution. Although x86 shares with ARM the property that its GPRs can each store any data, x86 runs into data dependencies if more than three unrelated instructions are stored, because it has too few GPRs per file (eight in 32-bit mode and 16 in 64-bit mode, compared with ARM's 13 in 32-bit mode and 31 in 64-bit mode), and it cannot be superscalar without multiple register files feeding its decoders (x86 code is large and complex compared with ARM code). As a result, most x86 front ends have become much larger and much more power-hungry than those of ARM processors in order to stay competitive (examples: Pentium M and Core 2 Duo, Bay Trail). Some third-party x86-compatible processors even became uncompetitive with ARM because they had no dedicated register-file architecture. This was particularly true of AMD, Cyrix and VIA, which could not deliver reasonable performance without register renaming and out-of-order execution, leaving Intel Atom as the only in-order x86 processor core in the mobile market. This lasted until the x86 Nehalem processor merged its integer and floating-point registers into a single file and introduced a large physical register table and an enhanced allocator table in its front end, before renaming in its out-of-order internal core.
floating-point). This was done to solve the register bottleneck that existed in the x86 architecture after micro-operation fusion was introduced, but the file still has eight 32-bit architectural registers, for a total capacity of 32 bytes per file (the segment registers and instruction pointer remain within the file, though they are inaccessible to programs), and serves as the speculative file. The second file serves as a scaled shadow register file, which cannot store some instructions independently without a context switch. Some SSE2/SSE3/SSSE3 instructions require this feature for integer operations; for example, PSHUFB, PMADDUBSW, PHSUBW, PHSUBD, PHSUBSW, PHADDW, PHADDD and PHADDSW would require loading EAX/EBX/ECX/EDX from both register files, though it was uncommon for an x86 processor to use another register file with the same instruction. Most of the time, the second file serves as a scaled retired file. The
Pentium M architecture still has one dual-ported floating-point register file (eight MM/XMM entries) shared by three decoders, and the FP register file has no shadow register file alongside it, as its shadow-register-file architecture did not include floating-point functions. In processors after P6, the architectural register files are external and located in the processor's back end after the retired file, as opposed to the internal register file located in the inner core for register renaming and the reorder buffer. However, in Core 2 it is housed within a unit called the "register alias table" (RAT), located with the instruction allocator but with the same register size as retirement.
, all register files do not require an additional cycle to propagate data. Register files such as the architectural and floating-point files are located between the code buffer and the decoders, in what is called the "retire buffer", with the reorder buffer and OoOE, and are connected within the ring bus (16 bytes). The register file itself still remains one x86 register file and one x87 stack, both serving as retirement storage. Its x86 register file was enlarged to dual-ported to increase bandwidth for result storage. Registers such as the debug, condition code, control, unnamed and flag registers were stripped from the main register file and placed into individual files between the micro-op ROM and the instruction sequencer. Only inaccessible registers like the segment registers are now separated from the general-purpose register file (except the instruction pointer); they are located between the scheduler and the instruction allocator in order to facilitate register renaming and out-of-order execution. The x87 stack was later merged with the floating-point register file after the 128-bit XMM registers debuted in the Pentium III, but the XMM register file is still located separately from the x86 integer register files.
. Like early AMD designs, most other x86 manufacturers, such as Cyrix, VIA, DM&P and SiS, used the same mechanism, resulting in poor integer performance on their in-order CPUs, which lacked register renaming. Companies like Cyrix and AMD had to increase cache size in the hope of reducing the bottleneck. AMD's SSE integer operations work differently from those of Core 2 and Pentium 4: they use a separate renaming integer register to load the value directly before the decode stage. Although this theoretically needs a shorter pipeline than Intel's SSE implementation, the cost of branch misprediction is generally much greater and the miss rate higher than Intel's, and an SSE instruction takes at least two cycles to execute regardless of instruction width, as early AMD implementations could not execute both FP and integer operations in an SSE instruction set as Intel's implementation did.
(EV6), for instance, was the first large micro-architecture to implement a "Shadow Register File Architecture". It had two copies of the integer register file and two copies of the floating point register located in its front end (future and scaled file, each containing 2 read and 2 write ports), and took an extra cycle to propagate data between the two during a context switch. The issuing logic attempted to reduce the number of operations forwarding data between the two and greatly improved its integer performance, and helped reduce the impact of the limited number of general-purpose registers in superscalar architectures with speculative execution. This design was later adapted by
eight-entry 64-bit shadow-based register and an eight-entry 64-bit unnamed register that are now separated from the main GPRs, unlike in the original P5 design, and located after the execution unit. The file of these registers is single-ported and not exposed to instructions, unlike the scaled shadow register file found on Core/Core 2 (shadow register files are made of architectural registers, and
Bonnell has none because it lacks a "Shadow Register File Architecture"); however, the file can be used for renaming purposes because Bonnell lacks out-of-order execution. It also had one copy of the XMM floating-point register file per thread. The difference from
one instruction pointer, as they cannot be accessed in the file by any code or instruction) in total file size, expanded to 16 entries in x64 for a total size of 128 bytes per file. From
Pentium M as its pipeline ports and decoders increased, but they are located with the allocator table instead of the code buffer. Its FP XMM register file was also increased to quad-ported (two read/two write); it still has 8 entries in 32-bit mode, extended to 16 entries in x64 mode, and there is still only one copy, as its shadow-register-file architecture does not include floating-point/SSE functions.
, in which the 5-bit architectural names of the registers actually point into a window on a much larger register file, with hundreds of entries. Implementing multiported register files with hundreds of entries requires a large area. The register window slides by 16 registers when moved, so that each architectural register name can refer to only a small number of registers in the larger array, e.g. architectural register r20 can only refer to physical registers #20, #36, #52, #68, #84, #100, #116, if there are just seven windows in the physical file.
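The arithmetic behind the r20 example can be sketched in a few lines of Python. This is a simplified model that assumes every architectural name slides with the window and ignores details such as global registers that do not move; the function names are hypothetical.

```python
WINDOW_STRIDE = 16  # the window slides by 16 registers when moved (from the text)


def physical_index(arch_reg: int, window: int) -> int:
    """Physical entry selected by an architectural name in a given window."""
    return arch_reg + WINDOW_STRIDE * window


def physical_candidates(arch_reg: int, num_windows: int) -> list[int]:
    """All physical entries that one architectural name can ever refer to."""
    return [physical_index(arch_reg, w) for w in range(num_windows)]


print(physical_candidates(20, 7))  # [20, 36, 52, 68, 84, 100, 116]
```

Because each name reaches only `num_windows` entries rather than the whole array, the selection logic per name stays small even when the physical file has hundreds of entries.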
's early designs, like the K6, do not have a register file like Intel's and do not support a "Shadow Register File Architecture", as they lack the context switch and bypass inverter that a register file requires in order to function appropriately. Instead, they use separate GPRs that link directly to a rename register table for their OoOE CPU, with a dedicated integer decoder and floating-point decoder. The mechanism is similar to that of Intel's pre-Pentium processor line. For example, the
benefits from replication and subsetting the read ports. At the limit, this technique would place a stack of 1-write, 2-read regfiles at the inputs to each functional unit. Since regfiles with a small number of ports are often dominated by transistor area, it is best not to push this technique to this limit, but it is useful all the same.
, a typical Pentium-compatible x86 processor is integrated with one copy of a single-port architectural register file containing 6 general-purpose registers, 4 control registers, 8 debug registers (two reserved), 1 stack pointer register, 1 stack base register, 1 instruction pointer, 1 flags register, and 6 segment registers.
and later processors, both integer and floating-point registers are now incorporated into a unified octa-ported (six read and two write) general-purpose register file (8 + 8 in 32-bit mode and 16 + 16 in x64 mode per file), while the number of register files was extended to two with an enhanced "Shadow Register File Architecture"
increased the inner ring bus to 24 bytes (allowing more than three instructions to be decoded) and extended its register file from dual-ported (one read/one write) to quad-ported (two read/two write); the file still has 8 entries in 32-bit mode and 32 bytes (not including the 6 segment registers and
uses multiple register files as well. The R8000 floating-point unit had two copies of the floating-point register file, each with four write and four read ports, and wrote both copies at the same time with a context switch. However, it did not support integer operations, and the integer register file
uses a "Shadow Register File Architecture" as well for its high-end line. It has up to 4 copies of the integer register file (future, retired, scaled, and scratched, each with 7 read and 4 write ports) and 2 copies of the floating-point register file. However, unlike Alpha and x86, they are located in
Most register files make no special provisions to prevent multiple write ports from writing to the same entry simultaneously. Instead, the instruction scheduling hardware ensures that only one instruction in any particular cycle writes a particular entry. If multiple instructions targeting the same
microarchitecture), on the other hand, does not have a register file for its decoder, as its x86 GPRs didn't exist within its structure, due to the introduction of a physical unified renaming register file (similar to Sandy Bridge, but slightly different due to the inability of
Pentium 4 to use the
ARM processors have both banked and unbanked registers. While all modes always share the same physical registers for the first eight general-purpose registers, R0 to R7, the physical register which the banked registers, R8 to R14, point to depends on the operating mode the processor is in. Notably,
There is one decoder per read or write port. If the array has four read and two write ports, for example, it has 6 word lines per bit cell in the array, and six AND gates per row in the decoder. Note that the decoder has to be pitch matched to the array, which forces those AND gates to be wide and
register renaming mapping file, which stores a 6-bit virtual register number for each of the physical registers. In the renaming file, the renaming state is checkpointed whenever a branch is taken, so that when a branch is detected to be mispredicted, the old renaming state can be recovered in a
To save area, some SPARC implementations implement a 32-entry register file in which each cell has seven "bits". Only one is readable and writable through the external ports, but the contents of the bits can be rotated. A rotation accomplishes a movement of the register window in a single cycle.
processor, on the other hand, does not integrate multiple register files to load/fetch instructions. ARM GPRs have no special purpose in the instruction set (the ARM ISA does not require accumulator, index, or stack/base pointer registers; there is no accumulator, and a base/stack pointer can only be
processor has four integer rename register files (one eight-entry temporary scratch register file, one eight-entry future register file, one eight-entry fetched register file, and one eight-entry unnamed register file) and two FP rename register files (two eight-entry x87 ST files, one for fadd and one for fmov)
Later P6 implementations (Pentium M, Yonah) introduced a "Shadow Register File Architecture" that expanded to two copies of the dual-ported integer architectural register file and included a context switch (between the future and retired file and the scaled file, using the same trick used between integer and
IBM uses the same mechanism as many major microprocessors, deeply merging the register file with the decoder, but its register files work independently of the decoder side and do not involve context switching, which is different from Alpha and x86. Most of its register files do not only serve its
included a "shadow register" in its front end, scaled up to a 40-entry unified register file for in-order integer operations before decoding. The register file contains an 8-entry scratch register file, a 16-entry future GPR file and a 16-entry unnamed GPR file. Later AMD designs abandoned the
that it serves. Pitch matching avoids having many busses passing over the datapath turn corners, which would use a lot of area. But since every unit must have the same bit pitch, every unit in the datapath ends up with the bit pitch forced by the widest unit, which can waste area in the other
also each have only one general-purpose integer register file, but the
Larrabee has up to 16 XMM register files (8 entries per file), and the Xeon Phi has up to 128 AVX-512 register files, each containing 32 512-bit ZMM registers for vector instruction storage, which can be as big as L2 cache.
can arrange for each functional unit to write to a subset of the physical register file. This arrangement can eliminate the need for multiple write ports per bit cell, for large savings in area. The resulting register file, effectively a stack of register files with single write ports, then
processors use context switching and fast interrupts for switching between instruction, decoder, GPRs and register files, if there is more than one, before the instruction is issued, but this only exists on processors that support superscalar execution. However, context switching is a totally
The crossed inverters take some finite time to settle after a write operation, during which a read operation will either take longer or return garbage. It is common to have bypass multiplexers that bypass written data to the read ports when a simultaneous read and write to the same entry is
Register files have one word line per entry per port, one bit line per bit of width per read port, and two bit lines per bit of width per write port. Each bit cell also has a Vdd and Vss. Therefore, the wire pitch area increases as the square of the number of ports, and the transistor area
line was a modern, simplified revision of P5. It includes a single copy of the register file, shared between threads and decoders. The register file is a dual-ported design in which 8/16-entry GPRs, 8/16-entry debug registers and 8/16-entry condition codes are integrated in the same file. However, it has an
register before naming) for attempting to replace the architectural register file and skip the x86 decoding scheme. Instead it uses SSE for integer execution and storage before the ALU and after result, SSE2/SSE3/SSSE3 use the same mechanism as well for its integer operation.
Area can sometimes be saved on machines with multiple units in a datapath by having two datapaths side-by-side, each of which has smaller bit pitch than a single datapath would have. This case usually forces multiple copies of a register file, one for each datapath.
registers were virtually simulated from the x87 stack and required the x86 registers to supply MMX instructions and alias onto the existing stack. On P6, instructions can be stored independently and executed in parallel in early pipeline stages before decoding into
and onward replaced the shadow register table and architectural registers with a much larger and more advanced physical register file before decoding to the reorder buffer. As a result, Sandy Bridge and later designs no longer carry an architectural register.
Read bit lines often swing only a fraction of the way to Vdd or Vss. A sense amplifier converts this small-swing signal into a full logic level. Small swing signals are faster because the bit line has little drive but a great deal of parasitic
processor line, a typical pre-486 CPU did not have an individual register file, as all general purpose registers worked directly with the decoder, and the x87 push stack was located within the floating-point unit itself. Starting with the
the backend as a retire unit right after its out-of-order unit and renaming register files. The shadow registers do not load instructions during instruction fetching and decoding stages and a context switch is unnecessary in this design.
In principle, any operation that could be done with a 64-bit-wide register file with many read and write ports could be done with a single 8-bit-wide register file with a single read port and a single write port. However, the
is
Bonnell does not have a unified register file and has no dedicated register file for its hyper-threading. Instead, Bonnell uses a separate rename register for its threads even though it is not out-of-order. Similar to Bonnell,
, which convert low-swing read bitlines into full-swing logic levels, are usually at the bottom (by convention). Larger register files are then sometimes constructed by tiling mirrored and rotated simple arrays.
. Occasionally it is slightly wider in order to attach "extra" bits to each register, such as the poison bit. If the width of the data word is different from the width of an address, or in some cases, such as the
increases linearly. At some point, it may be smaller and/or faster to have multiple redundant register files, with smaller numbers of read ports, rather than a single register file with all the read ports. The
of wide register files with many ports allows them to run much faster and thus, they can do operations in a single cycle that would take many cycles with fewer ports or a narrower bit width or both.
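The trade-off can be illustrated with a short Python sketch that performs a 64-bit addition through a notional 8-bit-wide datapath, one byte per cycle. The function name and the cycle count of one per byte are assumptions made for illustration, not a model of any particular processor.

```python
MASK64 = (1 << 64) - 1


def add64_bytewise(a: int, b: int) -> tuple[int, int]:
    """Add two 64-bit values 8 bits at a time; return (result, cycles used)."""
    result = 0
    carry = 0
    cycles = 0
    for i in range(8):  # eight byte-wide slices make up one 64-bit word
        byte_sum = ((a >> (8 * i)) & 0xFF) + ((b >> (8 * i)) & 0xFF) + carry
        result |= (byte_sum & 0xFF) << (8 * i)
        carry = byte_sum >> 8  # carry ripples into the next slice
        cycles += 1
    return result, cycles


value, cycles = add64_bytewise(0x00FF, 0x0001)
print(hex(value), cycles)  # 0x100 8
```

A 64-bit-wide file would supply both operands and accept the result in a single access, where this narrow version needs eight passes.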
, it has three integer register files and two SSE register files that are located in the physical register file and directly linked with the GPRs. However, it scales down to one integer + one floating-point on
If Vdd is a horizontal line, it can be switched off, by yet another decoder, if any of the write ports are writing that line during that cycle. This optimization increases the speed of the write.
units. Register files, because they have two wires per bit per write port, and because all the bit lines must contact the silicon at every bit cell, can often set the pitch of a datapath.
Write bit lines may be braided, so that they couple equally to the nearby read bitlines. Because write bitlines are full swing, they can cause significant disturbances on read bitlines.
with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs will usually read and write through the same ports.
unit, for example, had a 9 read 4 write port 32 entry 64-bit register file implemented in a 0.7 μm process, which could be seen when looking at the chip from arm's length.
Two popular approaches to dividing registers into multiple register files are the distributed register file configuration and the partitioned register file configuration.
of a CPU will almost always define a set of registers which are used to stage data between memory and the functional units on the chip. The register file is part of the
-compatible or reverse-engineered early 80x86 processors. Therefore, most of them don't have a register file for their decoders, but their GPRs are used individually.
commanded. These bypass multiplexers are often part of a larger bypass network that forwards results which have not yet been committed between functional units.
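A behavioural sketch of such a bypass is given below in Python. It models one clock cycle of a register file in which a read of an entry that is being written in the same cycle returns the incoming data instead of the stale cell contents. The class and method names are hypothetical, and the model says nothing about circuit timing.

```python
class BypassedRegisterFile:
    """Toy register file: reads see same-cycle writes through a bypass."""

    def __init__(self, entries: int) -> None:
        self.cells = [0] * entries

    def cycle(self, reads: list[int], writes: dict[int, int]) -> list[int]:
        """Perform all reads and writes of one cycle; return the read data."""
        # Bypass multiplexer: prefer the value on the write port if the
        # same entry is being written this cycle.
        results = [writes[r] if r in writes else self.cells[r] for r in reads]
        for entry, value in writes.items():
            self.cells[entry] = value
        return results


rf = BypassedRegisterFile(4)
rf.cycle([], {1: 10})
print(rf.cycle([1, 2], {2: 99}))  # [10, 99]
```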
A typical register file – "triple-ported", able to read from 2 registers and write to 1 register simultaneously – is made of bit cells like this one.
(FIQ) mode has its own bank of registers for R8 to R12, with the architecture also providing a private stack pointer (R13) for every interrupt mode.
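The mapping from a register name to a physical register can be sketched as a lookup keyed by operating mode. The Python below models only what the passage states (R0 to R7 shared, R8 to R12 banked for FIQ, a private R13 per interrupt mode); the mode names and the `_mode` suffix convention are illustrative assumptions, and real ARM cores bank further registers, such as the link register, that this sketch leaves out.

```python
FIQ_BANKED = range(8, 13)          # R8 to R12 have a private copy in FIQ mode
INTERRUPT_MODES = {"fiq", "irq"}   # simplified set of interrupt modes


def physical_name(mode: str, reg: int) -> str:
    """Return a label for the physical register behind Rn in the given mode."""
    if not 0 <= reg <= 14:
        raise ValueError("only R0 to R14 are modelled")
    if mode == "fiq" and reg in FIQ_BANKED:
        return f"R{reg}_fiq"
    if reg == 13 and mode in INTERRUPT_MODES:
        return f"R13_{mode}"       # private stack pointer per interrupt mode
    return f"R{reg}"               # unbanked: same physical register everywhere


print(physical_name("user", 9), physical_name("fiq", 9))  # R9 R9_fiq
```

Because FIQ has its own R8 to R12, a fast interrupt handler can use those names without first saving the interrupted program's values.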
The usual layout convention is that a simple array is read out vertically. That is, a single word line, which runs horizontally, causes a row of
that only allows one register file to load/fetch one operand at a time; it would require multiple register files to achieve superscalar execution. The
has up to 8 instruction decoders, but up to 32 register files of 32 general purpose registers each (4 read and 4 write ports) to facilitate
, so that the mapping of which physical entry stores a particular architectural register changes dynamically during execution.
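A minimal Python sketch of such a dynamic mapping follows. It keeps a table from architectural to physical register numbers plus a free list, and it snapshots both so that the state can be restored after a mispredicted branch, as the article describes for renaming files. The class name, the oldest-first free list and the absence of any logic for recycling physical registers at retirement are simplifying assumptions.

```python
class RenameMap:
    """Toy architectural-to-physical register map with checkpoints."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        self.table = list(range(num_arch))            # initial 1:1 mapping
        self.free = list(range(num_arch, num_phys))   # unused physical entries
        self.checkpoints: list[tuple[list[int], list[int]]] = []

    def lookup(self, arch: int) -> int:
        """Physical entry currently holding an architectural register."""
        return self.table[arch]

    def rename_dest(self, arch: int) -> int:
        """Give a destination register a fresh physical entry."""
        phys = self.free.pop(0)
        self.table[arch] = phys
        return phys

    def checkpoint(self) -> None:
        """Snapshot the renaming state, e.g. at a branch."""
        self.checkpoints.append((list(self.table), list(self.free)))

    def restore(self) -> None:
        """Roll back to the latest snapshot after a misprediction."""
        self.table, self.free = self.checkpoints.pop()


m = RenameMap(num_arch=4, num_phys=8)
m.checkpoint()
m.rename_dest(2)
print(m.lookup(2))  # 4
m.restore()
print(m.lookup(2))  # 2
```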
Because most of the wires accomplishing the state movement are local, tremendous bandwidth is possible with little power.
still remained as such. Later, shadow register files were abandoned in newer designs in favor of the embedded market.
is the method of using a single name to access multiple different physical registers depending on the operating mode.
, even when they are the same width, the address registers are in a separate register file from the data registers.
correspond one-for-one to the entries in a physical register file (PRF) within the CPU. More complicated CPUs use
that directly link with its x86 EAX for integer renaming and XMM0 register for floating point renaming, but later
, as its parallel instructions cannot be used across any other register file due to the lack of a context switch.
Register files may be clubbed together as register banks. A processor may have more than one register bank.
shadow register design in favor of the K6 architecture, with its design of individual GPRs linked directly. Like
There are some other of Intel's x86 lines that don't have a register file in their internal design,
Data is written by shorting one side or the other to ground through a two-NMOS stack.
use bits in the program status word to select the currently active register bank.
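As an illustration, the Python sketch below computes which internal RAM address a register name Rn resolves to on a classic 8051, where two PSW bits select one of four eight-register banks. The bit positions and the bank layout are taken from the standard 8051 design rather than from this article, and the function name is hypothetical.

```python
RS_SHIFT = 3    # RS0 is PSW bit 3 and RS1 is PSW bit 4 on a standard 8051
BANK_SIZE = 8   # each bank holds R0 to R7


def register_address(psw: int, n: int) -> int:
    """Internal RAM address of Rn for the bank selected by the PSW."""
    if not 0 <= n < BANK_SIZE:
        raise ValueError("register index must be 0..7")
    bank = (psw >> RS_SHIFT) & 0b11
    return bank * BANK_SIZE + n


print(hex(register_address(0x00, 0)))  # 0x0
print(hex(register_address(0x18, 7)))  # 0x1f
```

Switching banks therefore costs only a write to the status word, rather than a save and restore of eight registers.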
The width in bits of the register file is usually the number of bits in the
and visible to the programmer, as opposed to the concept of transparent
Register file design considerations in dynamically scheduled processors
and each thread uses independent register files for its decoder. Later
Techniques that reduce the energy used by register files are useful in
So: read ports take one transistor per bit cell, write ports take four.
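These counts can be combined into a rough per-bit-cell cost estimate. The Python sketch below uses the wire counts given in this article (one word line per port, one bit line per read port, two per write port) and the transistor counts above. The four storage transistors for the inverter pair and the use of the product of wire counts as an area proxy are assumptions, and the Vdd and Vss wires are ignored.

```python
def bit_cell_cost(read_ports: int, write_ports: int,
                  storage_transistors: int = 4) -> dict[str, int]:
    """Rough wire and transistor counts for one multiported bit cell."""
    word_lines = read_ports + write_ports      # horizontal wires through the cell
    bit_lines = read_ports + 2 * write_ports   # vertical wires through the cell
    return {
        "word_lines": word_lines,
        "bit_lines": bit_lines,
        "wire_area": word_lines * bit_lines,   # grows with the square of the ports
        "transistors": storage_transistors + read_ports + 4 * write_ports,
    }


print(bit_cell_cost(2, 1)["wire_area"], bit_cell_cost(4, 2)["wire_area"])  # 12 48
```

Doubling the ports from 2 read and 1 write to 4 read and 2 write quadruples the wire-area proxy while the transistor count only rises from 10 to 16, which is why wiring tends to dominate in heavily ported files.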
register are issued, all but one have their write enables turned off.
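A small Python sketch of this write-enable gating follows. Given the destination entries of the instructions writing in one cycle, it leaves exactly one write enable asserted per entry. Letting the youngest instruction win is an assumption chosen to preserve program-order semantics; the article does not say which writer is kept.

```python
def resolve_write_enables(targets: list[int]) -> list[bool]:
    """targets[i] is the entry written by instruction i, oldest first.

    Returns one write-enable flag per instruction, with only the last
    writer of each entry left enabled.
    """
    last_writer: dict[int, int] = {}
    for index, entry in enumerate(targets):
        last_writer[entry] = index
    return [last_writer[entry] == index for index, entry in enumerate(targets)]


print(resolve_write_enables([3, 5, 3, 7]))  # [False, True, True, True]
```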
Read bit lines are often precharged to something between Vdd and Vss.
different mechanism to ARM's register bank within the registers.
The decoder is often broken into pre-decoder and decoder proper.
dedicated decoder, but up to the thread level. For example,
-based register files are usually implemented by way of fast
Wikibooks: Microprocessor Design/Register File#Register Bank
791:"Compiler Strategies for Transport Triggered Architectures"
The decoder is a series of AND gates that drive word lines.
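The behaviour of one such decoder can be sketched in Python as a row of AND gates, each combining every address bit or its complement so that exactly one word line is driven. This is a purely logical model with hypothetical names; it does not capture pre-decoding or layout constraints.

```python
def decode(address: int, address_bits: int) -> list[bool]:
    """Return the word-line levels for one port: one AND gate per row."""
    word_lines = []
    for row in range(2 ** address_bits):
        line = True
        for bit in range(address_bits):
            addr_bit = bool((address >> bit) & 1)
            row_bit = bool((row >> bit) & 1)
            # Each gate input is the address bit or its inverse, depending
            # on the row number this gate recognises.
            line = line and (addr_bit if row_bit else not addr_bit)
        word_lines.append(line)
    return word_lines


print(decode(5, 3).index(True))  # 5
```

A register file with several ports would instantiate one such decoder per port, each driving its own set of word lines.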
803:"Energy efficient asymmetrically ported register files"
and renaming in out-of-order execution. Beginning with
to put their data on bit lines, which run vertically.
Sharing lines between cells, for example, Vdd and Vss.
The register file is usually pitch-matched to the
Data is read out by NMOS transistor to a bit line.
One copy of 8 x87 FP push down stack by default,
and some of the later x86 implementations.
805:by Aneesh Aggarwal and M. Franklin. 2003.
and many embedded processors that aren't
Working storage in a computer processor
State is stored in a pair of inverters.
763:"ARM Architecture Reference Manual"
This same technique is used in the
In later x86 implementations, like
Many optimizations are possible:
The basic scheme for a bit cell:
instruction set architecture
793:. 2001. p. 169. p. 171-173.
simultaneous multithreading
8051-compatible processors
in favor of executing
. In simpler CPUs, these
768:. ARM Limited. July 2005
Processors that perform
Register-bank switching
architectural registers
central processing unit
Fast Interrupt Request
Sum-addressed decoder
low-power electronics
bit-level parallelism
single cycle. (See
processor word size
processor registers
integrated circuit
Register renaming
register renaming
Register renaming
Microarchitecture
register renaming
register windows
Register windows
micro-operations
Register banking
789:Johan Janssen.
579:hyper threading
125:is an array of
813:External links
627:(based on the
206:Implementation
198:and the later
825:, Chow - 1995
123:register file
770:. Retrieved
701:ISA defines
583:Sandy bridge
323:capacitance.
139:architecture
494:Alpha 21264
444:if you can.
162:static RAMs
133:(CPU). The
821:- Farkas,
772:13 October
737:References
223:Sense amps
654:Bulldozer
625:Pentium 4
219:bit cells
725:See also
629:NetBurst
617:Vortex86
613:Geode GX
605:Xeon Phi
601:Larrabee
482:datapath
659:Unlike
621:Pentium
596:Nehalem
589:On the
574:Nehalem
542:Pentium
535:In the
266:Decoder
238:integer
196:MODCOMP
156:Modern
823:Jouppi
714:R10000
667:, and
650:Phenom
645:Athlon
566:Core 2
526:POWER8
278:short.
143:caches
766:(PDF)
699:SPARC
665:SPARC
661:Alpha
517:SPARC
498:SPARC
283:Array
260:68000
234:R8000
129:in a
774:2021
697:The
669:MIPS
615:and
603:and
591:Atom
515:The
509:MIPS
507:The
502:MIPS
492:The
231:MIPS
194:The
721:.)
673:ARM
636:AMD
549:MMX
537:x86
375:by
236:'s
188:x86
663:,
640:K6
558:P6
500:,
121:A
776:.
752:.
463:)
457:(
452:)
448:(
402:)
396:(
391:)
387:(
383:.
369:.
336:.
112:)
106:(
101:)
97:(
87:·
80:·
73:·
66:·
39:.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.