Optimizing compiler

An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, algorithms that transform code to produce semantically equivalent code optimized for some aspect. Optimization is typically CPU- and memory-intensive. In practice, factors such as available memory and a programmer's willingness to wait for compilation limit the optimizations that a compiler might provide. Research indicates that some optimization problems are NP-complete, or even undecidable. In general, optimization cannot produce optimal output, which is impossible in a general sense since optimizing for one aspect may degrade performance for another. Rather, optimizations are heuristic methods for improving resource usage in typical programs.
RISC vs. CISC: CISC instruction sets often have variable instruction lengths, often have a larger number of possible instructions that can be used, and each instruction can take a different amount of time. RISC instruction sets attempt to limit the variability in each of these: instructions are usually constant in length, with few exceptions; there are usually fewer combinations of registers and memory operations; and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs of the various instructions and choose the best instruction sequence (see instruction selection).
Pipelines: a pipeline is essentially a CPU broken up into an assembly line. It allows parts of the CPU to be used for different instructions by breaking the execution of instructions into various stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction can be in the register store stage while another is in the register fetch stage. Pipeline conflicts occur when an instruction in one stage of the pipeline depends on the result of another instruction ahead of it in the pipeline that has not yet completed. Pipeline conflicts can lead to pipeline stalls, where the CPU wastes cycles waiting for a conflict to resolve.
Inline expansion: when some code invokes a procedure, it is possible to directly insert the body of the procedure inside the calling code rather than transferring control to it. This saves the overhead related to procedure calls and provides an opportunity for many parameter-specific optimizations, but comes at the cost of space: the procedure body is duplicated each time the procedure is called inline. Generally, inlining is useful in performance-critical code that makes a large number of calls to small procedures. The statements of imperative programming languages are also an example of such an optimization; although statements could be implemented with function calls, they are almost always implemented with code inlining.
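A minimal sketch of the trade-off in C, hand-inlined here only to show the transformation a compiler would perform (the function names are illustrative placeholders, not from the source):

    /* Before: a small procedure called from a hot loop. */
    static int square(int x) { return x * x; }

    int sum_of_squares(const int *a, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += square(a[i]);   /* call overhead on every iteration */
        return total;
    }

    /* After inlining: the call disappears and the body is substituted,
       exposing the loop body to further optimization. */
    int sum_of_squares_inlined(const int *a, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i] * a[i];
        return total;
    }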
Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization that can help eliminate dependencies and thus enable other optimizations. Furthermore, on some architectures, loop reversal contributes to smaller code: when the loop index is being decremented, the condition that must be met for the program to exit the loop is a comparison with zero. This is often a special, parameter-less instruction, unlike a comparison with a number, which needs the number as an operand; the bytes needed to store that operand are therefore saved by loop reversal. Additionally, if the comparison bound exceeds the word size of the platform, standard loop order would require multiple instructions to evaluate the comparison, which is not the case with loop reversal.
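A sketch in C, under the assumption that iteration order does not matter for the body (process() is a placeholder for any per-element work):

    void process(int i);   /* hypothetical per-element work */

    void forward(int n) {
        for (int i = 0; i < n; i++)    /* compares i against n each time */
            process(i);
    }

    void reversed(int n) {
        /* The exit test is now a comparison with zero, which many
           instruction sets implement as a special, operand-free test
           or fold into a decrement-and-branch instruction. */
        for (int i = n - 1; i >= 0; i--)
            process(i);
    }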
Another consideration is that optimization algorithms are complicated and, especially when being used to compile large, complex programming languages, can contain bugs that introduce errors into the generated code or cause internal errors during compilation. Compiler errors of any kind are disconcerting to the user, but especially so in this case, since it may not be clear that the optimization logic is at fault. In the case of internal errors, the problem can be partially ameliorated by a "fail-safe" programming technique in which the optimization logic in the compiler is coded such that a failure is trapped, a warning message is issued, and the rest of the compilation proceeds to successful completion.
In some languages, functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can infer that the function's result only needs to be computed once. In languages where functions are allowed to have side effects, the compiler can restrict such an optimization to functions that it can determine to be side-effect-free.
There can be a wide range of optimizations that a compiler can perform, ranging from simple and straightforward optimizations that take little compilation time to elaborate and complex optimizations that involve considerable amounts of compilation time. Accordingly, compilers often provide options to their control command or procedure to allow the compiler user to choose how much optimization to request; for instance, the IBM FORTRAN H compiler allowed the user to specify no optimization, optimization at the registers level only, or full optimization. By the 2000s, it was common for compilers, such as Clang, to have several compiler command options that could affect a variety of optimization choices, starting with the familiar -O2 switch.
Register allocation: the most frequently used variables should be kept in processor registers for the fastest access. To find which variables to put in registers, an interference graph is created. Each variable is a vertex, and when two variables are used at the same time (have intersecting live ranges) they have an edge between them. This graph is colored using, for example, Chaitin's algorithm, with the same number of colors as there are registers. If the coloring fails, one variable is "spilled" to memory and the coloring is retried.
Number of functional units: some CPUs have several ALUs and FPUs that allow them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions) and on which functional unit can execute which instruction. Functional units also raise issues similar to pipeline conflicts; instructions can be scheduled so that the functional units are fully loaded.
Most high-level programming languages share common programming constructs and abstractions: branching (if, switch), looping (for, while), and encapsulation (structures, objects). Thus, similar optimization techniques can be used across languages. However, certain language features make some optimizations difficult. For instance, pointers in C and C++ make array optimization difficult (see alias analysis).
Firmware for an embedded system can be optimized for the target CPU and memory. System cost or reliability may be more important than the code's speed. For example, compilers for embedded software usually offer options that reduce code size at the expense of speed. The code's timing may need to be predictable, rather than as fast as possible, so code caching might be disabled, along with compiler optimizations that require it.
General-purpose use: prepackaged software is often expected to run on a variety of machines that may share the same instruction set but have different performance characteristics. The code may not be optimized for any particular machine, or may be tuned to work best on the most popular machine while working less optimally on others.
Loop optimizations act on the statements that make up a loop, such as a for loop, and can have a significant impact because many programs spend a large percentage of their time inside loops. One such technique is loop-invariant code motion: if a quantity is computed inside a loop during every iteration and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop.
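A minimal sketch in C (the function names are illustrative, not from the source):

    void fill(int *a, int n, int x, int y) {
        for (int i = 0; i < n; i++)
            a[i] = x * y + i;      /* x * y is recomputed every iteration */
    }

    void fill_hoisted(int *a, int n, int x, int y) {
        int t = x * y;             /* computed once, before the loop begins */
        for (int i = 0; i < n; i++)
            a[i] = t + i;
    }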
Peephole optimizations are usually performed late in the compilation process, after machine code has been generated. This optimization examines a few adjacent instructions (like "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by 2 might be executed more efficiently by left-shifting the value or by adding the value to itself (this example is also an instance of strength reduction).
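The rewrite itself happens on machine code; a source-level view in C of the equivalent forms, as a sketch:

    int times_two_mul(int x)   { return x * 2; }   /* general multiply */
    int times_two_shift(int x) { return x << 1; }  /* left shift by one */
    int times_two_add(int x)   { return x + x; }   /* add value to itself */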
Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform. Examples are instructions that do several things at once, such as decrement register and branch if not zero.
Factoring out of invariants: if an expression is carried out both when a condition is met and when it is not met, it can be written just once, outside of the conditional statement. Similarly, if certain types of expressions (e.g., the assignment of a constant to a variable) appear inside a loop, they can be moved out of it, because their effect will be the same no matter whether they are executed many times or just once. This is also known as total redundancy elimination. A similar but more powerful optimization is partial-redundancy elimination.
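A sketch in C (use_a and use_b are hypothetical placeholders):

    void use_a(int t);
    void use_b(int t);

    void branches(int c, int x, int y) {
        int t;
        if (c) { t = x + y; use_a(t); }   /* t = x + y appears ... */
        else   { t = x + y; use_b(t); }   /* ... on both paths */
    }

    void factored(int c, int x, int y) {
        int t = x + y;                    /* written once, outside */
        if (c) use_a(t); else use_b(t);
    }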
Early compilers of the 1960s were often primarily concerned with simply compiling code correctly or efficiently, such that compile times were a major concern. One notable early optimizing compiler was the IBM FORTRAN H compiler of the late 1960s. Another of the earliest and most important optimizing compilers, one that pioneered several advanced techniques, was the compiler for BLISS (1970), which was described in The Design of an Optimizing Compiler (1975). By the late 1980s, optimizing compilers were sufficiently effective that programming in assembly language declined. This co-evolved with the development of RISC chips and advanced processor features such as superscalar processors, out-of-order execution, and speculative execution, which were designed to be targeted by optimizing compilers rather than by human-written assembly code.
The following is an instance of a local machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors, such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there will be no need to decode an immediate operand, nor use the internal "immediate operand register". A potential problem with this is that XOR may introduce a data dependency on the previous value of the register, causing a pipeline stall. However, processors often treat XOR of a register with itself as a special case that does not cause stalls.
Loop unrolling duplicates the body of the loop multiple times to decrease the number of times the loop condition is tested and the number of jumps; tests and jumps can hurt performance by impairing the instruction pipeline, making this a "fewer jumps" optimization. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time.
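A sketch in C with an unroll factor of four (chosen here so that the trip count divides evenly and no cleanup loop is needed):

    void clear(int a[100]) {
        for (int i = 0; i < 100; i++)      /* 100 tests and jumps */
            a[i] = 0;
    }

    void clear_unrolled(int a[100]) {
        for (int i = 0; i < 100; i += 4) { /* 25 tests and jumps */
            a[i]     = 0;
            a[i + 1] = 0;
            a[i + 2] = 0;
            a[i + 3] = 0;
        }
    }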
Cache size and type (direct mapped, 2-/4-/8-/16-way associative, fully associative): techniques such as inline expansion and loop unrolling may increase the size of the generated code and reduce code locality. The program may slow down drastically if a highly utilized section of code (like inner loops in various algorithms) suddenly cannot fit in the cache. Also, caches that are not fully associative have higher chances of cache collisions even in an unfilled cache.
Prescient store optimizations allow store operations to occur earlier than would otherwise be permitted in the context of threads and locks. The process needs some way of knowing ahead of time what value will be stored by the assignment that it should have followed. The purpose of this relaxation is to allow compiler optimization to perform certain kinds of code rearrangement that preserve the semantics of properly synchronized programs.
Optimizations are categorized in various, overlapping ways. One categorization is scope, which describes how much of the input code an optimization considers. Local scope optimizations use information local to a basic block. Since basic blocks have no control flow, these optimizations need very little analysis, saving time and reducing storage requirements, but this also means that no information is preserved across jumps. Global scope optimizations, also known as intraprocedural methods, act on whole functions. This gives them more information to work with, but often makes expensive computations necessary. Worst-case assumptions need to be made when function calls occur or global variables are accessed, because little information about them is available.
Macro compression is a space optimization that recognizes common sequences of code, creates subprograms ("code macros") that contain the common code, and replaces the occurrences of the common code sequences with calls to the corresponding subprograms. It is most effectively done as a machine code optimization, when all the code is present. The technique was first used to conserve space in an interpretive byte stream used in an implementation of Macro Spitbol on microcomputers. The problem of determining an optimal set of macros that minimizes the space required by a given code segment is known to be NP-complete, but efficient heuristics attain near-optimal results.
Trampolines: many CPUs have smaller subroutine call instructions for accessing low memory. A compiler can save space by using these small calls in the main body of code. Jump instructions in low memory can access the routines at any address. This multiplies the space savings from code factoring.
Instruction selection: most architectures, particularly CISC architectures and those with many addressing modes, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to do a good job overall of choosing which instructions to implement which operators in the low-level intermediate representation with. For example, on many processors in the 68000 family and on the x86 architecture, complex addressing modes can be used in statements like "lea 25(a1,d5*4), a0", allowing a single instruction to perform a significant amount of arithmetic with less storage.
Debugging: during development, optimizations are often disabled to speed compilation or to make the executable code easier to debug. Optimizing transformations, particularly those that reorder code, can make it difficult to relate the executable code to the source code.
Link-time optimization (LTO), also known as whole-program optimization, is a more general class of interprocedural optimization. During LTO, the compiler has visibility across translation units, which allows it to perform more aggressive optimizations such as cross-module inlining and devirtualization.
Bounds-checking elimination: many languages, such as Java, enforce bounds checking of all array accesses. This is a severe performance bottleneck in certain applications, such as scientific code. Bounds-checking elimination allows the compiler to safely remove bounds checking in many situations where it can determine that the index must fall within valid bounds, for example if it is a simple loop variable.
Loop fusion (loop combining, loop ramming, or loop jamming) is another technique that attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times, regardless of whether that number is known at compile time, their bodies can be combined as long as they make no reference to each other's data.
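A sketch in C, assuming a and b point to disjoint arrays:

    void separate(int *a, int *b, int n) {
        for (int i = 0; i < n; i++) a[i] = 1;
        for (int i = 0; i < n; i++) b[i] = 2;
    }

    void fused(int *a, int *b, int n) {
        /* One traversal instead of two; legal here because the two
           bodies do not refer to each other's data. */
        for (int i = 0; i < n; i++) {
            a[i] = 1;
            b[i] = 2;
        }
    }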
Test reordering: if we have two tests that together form the condition for something, we can first deal with the simpler tests (e.g., comparing a variable to something) and only then with the complex tests (e.g., those that require a function call). This technique complements lazy evaluation, but can be used only when the tests are not dependent on one another. Short-circuiting semantics can make this difficult.
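A sketch in C; expensive_check is a hypothetical costly predicate, and the reorder is legal only because the two tests are independent and side-effect-free:

    int expensive_check(int v);   /* hypothetical costly predicate */

    int accept_slow(int flag, int v) {
        return expensive_check(v) && flag;  /* costly test always runs */
    }

    int accept(int flag, int v) {
        /* Cheap test first: when flag is false, the costly call
           is skipped entirely. */
        return flag && expensive_check(v);
    }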
Due to the extra time and space required by interprocedural analysis, most compilers do not perform it by default. Users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.
Machine code optimization uses an object code optimizer to analyze the executable task image of the program after all of the executable machine code has been linked. Some of the techniques that can be applied in a more limited scope, such as macro compression, which saves space by collapsing common sequences of instructions, are more effective when the entire executable task image is available for analysis.
Interprocedural optimization works on the entire program, across procedure and file boundaries. It works tightly with intraprocedural counterparts, carried out with the cooperation of a local part and a global part. Typical interprocedural optimizations are procedure inlining, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis, array-access analysis, and the construction of a call graph. Interprocedural optimization is common in modern commercial compilers from SGI, Intel, Microsoft, and Sun Microsystems.
SSA-based optimizations are intended to be done after transforming the program into a special form called static single assignment (SSA), in which every variable is assigned in only one place. Although some of these optimizations function without SSA, they are most effective with it. Many optimizations listed in other sections, such as register allocation, also benefit from SSA with no special changes.
Instruction scheduling is an important optimization for modern pipelined processors. It avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics.
Alias classification and pointer analysis: in the presence of pointers, it is difficult to make any optimizations at all, since potentially any variable may have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored.
Some pervasive algorithms, such as matrix multiplication, have very poor cache behavior and excessive memory accesses. Loop nest optimization increases the number of cache hits by operating over small blocks and by using a loop interchange.
An approach to isolating optimization is the use of so-called post-pass optimizers (some commercial versions of which date back to mainframe software of the late 1970s). These tools take the executable output of an optimizing compiler and optimize it even further. Post-pass optimizers usually work at the assembly language or machine code level (in contrast with compilers that optimize intermediate representations of programs). One such example is the Portable C Compiler (pcc) of the 1980s, which had an optional pass that would perform post-optimizations on the generated assembly code.
Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops that have the same bodies but iterate over different contiguous portions of the index range. A useful special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
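A sketch of loop peeling in C (init and work are hypothetical placeholders):

    void init(int *a);        /* first-iteration setup */
    void work(int *a, int i);

    void loop(int *a, int n) {
        for (int i = 0; i < n; i++) {
            if (i == 0) init(a);   /* tested on every iteration */
            work(a, i);
        }
    }

    void loop_peeled(int *a, int n) {
        if (n > 0) {
            init(a);               /* first iteration done separately */
            work(a, 0);
            for (int i = 1; i < n; i++)
                work(a, i);        /* body no longer tests i == 0 */
        }
    }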
Code factoring: if several sequences of code are identical, or can be parameterized or reordered to be identical, they can be replaced with calls to a shared subroutine. This can often share code for subroutine set-up and sometimes tail recursion.
Loop inversion changes a standard while loop into a do/while (also known as repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code), but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile time and is known to be side-effect-free, the if guard can be skipped.
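A sketch in C (body is a placeholder for the loop's work):

    void body(int i);   /* hypothetical loop body */

    void standard(int i, int n) {
        while (i < n) { body(i); i++; }
    }

    void inverted(int i, int n) {
        if (i < n) {    /* guard; skippable when the compiler can
                           prove the loop is always entered */
            do { body(i); i++; } while (i < n);
        }
    }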
Reordering computations: based on integer linear programming, restructuring compilers enhance data locality and expose more parallelism by reordering computations. Space-optimizing compilers may reorder code to lengthen sequences that can be factored into subroutines.
Software pipelining: the loop is restructured so that work done in an iteration is split into several parts and done over several iterations. In a tight loop, this technique hides the latency between loading and using values.
Interprocedural optimizations analyze all of a program's source code. The greater the quantity of information consumed, the more effective the optimizations can be. The information can be used for various optimizations, including function inlining, where a call to a function is replaced by a copy of the function body.
Automatic parallelization: a loop is converted into multi-threaded or vectorized (or even both) code in order to utilize multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine, including multi-core machines.
However, languages such as PL/I that also support pointers do have optimizations for arrays. Conversely, some language features make certain optimizations easier; for example, in some languages functions are not permitted to have side effects.
Loop interchange: these optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve the locality of reference, depending on the array's layout.
For a long time, the open-source GCC was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving. Another open-source compiler with a full analysis and optimization infrastructure is Open64.
Sparse conditional constant propagation combines constant propagation, constant folding, and dead-code elimination, and improves upon what is possible by running them separately. This optimization symbolically executes the program, simultaneously propagating constant values and eliminating the portions of the control-flow graph that this makes unreachable.
Dead-store elimination: removal of assignments to variables that are not subsequently read, either because the lifetime of the variable ends or because of a subsequent assignment that will overwrite the first value.
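A sketch in C (compute is a placeholder, assumed here to be side-effect-free):

    int compute(int x);   /* hypothetical, side-effect-free */

    int f(int x) {
        int a = compute(x);   /* dead store: never read ... */
        a = x + 1;            /* ... because this overwrites it */
        return a;
    }
    /* The optimizer can delete the first assignment, and the call with
       it, since its value and effects are unused. */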
Rematerialization recalculates a value instead of loading it from memory, eliminating an access to memory. This is performed in tandem with register allocation to avoid spills.
Deforestation (data structure fusion): in languages where it is common for a sequence of transformations to be applied to a list, deforestation attempts to remove the construction of intermediate data structures.
Special-purpose use: If the software is compiled for machines with uniform characteristics, then the compiler can heavily optimize the generated code for those machines.
Loop fission (loop distribution) attempts to break a loop into multiple loops over the same index range, with each new loop taking only a part of the original loop's body. This can improve locality of reference, both to the data being accessed within the loop and to the code in the loop's body.
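A sketch in C:

    void both(int *a, int *b, int n) {
        for (int i = 0; i < n; i++) {
            a[i] = a[i] * 2;       /* body touches two unrelated arrays */
            b[i] = b[i] + 1;
        }
    }

    void split(int *a, int *b, int n) {
        /* After fission, each loop streams through a single array. */
        for (int i = 0; i < n; i++) a[i] = a[i] * 2;
        for (int i = 0; i < n; i++) b[i] = b[i] + 1;
    }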
Strength reduction: replace complex, difficult, or expensive operations with simpler ones, for example replacing division by a constant with multiplication by its reciprocal, or using induction variable analysis to replace multiplication by a loop index with addition.
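A sketch in C of both forms:

    void table(int *a, int n) {
        for (int i = 0; i < n; i++)
            a[i] = i * 7;              /* one multiply per iteration */
    }

    void table_reduced(int *a, int n) {
        int t = 0;
        for (int i = 0; i < n; i++) {
            a[i] = t;                  /* addition only */
            t += 7;
        }
    }

    double third(double x)  { return x / 3.0; }
    double third_mul(double x) { return x * (1.0 / 3.0); }
    /* The reciprocal form can differ in the last bit of the result,
       so compilers typically apply it only when the language rules or
       flags permit relaxed floating-point arithmetic. */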
Loop unswitching moves a conditional from inside a loop to outside of it by duplicating the loop's body inside each of the if and else clauses of the conditional.
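A sketch in C, valid because flag never changes inside the loop:

    void mix(int *a, int n, int flag, int x, int y) {
        for (int i = 0; i < n; i++) {
            if (flag) a[i] = x;    /* flag re-tested every iteration */
            else      a[i] = y;
        }
    }

    void unswitched(int *a, int n, int flag, int x, int y) {
        if (flag)                  /* tested once; body duplicated */
            for (int i = 0; i < n; i++) a[i] = x;
        else
            for (int i = 0; i < n; i++) a[i] = y;
    }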
Fewer jumps, by using straight-line code (also called branch-free code): jumps (conditional or unconditional branches) interfere with the prefetching of instructions, thus slowing down code. Using inlining or loop unrolling can reduce branching, at the cost of increasing binary file size by the length of the repeated code. This tends to merge several basic blocks into one.
Less code: remove unnecessary computations and intermediate values. Less work for the CPU, cache, and memory usually results in faster execution. Alternatively, in embedded systems, less code brings a lower product cost.
Cache/memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications.
Whether particular optimizations can and should be applied may depend on the characteristics of the target machine. Some compilers, such as GCC and Clang, parameterize machine-dependent factors so that they can be used to optimize for different machines.
Tail-call optimization: a function call consumes stack space and involves some overhead related to parameter passing and flushing the instruction cache. Tail-recursive algorithms can be converted to iteration through a process called tail-recursion elimination or tail-call optimization.
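A sketch in C:

    int fact(int n, int acc) {
        if (n <= 1) return acc;
        return fact(n - 1, acc * n);   /* tail call: the last action */
    }

    /* The loop a tail-call-optimizing compiler effectively produces,
       reusing one stack frame instead of growing the stack: */
    int fact_iter(int n, int acc) {
        while (n > 1) { acc *= n; n--; }
        return acc;
    }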
Global value numbering (GVN) eliminates redundancy by constructing a value graph of the program and then determining which values are computed by equivalent expressions. GVN can identify some redundancy that common subexpression elimination cannot, and vice versa.
Parallelize: reorder operations to allow multiple computations to happen in parallel, either at the instruction, memory, or thread level.
Dead-code elimination removes instructions that will not affect the behaviour of the program, for example definitions that have no uses, called dead code. This reduces code size and eliminates unnecessary computation.
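A sketch in C:

    int g(int x) {
        int unused = x * 42;   /* defined but never used: dead code */
        if (0) {               /* provably unreachable branch */
            x = -x;
        }
        return x + 1;
    }
    /* After dead-code elimination, the function body reduces to
       "return x + 1;". */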
Although many of these optimizations also apply to non-functional languages, they either originate in or are particularly critical in functional languages such as Lisp and ML.
More precise information is better: the more precise the information the compiler has, the better it can employ any or all of these optimization techniques.
Locality: code and data that are accessed closely together in time should be placed close together in memory to increase spatial locality of reference.
Jump threading: in this optimization, consecutive conditional jumps predicated entirely or partially on the same condition are merged. For example, "if (c) { foo; } if (c) { bar; }" becomes "if (c) { foo; bar; }", and "if (c) { foo; } if (!c) { bar; }" becomes "if (c) { foo; } else { bar; }".
Avoid redundancy: reuse results that are already computed and store them for later use, instead of recomputing them.
Stack-height reduction: rearrange an expression tree to minimize the resources needed for expression evaluation.
Code-block reordering alters the order of the basic blocks in a program to reduce conditional branches and improve the locality of reference.
Exploit the memory hierarchy: accesses to memory are increasingly more expensive for each level of the memory hierarchy, so place the most commonly used items in registers first, then caches, then main memory, before going to disk.
Optimization includes the following, sometimes conflicting, themes: optimizing the common case, avoiding redundancy, using less code and fewer jumps, improving locality, exploiting the memory hierarchy, parallelizing, and making use of precise compile-time and runtime information.
Branch-offset optimization (machine dependent): choose the shortest branch displacement that reaches the target.
Optimize the common case: the common case may have unique properties that allow a fast path at the expense of a slow path. If the fast path is taken most often, the result is better overall performance.
Common subexpression elimination: in the expression "(a + b) - (a + b)/4", the "common subexpression" is the duplicated "(a + b)". Compilers implementing this technique realize that "(a + b)" will not change, and so calculate its value only once.
Runtime metrics can help: information gathered during a test run can be used in profile-guided optimization, and information gathered at runtime, ideally with minimal overhead, can be used by a JIT compiler to dynamically improve optimization.
Constant folding and propagation: replacing expressions consisting of constants (e.g., "3 + 5") with their final value ("8") at compile time, rather than doing the calculation at run-time. Used in most modern languages.
