
bzip2

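File compression software
Original author: Julian Seward
Developers: Mark Wielaard, Federico Mena, Micah Snyder
Initial release: 18 July 1996
Stable release: 1.0.8 (13 July 2019)
Repository: https://gitlab.com/bzip2/bzip2/
Operating system: Cross-platform
Type: Data compression
License: Modified zlib license
Filename extension: .bz2
Internet media type: application/x-bzip2
Uniform Type Identifier (UTI): public.bzip2-archive
Magic number: BZh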

The fields of a .bz2 stream are as follows (name:size in bits = meaning):

.magic:16 = 'BZ' signature/magic number
.version:8 = 'h' for Bzip2 ('H'uffman coding), '0' for Bzip1 (deprecated)
.hundred_k_blocksize:8 = '1'..'9' block-size 100 kB-900 kB (uncompressed)

.compressed_magic:48 = 0x314159265359 (BCD (pi))
.crc:32 = checksum for this block
.randomised:1 = 0=>normal, 1=>randomised (deprecated)
.origPtr:24 = starting pointer into BWT for after untransform
.huffman_used_map:16 = bitmap, of ranges of 16 bytes, present/not present
.huffman_used_bitmaps:0..256 = bitmap, of symbols used, present/not present (multiples of 16)
.huffman_groups:3 = 2..6 number of different Huffman tables in use
.selectors_used:15 = number of times that the Huffman tables are swapped (each 50 symbols)
*.selector_list:1..6 = zero-terminated bit runs (0..62) of MTF'ed Huffman table (*selectors_used)
.start_huffman_length:5 = 0..20 starting bit length for Huffman deltas
*.delta_bit_length:1..40 = 0=>next symbol; 1=>alter length { 1=>decrement length; 0=>increment length } (*(symbols+2)*groups)
.contents:2..∞ = Huffman encoded data stream until end of block (max. 7372800 bit)

.eos_magic:48 = 0x177245385090 (BCD sqrt(pi))
.crc:32 = checksum for whole stream
.padding:0..7 = align to whole byte

Each bit length is stored as an encoded difference against the previous-code bit length. A zero bit (0) means that the previous bit length should be duplicated for the current code, whilst a one bit (1) means that a further bit should be read and the bit length incremented or decremented based on that value. In the common case a single bit is used per symbol per table, and the worst case (going from length 1 to length 20) would require approximately 37 bits. As a result of the earlier MTF encoding, code lengths would start at 2–3 bits long (very frequently used codes) and gradually increase, meaning that the delta format is fairly efficient, requiring around 300 bits (38 bytes) per full Huffman table.
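The delta scheme for code lengths can be sketched in Python. This is a minimal illustration, not the reference decoder: it works on a plain list of 0/1 bits, and the function names `encode_code_lengths` and `decode_code_lengths` are hypothetical. It follows the `.delta_bit_length` field description (a 0 bit moves on to the next symbol; a 1 bit is followed by 0 to increment or 1 to decrement) and rejects lengths outside 1–20.

```python
def encode_code_lengths(lengths, start_length):
    """Delta-encode Huffman code lengths into a list of 0/1 bits.

    For each symbol: emit (1, 0) per increment of the running length,
    (1, 1) per decrement, then a single 0 once it equals the target.
    """
    bits = []
    current = start_length
    for target in lengths:
        while current < target:
            bits += [1, 0]
            current += 1
        while current > target:
            bits += [1, 1]
            current -= 1
        bits.append(0)  # 0 => this length is final; move to the next symbol
    return bits


def decode_code_lengths(bits, start_length, alpha_size):
    """Inverse of encode_code_lengths; every length must stay within 1..20."""
    it = iter(bits)
    lengths = []
    current = start_length
    for _ in range(alpha_size):
        while next(it) == 1:  # 1 => alter the length, 0 => keep it
            current += 1 if next(it) == 0 else -1  # 0 => +1, 1 => -1
            if not 1 <= current <= 20:
                raise ValueError("code length out of range 1..20")
        lengths.append(current)
    return lengths


lengths = [2, 3, 3, 5, 4]
bits = encode_code_lengths(lengths, start_length=2)
print(len(bits), decode_code_lengths(bits, 2, len(lengths)))  # 13 [2, 3, 3, 5, 4]
```

When consecutive symbols share a length, each costs a single 0 bit, which is why slowly increasing lengths keep the table description compact.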
In the UNIX tradition, archiving could be done by a separate program producing an archive which is then compressed with bzip2, and un-archiving could be done by bzip2 uncompressing the compressed archive file and a separate program extracting its contents. Some archivers have built-in support for compression and decompression, so that it is not necessary to use the bzip2 program to compress or decompress the archive.

bzip2 compresses data in blocks between 100 and 900 kB and uses the Burrows–Wheeler transform to convert frequently recurring character sequences into strings of identical letters. The move-to-front transform and Huffman coding are then applied. The compression performance is asymmetric, with decompression being faster than compression.

Because of the combined result of the MTF and RLE encodings in the previous two steps, there is never any need to explicitly reference the first symbol in the MTF table (which would be zero in the ordinary MTF), thus saving one symbol for the end-of-stream marker (and explaining why only n − 1 of the n distinct symbols are coded in the Huffman tree). In the extreme case where only one symbol is used in the uncompressed data, there will be no symbol codes at all in the Huffman tree, and the entire block will consist of RUNA and RUNB (implicitly repeating the single byte) and an end-of-stream marker with value 2.
A sparse method is used: the 256 symbols are divided into 16 ranges of 16 symbols, and a 16-bit array is included for a range only if symbols from that range are used within the block. The presence of each of these 16 ranges is indicated by an additional 16-bit array at the front. The total bitmap therefore uses between 32 and 272 bits of storage (4–34 bytes).
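A sketch of the two-level sparse bitmap, again over a plain list of bits and with hypothetical function names. It assumes that range r covers byte values 16r to 16r + 15 and that the 16-bit range map is written first, followed by one 16-bit map per occupied range, in range order.

```python
def encode_symbol_bitmap(used):
    """Two-level bitmap for a set of byte values (0..255), as a list of bits.

    First 16 bits: range r is present if any of bytes 16r..16r+15 is used.
    Then, for each present range in order, 16 bits marking the used bytes.
    """
    range_map, detail = [], []
    for r in range(16):
        bits = [1 if r * 16 + i in used else 0 for i in range(16)]
        range_map.append(1 if any(bits) else 0)
        if any(bits):
            detail.extend(bits)
    return range_map + detail


def decode_symbol_bitmap(bits):
    """Return the sorted list of byte values recorded in the bitmap."""
    used, pos = [], 16
    for r in range(16):
        if bits[r]:
            for i in range(16):
                if bits[pos + i]:
                    used.append(r * 16 + i)
            pos += 16
    return used


used = set(b"hello world")
print(len(encode_symbol_bitmap(used)))  # 64
```

For "hello world" only three ranges (0x20–0x2F, 0x60–0x6F, 0x70–0x7F) are occupied, so the map takes 64 bits rather than a flat 256.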
Using this feature results in a maximal expansion of around 1.015, but generally less. This expansion is likely to be greatly overshadowed by the advantage of selecting more appropriate Huffman tables.
Several identically sized Huffman tables can be used with a block if the gain from using them is greater than the cost of including the extra table. At least 2 and up to 6 tables can be present, with the most appropriate table being reselected before every 50 symbols processed.
The run length is encoded in this fashion: assign place values of 1 to the first bit, 2 to the second, 4 to the third, and so on in the sequence; multiply each place value in a RUNB spot by 2; and add all the resulting place values (for RUNA and RUNB values alike) together. This is similar to base-2 bijective numeration.
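The RUNA/RUNB place-value rule amounts to bijective base-2 numeration with digits 1 (RUNA) and 2 (RUNB), least significant digit first. The sketch below, with hypothetical names and string labels standing in for the real symbol codes, encodes and decodes a run length under that reading.

```python
RUNA, RUNB = "RUNA", "RUNB"  # labels only; the real symbols are 0 and 1


def encode_run(n):
    """Encode a zero-run length n >= 1 as RUNA/RUNB, least significant first.

    Digit i is worth 2**i for RUNA and 2 * 2**i for RUNB.
    """
    if n < 1:
        raise ValueError("run length must be at least 1")
    digits = []
    while n > 0:
        if n % 2 == 1:  # odd remainder: digit value 1
            digits.append(RUNA)
            n = (n - 1) // 2
        else:  # even remainder: digit value 2
            digits.append(RUNB)
            n = (n - 2) // 2
    return digits


def decode_run(digits):
    """Sum the place values, doubling those in RUNB positions."""
    return sum((1 if d == RUNA else 2) << i for i, d in enumerate(digits))


print(encode_run(5))  # ['RUNA', 'RUNB']
print(decode_run([RUNA, RUNB, RUNA, RUNA, RUNB]))  # 49
```

Because both digits are non-zero, every run length has exactly one representation and no separate length field is needed.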
The move-to-front transform again does not alter the size of the processed block. Each of the symbols in use in the block is placed in an array. When a symbol is processed, it is replaced by its location (index) in the array and that symbol is shuffled to the front of the array.
Huffman coding replaces the fixed-length symbols in the range 0–257 with variable-length codes based on their frequency of use. More frequently used codes end up shorter (2–3 bits), whilst rare codes can be allocated up to 20 bits. The codes are selected carefully so that no sequence of bits can be confused for a different code.
Long strings of zeros in the output of the move-to-front transform (which come from repeated symbols in the output of the BWT) are replaced by a sequence of two special codes, RUNA and RUNB, which represent the run length as a binary number. Actual zeros are never encoded in the output.
In practice, it is not necessary to construct the full matrix; rather, the sort is performed using pointers for each position in the buffer. The output buffer is the last column of the matrix; this contains the whole buffer, but reordered so that it is likely to contain large runs of identical symbols.
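A naive sketch of the block-sort and its inverse. It literally sorts all rotations (quadratic time and memory, unlike the pointer-based sort used in practice) and returns the last column plus the index of the original block among the sorted rows, which plays the role of the stored 24-bit pointer. Function names are hypothetical.

```python
def bwt(block):
    """Naive Burrows–Wheeler transform of a non-empty bytes object.

    Returns (last column, row index of the unrotated block after sorting).
    """
    n = len(block)
    rows = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last, orig_ptr):
    """Undo bwt() using the first-column / last-column correspondence."""
    # A stable sort of last-column positions by symbol gives, for each
    # sorted row, where that row's first symbol sits in the last column.
    nxt = sorted(range(len(last)), key=lambda i: last[i])
    out = bytearray()
    row = orig_ptr
    for _ in range(len(last)):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


print(bwt(b"banana"))  # (b'nnbaaa', 3)
```

The output groups equal symbols together ("nn", "aaa"), which is what the following move-to-front and run-length steps exploit.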
The sequence 0, 0, 0, 0, 0, 1 would be represented as RUNA, RUNB, 1, where the pair RUNA, RUNB represents a run length of 5. The run-length code is terminated by reaching another normal symbol. This RLE process is more flexible than the initial RLE step, as it is able to encode arbitrarily long integers (in practice, this is usually limited by the block size, so that this step does not encode a run of more than 900,000 bytes).
Much "natural" data contains identical symbols that recur within a limited range (text is a good example). As the MTF transform assigns low values to symbols that reappear frequently, this results in a data stream containing many symbols in the low integer range, many of them being identical.
Seward made the first public release of bzip2, version 0.15, in July 1996. The compressor's stability and popularity grew over the next several years, and Seward released version 1.0 in late 2000. After a nine-year hiatus in updates to the project since 2010, Federico Mena accepted maintainership of the bzip2 project on 4 June 2019.
Because of the first-stage RLE compression, the maximum length of plaintext that a single 900 kB bzip2 block can contain is around 46 MB (45,899,236 bytes). This can occur if the whole plaintext consists entirely of repeated values.
A bitmap is used to show which symbols are used inside the block and should be included in the Huffman trees. Binary data is likely to use all 256 symbols representable by a byte, whereas textual data may only use a small subset of available values, perhaps covering the ASCII range between 32 and 126.
As an overview, a .bz2 stream consists of a 4-byte header, followed by zero or more compressed blocks, immediately followed by an end-of-stream marker containing a 32-bit CRC for the whole plaintext stream. The compressed blocks are bit-aligned and no padding occurs between them; the stream is only padded to a whole byte at the very end.
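Python's standard `bz2` module produces ordinary .bz2 streams, so the header layout can be checked against it. The sketch below uses a hypothetical `parse_stream_header` helper that reads the 'BZh' signature and block-size digit, and it exposes the two 48-bit magic values. It relies on the 4-byte header leaving the first block (or, for empty input, the end-of-stream marker) byte-aligned; later blocks are bit-aligned and cannot be located this simply.

```python
import bz2

BLOCK_MAGIC = bytes.fromhex("314159265359")  # BCD digits of pi
EOS_MAGIC = bytes.fromhex("177245385090")    # BCD digits of sqrt(pi)


def parse_stream_header(stream):
    """Return the block size (in bytes) declared by a .bz2 stream header."""
    if stream[:3] != b"BZh":
        raise ValueError("missing 'BZh' signature")
    digit = stream[3] - ord("0")
    if not 1 <= digit <= 9:
        raise ValueError("block-size digit must be '1'..'9'")
    return digit * 100_000


data = bz2.compress(b"hello " * 1000, compresslevel=9)
print(parse_stream_header(data), data[4:10] == BLOCK_MAGIC)  # 900000 True
```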
In the worst case, this initial RLE step can cause an expansion of 1.25 (a run of exactly four identical bytes becomes five bytes), and in the best case, a reduction to <0.02 (a run of 255 bytes becomes five). While the specification theoretically allows for runs of length 256–259 to be encoded, the reference encoder will not produce such output.
The Burrows–Wheeler transform is the reversible block-sort that is at the core of bzip2. The block is entirely self-contained, with input and output buffers remaining of the same size; in bzip2, the operating limit for this stage is 900 kB.
The bzip2 file format does not support storing the contents of multiple files in a single compressed file, and the program itself has no facilities for multiple files, encryption or archive-splitting.
The project has gone through multiple maintainers since its initial release, with Micah Snyder being the maintainer since June 2021. There have also been modified versions, such as pbzip2.
pbzip2 encodes the file in multiple chunks using multi-threading, giving almost linear speedup on multi-CPU and multi-core computers. As of May 2010, this functionality has not been incorporated into the main project.
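The chunked approach can be approximated with the standard library: compress independent chunks concurrently and concatenate the resulting streams, which `bz2.decompress` accepts as multi-stream input. This illustrates the idea only and is not how pbzip2 is implemented. The `parallel_compress` name, chunk size and worker count are assumptions, as is the premise that CPython's `bz2` module releases the GIL while compressing (a `ProcessPoolExecutor` is the safer choice if it does not).

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_compress(data, chunk_size=900_000, workers=4):
    """Compress fixed-size chunks independently and join the .bz2 streams.

    Each chunk becomes a complete bzip2 stream, so the result is a
    multi-stream file; bz2.decompress() reads all streams in sequence.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        streams = list(pool.map(bz2.compress, chunks))  # map preserves order
    return b"".join(streams)
```

Splitting into independent streams costs a little compression ratio per chunk boundary, in exchange for work that parallelises cleanly.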
The resulting .bz2 file in this case is 46 bytes long. An even smaller file of 40 bytes can be achieved by using an input consisting entirely of the byte value 251, an apparent compression ratio of 1147480.9:1.
If multiple Huffman tables are in use, the selection of each table (numbered 0 to 5) is done from a list by a zero-terminated bit run between 1 and 6 bits in length. The selection is into an MTF list of the tables.
In AAAA\3BBBB\0CCCD, \3 and \0 represent byte values 3 and 0 respectively. Runs of symbols are always transformed after 4 consecutive symbols, even if the run length is set to zero, to keep the transformation reversible.
A compressed block in bzip2 can be decompressed without having to process earlier blocks. This means that bzip2 files can be decompressed in parallel, making it a good format for use in big data applications with cluster computing frameworks like Hadoop and Apache Spark.
bzip2 performance is asymmetric, as decompression is relatively fast. Motivated by the long time required for compression, a modified version called pbzip2 was created in 2003.
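bzip2 uses several layers of compression techniques stacked on top of each other, which occur in the following order during compression and the reverse order during decompression:

1. Run-length encoding (RLE) of initial data.
2. Burrows–Wheeler transform (BWT), or block sorting.
3. Move-to-front (MTF) transform.
4. Run-length encoding (RLE) of MTF result.
5. Huffman coding.
6. Selection between multiple Huffman tables.
7. Unary base-1 encoding of Huffman table selection.
8. Delta encoding (Δ) of Huffman-code bit lengths.
9. Sparse bit array showing which symbols are used.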
A lone zero becomes RUNA. (This step is in fact done at the same time as MTF: whenever MTF would produce a zero, it instead increments a counter whose value is then encoded with RUNA and RUNB.)
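bzip2 is particularly efficient for text data, and decompression is relatively fast. The algorithm uses several layers of compression techniques, such as run-length encoding (RLE), the Burrows–Wheeler transform (BWT), the move-to-front transform (MTF), and Huffman coding.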
Run-length encoding in the previous step is designed to take care of symbols whose probability of use is too high to be coded efficiently even by the shortest Huffman code in use.
The author of bzip2 has stated that the initial RLE step was a historical mistake and was only intended to protect the original BWT implementation from pathological cases.
The common case of continuing to use the same Huffman table is represented as a single bit. Although this is a unary code, it can effectively be viewed as an extreme form of Huffman tree, in which each code has half the probability of the previous code.
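The selector coding can be sketched as an MTF pass over table numbers followed by a zero-terminated run of 1 bits per MTF index, so that reusing the previous table costs a single 0 bit. Function names are hypothetical, and the sketch omits the validity check that a run must stay shorter than the number of tables.

```python
def encode_selectors(selectors, num_tables):
    """MTF-code table numbers, then write MTF index k as k ones and a zero."""
    mtf = list(range(num_tables))
    bits = []
    for table in selectors:
        k = mtf.index(table)
        mtf.insert(0, mtf.pop(k))  # chosen table moves to the front
        bits.extend([1] * k + [0])
    return bits


def decode_selectors(bits, count, num_tables):
    """Read `count` zero-terminated runs and undo the MTF step."""
    mtf = list(range(num_tables))
    out, pos = [], 0
    for _ in range(count):
        k = 0
        while bits[pos] == 1:
            k += 1
            pos += 1
        pos += 1  # skip the terminating 0
        table = mtf.pop(k)
        mtf.insert(0, table)
        out.append(table)
    return out


sels = [0, 0, 0, 1, 1, 0, 2]
print(encode_selectors(sels, num_tables=3))
# [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0]
```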
Different recurring input symbols can actually map to the same output symbol. Such data can be very efficiently encoded by any legacy compression method.
For contrast, the DEFLATE algorithm would show the absence of symbols by encoding the symbols as having a zero bit length with run-length encoding and additional Huffman coding.
Any sequence of 4 to 255 consecutive duplicate symbols is replaced by the first 4 symbols and a repeat length between 0 and 251. Thus the sequence AAAAAAABBBBCCCD is replaced with AAAA\3BBBB\0CCCD.
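A minimal sketch of the initial RLE step, assuming runs are capped at 255 (four literal bytes plus a count of at most 251) and that a count byte always follows exactly four identical bytes. The names are hypothetical; the sketch reproduces the AAAAAAABBBBCCCD example but ignores block boundaries, which the reference encoder also has to handle.

```python
def rle1_encode(data):
    """Initial RLE: a run of 4..255 equal bytes becomes 4 bytes plus a count 0..251."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        run = 1
        while i + run < n and data[i + run] == b and run < 255:
            run += 1
        if run >= 4:
            out += bytes([b]) * 4
            out.append(run - 4)  # how many more copies the decoder must add
        else:
            out += bytes([b]) * run
        i += run
    return bytes(out)


def rle1_decode(data):
    """Inverse: after four equal bytes, the next byte is always a repeat count."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        run = 1
        while run < 4 and i + run < n and data[i + run] == b:
            run += 1
        out += bytes([b]) * run
        i += run
        if run == 4:
            out += bytes([b]) * data[i]
            i += 1
    return bytes(out)


print(rle1_encode(b"AAAAAAABBBBCCCD"))  # b'AAAA\x03BBBB\x00CCCD'
```

A run of exactly four bytes grows to five, which is the 1.25 worst-case expansion of this step.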
No formal specification for bzip2 exists, although an informal specification has been reverse engineered from the reference implementation.
Following rotation, the rows of the matrix are sorted into alphabetic (numerical) order. A 24-bit pointer is stored marking the position of the original, unrotated block among the sorted rows, which is the starting point when the block is untransformed.
bzip3, a modern compressor that shares common ancestry and a set of algorithms with bzip2, switched back to arithmetic coding.
LZMA is generally more space-efficient than bzip2 at the expense of even slower compression speed, while having faster decompression.
If there are n different bytes (symbols) used in the uncompressed data, then the Huffman code will consist of two RLE codes (RUNA and RUNB), n − 1 symbol codes and one end-of-stream code.
Long runs of any arbitrary symbol thus become runs of zero symbols, while other symbols are remapped according to their local frequency.
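The move-to-front step fits in a few lines of Python. The alphabet is assumed to be the block's used byte values in increasing order; the function names are hypothetical.

```python
def mtf_encode(data, alphabet):
    """Replace each symbol by its current index, then move it to the front."""
    table = list(alphabet)
    out = []
    for sym in data:
        idx = table.index(sym)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(indices, alphabet):
    """Inverse: look the symbol up by index, then move it to the front."""
    table = list(alphabet)
    out = []
    for idx in indices:
        sym = table.pop(idx)
        out.append(sym)
        table.insert(0, sym)
    return out


block = b"aaabbbaaa"
codes = mtf_encode(block, sorted(set(block)))
print(codes)  # [0, 0, 0, 1, 0, 0, 1, 0, 0]
```

Every immediate repeat becomes 0, so the runs of identical symbols left by the BWT turn into runs of zeros for the next RLE step.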
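bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities for tasks such as handling multiple files, encryption, and archive-splitting.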
This has the advantage of having very responsive Huffman dynamics without having to continuously supply new tables, as would be required in DEFLATE.
The bzgrep tool allows directly searching through compressed text without needing to uncompress the contents first.
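A rough Python analogue of searching compressed text uses `bz2.open` in text mode, so lines are decompressed on the fly instead of an uncompressed copy being written to disk. `bz_grep` is a hypothetical helper, not the bzgrep tool itself, and it assumes the content is UTF-8 text.

```python
import bz2
import io
import re


def bz_grep(source, pattern, encoding="utf-8"):
    """Yield (line_number, line) for lines of a .bz2 text source matching pattern.

    `source` may be a path or a binary file object; decompression happens
    incrementally while reading, so no uncompressed copy is written.
    """
    regex = re.compile(pattern)
    with bz2.open(source, "rt", encoding=encoding) as fh:
        for lineno, line in enumerate(fh, start=1):
            if regex.search(line):
                yield lineno, line.rstrip("\n")


if __name__ == "__main__":
    sample = bz2.compress(b"alpha\nerror: disk full\nbeta\nerror: retry\n")
    for hit in bz_grep(io.BytesIO(sample), r"^error"):
        print(hit)
```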
It uses the Burrows–Wheeler transform to convert frequently recurring character sequences into strings of identical letters, and then applies the move-to-front transform and Huffman coding.
Storing a full 256-bit map would be inefficient if most of its bits were zero.
Since June 2021, the maintainer has been Micah Snyder.
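As a more complicated example, the sequence RUNA RUNB RUNA RUNA RUNB (ABAAB) decodes as follows:

RUNA  RUNB  RUNA  RUNA  RUNB   (ABAAB)
   1     2     4     8    16   place values
   1     4     4     8    32   RUNB places doubled; sum = 49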
bzip2 is suitable for use in big data applications with cluster computing frameworks like Hadoop and Apache Spark, as a compressed block can be decompressed without having to process earlier blocks.
The effect is that immediately recurring symbols are replaced by zero symbols.
Huffman-code bit lengths are required to reconstruct each of the canonical Huffman tables in use.
Thus, the sequence RUNA, RUNB results in the value (1 + 2 × 2) = 5.
pbzip2 uses multi-threading to improve compression speed on multi-CPU and multi-core computers.
7-Zip also has built-in support for bzip2 compression and decompression.
The end-of-stream code is particularly interesting.
For the block-sort, a (notional) matrix is created with one row for each position in the buffer.
Like gzip, bzip2 is only a data compressor. It is not an archiver like tar or ZIP.
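For a block using n distinct byte values, the Huffman alphabet is:

0: RUNA
1: RUNB
2 to n: MTF indices 1 to n − 1 (index v is coded as symbol v + 1)
n + 1: end of stream, finish processing (at most 257 when all 256 byte values are used; could be as low as 2).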
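A minimal sketch of how the MTF output of one block could be mapped onto Huffman symbols, assuming the reference implementation's convention that RUNA = 0, RUNB = 1, a non-zero MTF index v becomes symbol v + 1, and end-of-block is n + 1 for n byte values in use. `mtf_to_symbols` is a hypothetical name.

```python
RUNA, RUNB = 0, 1


def mtf_to_symbols(mtf_values, n_in_use):
    """Map MTF output for one block to Huffman symbols 0..n_in_use + 1.

    Zero runs become RUNA/RUNB digits, a non-zero MTF index v becomes
    symbol v + 1, and the block is closed with EOB = n_in_use + 1.
    """
    symbols = []
    zeros = 0

    def flush_zero_run():
        nonlocal zeros
        n = zeros
        while n > 0:  # bijective base 2: RUNA is worth 1, RUNB is worth 2
            if n % 2 == 1:
                symbols.append(RUNA)
                n = (n - 1) // 2
            else:
                symbols.append(RUNB)
                n = (n - 2) // 2
        zeros = 0

    for v in mtf_values:
        if v == 0:
            zeros += 1
        else:
            flush_zero_run()
            symbols.append(v + 1)
    flush_zero_run()
    symbols.append(n_in_use + 1)  # end-of-block symbol
    return symbols


print(mtf_to_symbols([0, 0, 0, 1, 0, 0, 1, 0, 0], n_in_use=2))
# [0, 0, 2, 1, 2, 1, 3]
```

With a single distinct byte value, the whole block becomes RUNA/RUNB digits followed by symbol 2, which is the extreme case of the end-of-stream marker.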
bzip2 compresses data in blocks of size between 100 and 900 kB.
bzip2 compresses most files more effectively than the older LZW (.Z) and Deflate (.zip and .gz) compression algorithms, but is considerably slower.
Row i of the matrix contains the whole of the buffer, rotated to start from the i-th symbol.
bzip2 was initially released in 1996 by Julian Seward. It compresses most files more effectively than older LZW and Deflate compression algorithms but is slower.
bzip2's ancestor bzip used arithmetic coding instead of Huffman coding. The change was made because of a software patent restriction.
where 352:big data 3166:Forking 2948:GNU GPL 2842:History 2772:Routing 2735:Drivers 2690:Outline 2638:General 2552:Huffyuv 2465:Cinepak 2386:CoreAVC 2288:WavPack 2283:Shorten 2215:libopus 2210:libcelt 2200:TooLAME 2049:Generic 2012:StuffIt 1852:FreeArc 1486:Cabinet 1226:die.net 1164:5 March 1043:27 July 888:-based 807:Deflate 745:DEFLATE 713:DEFLATE 608:900 kB. 571:Sparse 486:scholar 461:"Bzip2" 370:History 331:(BWT), 327:(RLE), 318:Deflate 163:Website 154:License 102: ( 77: ( 3230:Portal 3159:topics 2980:Python 2903:Apache 2852:Events 2752:Health 2725:Codecs 2520:libvpx 2453:Others 2401:FFmpeg 2358:FFmpeg 2273:mp4als 2091:Snappy 2027:WinZip 2022:WinRAR 2017:WinAce 1949:TUGZip 1921:Filzip 1882:PeaZip 1410:Zopfli 1390:Brotli 1108:. 2009 1085:GitHub 891:bzgrep 805:) and 783:Hadoop 740:sparse 560:base-1 558:Unary 488:  481:  474:  467:  459:  360:Hadoop 172:/bzip2 3005:WTFPL 2715:Audio 2485:Indeo 2475:DNxHD 2470:Daala 2378:H.264 2326:Lossy 2307:Video 2188:l3enc 2148:Lossy 2129:Audio 2056:bzip2 1977:ALZip 1931:Lhasa 1898:Zipeg 1837:7-Zip 1529:lrzip 1395:bzip2 1262:(for 1080:(PDF) 994:"bz2" 877:GnuPG 862:Like 844:used 736:ASCII 493:JSTOR 479:books 287:is a 285:bzip2 183:bzip2 29:bzip2 3010:zlib 2933:CDDL 2908:APSL 2587:and 2567:YULS 2547:FFV1 2444:x265 2439:DivX 2431:HEVC 2421:x264 2396:DivX 2368:Xvid 2363:HDX4 2348:DivX 2343:3ivx 2293:L2HC 2258:FLAC 2253:ALAC 2193:LAME 2171:FAAC 2101:zstd 2086:rzip 2081:pack 2076:lzop 2071:lzip 2066:gzip 1903:ZPAQ 1811:with 1748:List 1689:XBAP 1669:MSIX 1622:APPX 1594:ZPAQ 1564:sitx 1554:rzip 1514:.egg 1509:.dmg 1505:DGCA 1430:lzop 1425:lzip 1415:LZMA 1405:gzip 1374:WARC 1354:shar 1349:cpio 1191:2022 1166:2009 1140:2019 1114:2015 1045:2022 940:rzip 885:grep 882:The 864:gzip 842:bzip 836:and 819:LZMA 813:and 811:.zip 785:and 772:.bz2 760:.bz2 591:and 465:news 362:and 316:and 292:file 222:Bzp2 195:.bz2 170:.org 142:Type 2975:MPL 2970:MIT 2965:ISC 2938:EPL 2923:BSD 2898:AFL 2510:VP7 2500:SBC 2183:MP3 2156:AAC 2117:UPX 1987:ARJ 1972:ARC 
1926:LHA 1887:XAD 1877:pax 1872:PAQ 1842:Ark 1720:PAQ 1684:XAP 1679:RPM 1664:MSI 1657:EAR 1647:WAR 1642:JAR 1637:ipa 1632:HAP 1627:deb 1617:App 1612:apk 1589:ZIP 1584:zoo 1579:Xar 1574:UDA 1570:SQX 1559:sit 1549:RAR 1544:PEA 1539:MPQ 1534:LZX 1524:LHA 1519:kgb 1501:dar 1496:cpt 1491:cfs 1476:ARJ 1471:ARC 1466:ACE 1420:LZ4 1369:WAD 1364:LBR 1359:tar 868:tar 815:.gz 799:LZW 720:MTF 659:000 657:900 629:any 448:by 393:. 314:LZW 278:Yes 244:BZh 3257:: 1481:B1 1461:7z 1440:xz 1435:SQ 1344:ar 1224:. 1206:. 1102:. 1082:. 1061:. 1036:. 1018:. 1000:. 996:. 978:. 961:, 826:kB 803:.Z 789:. 650:; 593:\0 589:\3 2623:e 2616:t 2609:v 2317:) 2313:( 2139:) 2135:( 1820:) 1816:( 1792:e 1785:t 1778:v 1322:e 1315:t 1308:v 1228:. 1210:. 1168:. 1142:. 1116:. 1065:. 1047:. 1022:. 1004:. 982:. 809:( 801:( 691:n 687:n 683:n 616:i 612:i 552:. 515:) 509:( 504:) 500:( 490:· 483:· 476:· 469:· 442:. 400:) 396:( 274:? 174:/ 106:) 81:) 23:.

Original author(s): Julian Seward
Developer(s): Federico Mena
Initial release: 18 July 1996
Stable release: 13 July 2019
Repository: https://gitlab.com/bzip2/bzip2/
Operating system: Cross-platform
Type: Data compression
Website: sourceware.org/bzip2/

bzip2 file format
Filename extension: .bz2
Type code: Bzp2
Magic number: BZh
Developed by: Julian Seward
Type of format: Data compression
Open format: Yes
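A bzip2 stream can be recognised from its first four bytes. These are the 'BZ' signature, the version byte 'h' (bzip2, as opposed to the deprecated '0' used by bzip1), and an ASCII digit '1' to '9' that gives the block size in units of 100 kB. The Python sketch below checks only that header. The helper name `parse_bzip2_header` is hypothetical. The code also assumes that "100 kB" means 100,000 bytes, and it does not validate anything past the header.

```python
import bz2

BLOCK_UNIT = 100_000  # assumption: "100 kB" read as 100,000 bytes


def parse_bzip2_header(data: bytes) -> int:
    """Return the block size (in bytes) declared by a bzip2 stream header.

    Hypothetical helper: it checks only the 4-byte header
    (b"BZ" signature, b"h" version byte, ASCII digit '1'..'9').
    """
    if len(data) < 4:
        raise ValueError("input too short for a bzip2 header")
    if data[:2] != b"BZ":
        raise ValueError("missing 'BZ' signature")
    if data[2:3] != b"h":
        # '0' marked the deprecated bzip1 format; it is not handled here.
        raise ValueError("unsupported version byte")
    digit = data[3]  # indexing a bytes object yields an int
    if not ord("1") <= digit <= ord("9"):
        raise ValueError("block-size digit must be '1'..'9'")
    return (digit - ord("0")) * BLOCK_UNIT


if __name__ == "__main__":
    # The standard-library bz2 module writes the same header layout.
    stream = bz2.compress(b"hello, world", compresslevel=1)
    print(stream[:4], parse_bzip2_header(stream))
```

Python's standard `bz2` module writes the compression level as the block-size digit. For example, `bz2.compress(data, compresslevel=1)` produces a stream that begins with `b"BZh1"`, which the check above reports as a 100,000-byte block size.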

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
