aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorXin LI <delphij@FreeBSD.org>2007-03-28 07:58:30 +0000
committerXin LI <delphij@FreeBSD.org>2007-03-28 07:58:30 +0000
commit078c615eaccfef1341e98edfc4255474da173847 (patch)
treecff6d718ccf97f86055c2d3c473357ca1ad8fb5f
parent4446b1794f2c3e302d9897f1f8b0f236aeb96183 (diff)
downloadsrc-078c615eaccfef1341e98edfc4255474da173847.tar.gz
src-078c615eaccfef1341e98edfc4255474da173847.zip
Remove manual.texi which does not belong to this distributionvendor/bzip2/1.0.4
Notes
Notes: svn path=/vendor/bzip2/dist/; revision=167984 svn path=/vendor/bzip2/1.0.4/; revision=167976; tag=vendor/bzip2/1.0.4
-rw-r--r--contrib/bzip2/manual.texi2243
1 files changed, 0 insertions, 2243 deletions
diff --git a/contrib/bzip2/manual.texi b/contrib/bzip2/manual.texi
deleted file mode 100644
index 5bc27d5f9f90..000000000000
--- a/contrib/bzip2/manual.texi
+++ /dev/null
@@ -1,2243 +0,0 @@
-\input texinfo @c -*- Texinfo -*-
-@setfilename bzip2.info
-
-@ignore
-This file documents bzip2 version 1.0.2, and associated library
-libbzip2, written by Julian Seward (jseward@acm.org).
-
-Copyright (C) 1996-2002 Julian R Seward
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for verbatim copies.
-@end ignore
-
-@ifinfo
-@format
-START-INFO-DIR-ENTRY
-* Bzip2: (bzip2). A program and library for data compression.
-END-INFO-DIR-ENTRY
-@end format
-
-@end ifinfo
-
-@iftex
-@c @finalout
-@settitle bzip2 and libbzip2
-@titlepage
-@title bzip2 and libbzip2
-@subtitle a program and library for data compression
-@subtitle copyright (C) 1996-2002 Julian Seward
-@subtitle version 1.0.2 of 30 December 2001
-@author Julian Seward
-
-@end titlepage
-
-@parindent 0mm
-@parskip 2mm
-
-@end iftex
-@node Top,,, (dir)
-
-The following text is the License for this software. You should
-find it identical to that contained in the file LICENSE in the
-source distribution.
-
-@bf{------------------ START OF THE LICENSE ------------------}
-
-This program, @code{bzip2},
-and associated library @code{libbzip2}, are
-Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
-@itemize @bullet
-@item
- Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
-@item
- The origin of this software must not be misrepresented; you must
- not claim that you wrote the original software. If you use this
- software in a product, an acknowledgment in the product
- documentation would be appreciated but is not required.
-@item
- Altered source versions must be plainly marked as such, and must
- not be misrepresented as being the original software.
-@item
- The name of the author may not be used to endorse or promote
- products derived from this software without specific prior written
- permission.
-@end itemize
-THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
-OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
-DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
-GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
-WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-Julian Seward, Cambridge, UK.
-
-@code{jseward@@acm.org}
-
-@code{bzip2}/@code{libbzip2} version 1.0.2 of 30 December 2001.
-
-@bf{------------------ END OF THE LICENSE ------------------}
-
-Web sites:
-
-@code{http://sources.redhat.com/bzip2}
-
-@code{http://www.cacheprof.org}
-
-PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
-algorithms. However, I do not have the resources available to carry out
-a full patent search. Therefore I cannot give any guarantee of the
-above statement.
-
-
-
-
-
-
-
-@chapter Introduction
-
-@code{bzip2} compresses files using the Burrows-Wheeler
-block-sorting text compression algorithm, and Huffman coding.
-Compression is generally considerably better than that
-achieved by more conventional LZ77/LZ78-based compressors,
-and approaches the performance of the PPM family of statistical compressors.
-
-@code{bzip2} is built on top of @code{libbzip2}, a flexible library
-for handling compressed data in the @code{bzip2} format. This manual
-describes both how to use the program and
-how to work with the library interface. Most of the
-manual is devoted to this library, not the program,
-which is good news if your interest is only in the program.
-
-Chapter 2 describes how to use @code{bzip2}; this is the only part
-you need to read if you just want to know how to operate the program.
-Chapter 3 describes the programming interfaces in detail, and
-Chapter 4 records some miscellaneous notes which I thought
-ought to be recorded somewhere.
-
-
-@chapter How to use @code{bzip2}
-
-This chapter contains a copy of the @code{bzip2} man page,
-and nothing else.
-
-@quotation
-
-@unnumberedsubsubsec NAME
-@itemize
-@item @code{bzip2}, @code{bunzip2}
-- a block-sorting file compressor, v1.0.2
-@item @code{bzcat}
-- decompresses files to stdout
-@item @code{bzip2recover}
-- recovers data from damaged bzip2 files
-@end itemize
-
-@unnumberedsubsubsec SYNOPSIS
-@itemize
-@item @code{bzip2} [ -cdfkqstvzVL123456789 ] [ filenames ... ]
-@item @code{bunzip2} [ -fkvsVL ] [ filenames ... ]
-@item @code{bzcat} [ -s ] [ filenames ... ]
-@item @code{bzip2recover} filename
-@end itemize
-
-@unnumberedsubsubsec DESCRIPTION
-
-@code{bzip2} compresses files using the Burrows-Wheeler block sorting
-text compression algorithm, and Huffman coding. Compression is
-generally considerably better than that achieved by more conventional
-LZ77/LZ78-based compressors, and approaches the performance of the PPM
-family of statistical compressors.
-
-The command-line options are deliberately very similar to those of GNU
-@code{gzip}, but they are not identical.
-
-@code{bzip2} expects a list of file names to accompany the command-line
-flags. Each file is replaced by a compressed version of itself, with
-the name @code{original_name.bz2}. Each compressed file has the same
-modification date, permissions, and, when possible, ownership as the
-corresponding original, so that these properties can be correctly
-restored at decompression time. File name handling is naive in the
-sense that there is no mechanism for preserving original file names,
-permissions, ownerships or dates in filesystems which lack these
-concepts, or have serious file name length restrictions, such as MS-DOS.
-
-@code{bzip2} and @code{bunzip2} will by default not overwrite existing
-files. If you want this to happen, specify the @code{-f} flag.
-
-If no file names are specified, @code{bzip2} compresses from standard
-input to standard output. In this case, @code{bzip2} will decline to
-write compressed output to a terminal, as this would be entirely
-incomprehensible and therefore pointless.
-
-@code{bunzip2} (or @code{bzip2 -d}) decompresses all
-specified files. Files which were not created by @code{bzip2}
-will be detected and ignored, and a warning issued.
-@code{bzip2} attempts to guess the filename for the decompressed file
-from that of the compressed file as follows:
-@itemize
-@item @code{filename.bz2 } becomes @code{filename}
-@item @code{filename.bz } becomes @code{filename}
-@item @code{filename.tbz2} becomes @code{filename.tar}
-@item @code{filename.tbz } becomes @code{filename.tar}
-@item @code{anyothername } becomes @code{anyothername.out}
-@end itemize
-If the file does not end in one of the recognised endings,
-@code{.bz2}, @code{.bz},
-@code{.tbz2} or @code{.tbz}, @code{bzip2} complains that it cannot
-guess the name of the original file, and uses the original name
-with @code{.out} appended.
-
-As with compression, supplying no
-filenames causes decompression from standard input to standard output.
-
-@code{bunzip2} will correctly decompress a file which is the
-concatenation of two or more compressed files. The result is the
-concatenation of the corresponding uncompressed files. Integrity
-testing (@code{-t}) of concatenated compressed files is also supported.
-
-You can also compress or decompress files to the standard output by
-giving the @code{-c} flag. Multiple files may be compressed and
-decompressed like this. The resulting outputs are fed sequentially to
-stdout. Compression of multiple files in this manner generates a stream
-containing multiple compressed file representations. Such a stream
-can be decompressed correctly only by @code{bzip2} version 0.9.0 or
-later. Earlier versions of @code{bzip2} will stop after decompressing
-the first file in the stream.
-
-@code{bzcat} (or @code{bzip2 -dc}) decompresses all specified files to
-the standard output.
-
-@code{bzip2} will read arguments from the environment variables
-@code{BZIP2} and @code{BZIP}, in that order, and will process them
-before any arguments read from the command line. This gives a
-convenient way to supply default arguments.
-
-Compression is always performed, even if the compressed file is slightly
-larger than the original. Files of less than about one hundred bytes
-tend to get larger, since the compression mechanism has a constant
-overhead in the region of 50 bytes. Random data (including the output
-of most file compressors) is coded at about 8.05 bits per byte, giving
-an expansion of around 0.5%.
-
-As a self-check for your protection, @code{bzip2} uses 32-bit CRCs to
-make sure that the decompressed version of a file is identical to the
-original. This guards against corruption of the compressed data, and
-against undetected bugs in @code{bzip2} (hopefully very unlikely). The
-chances of data corruption going undetected is microscopic, about one
-chance in four billion for each file processed. Be aware, though, that
-the check occurs upon decompression, so it can only tell you that
-something is wrong. It can't help you recover the original uncompressed
-data. You can use @code{bzip2recover} to try to recover data from
-damaged files.
-
-Return values: 0 for a normal exit, 1 for environmental problems (file
-not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
-compressed file, 3 for an internal consistency error (eg, bug) which
-caused @code{bzip2} to panic.
-
-
-@unnumberedsubsubsec OPTIONS
-@table @code
-@item -c --stdout
-Compress or decompress to standard output.
-@item -d --decompress
-Force decompression. @code{bzip2}, @code{bunzip2} and @code{bzcat} are
-really the same program, and the decision about what actions to take is
-done on the basis of which name is used. This flag overrides that
-mechanism, and forces bzip2 to decompress.
-@item -z --compress
-The complement to @code{-d}: forces compression, regardless of the
-invokation name.
-@item -t --test
-Check integrity of the specified file(s), but don't decompress them.
-This really performs a trial decompression and throws away the result.
-@item -f --force
-Force overwrite of output files. Normally, @code{bzip2} will not overwrite
-existing output files. Also forces @code{bzip2} to break hard links
-to files, which it otherwise wouldn't do.
-
-@code{bzip2} normally declines to decompress files which don't have the
-correct magic header bytes. If forced (@code{-f}), however, it will
-pass such files through unmodified. This is how GNU @code{gzip}
-behaves.
-@item -k --keep
-Keep (don't delete) input files during compression
-or decompression.
-@item -s --small
-Reduce memory usage, for compression, decompression and testing. Files
-are decompressed and tested using a modified algorithm which only
-requires 2.5 bytes per block byte. This means any file can be
-decompressed in 2300k of memory, albeit at about half the normal speed.
-
-During compression, @code{-s} selects a block size of 200k, which limits
-memory use to around the same figure, at the expense of your compression
-ratio. In short, if your machine is low on memory (8 megabytes or
-less), use -s for everything. See MEMORY MANAGEMENT below.
-@item -q --quiet
-Suppress non-essential warning messages. Messages pertaining to
-I/O errors and other critical events will not be suppressed.
-@item -v --verbose
-Verbose mode -- show the compression ratio for each file processed.
-Further @code{-v}'s increase the verbosity level, spewing out lots of
-information which is primarily of interest for diagnostic purposes.
-@item -L --license -V --version
-Display the software version, license terms and conditions.
-@item -1 (or --fast) to -9 (or --best)
-Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
-effect when decompressing. See MEMORY MANAGEMENT below.
-The @code{--fast} and @code{--best} aliases are primarily for GNU
-@code{gzip} compatibility. In particular, @code{--fast} doesn't make
-things significantly faster. And @code{--best} merely selects the
-default behaviour.
-@item --
-Treats all subsequent arguments as file names, even if they start
-with a dash. This is so you can handle files with names beginning
-with a dash, for example: @code{bzip2 -- -myfilename}.
-@item --repetitive-fast
-@item --repetitive-best
-These flags are redundant in versions 0.9.5 and above. They provided
-some coarse control over the behaviour of the sorting algorithm in
-earlier versions, which was sometimes useful. 0.9.5 and above have an
-improved algorithm which renders these flags irrelevant.
-@end table
-
-
-@unnumberedsubsubsec MEMORY MANAGEMENT
-
-@code{bzip2} compresses large files in blocks. The block size affects
-both the compression ratio achieved, and the amount of memory needed for
-compression and decompression. The flags @code{-1} through @code{-9}
-specify the block size to be 100,000 bytes through 900,000 bytes (the
-default) respectively. At decompression time, the block size used for
-compression is read from the header of the compressed file, and
-@code{bunzip2} then allocates itself just enough memory to decompress
-the file. Since block sizes are stored in compressed files, it follows
-that the flags @code{-1} to @code{-9} are irrelevant to and so ignored
-during decompression.
-
-Compression and decompression requirements, in bytes, can be estimated
-as:
-@example
- Compression: 400k + ( 8 x block size )
-
- Decompression: 100k + ( 4 x block size ), or
- 100k + ( 2.5 x block size )
-@end example
-Larger block sizes give rapidly diminishing marginal returns. Most of
-the compression comes from the first two or three hundred k of block
-size, a fact worth bearing in mind when using @code{bzip2} on small machines.
-It is also important to appreciate that the decompression memory
-requirement is set at compression time by the choice of block size.
-
-For files compressed with the default 900k block size, @code{bunzip2}
-will require about 3700 kbytes to decompress. To support decompression
-of any file on a 4 megabyte machine, @code{bunzip2} has an option to
-decompress using approximately half this amount of memory, about 2300
-kbytes. Decompression speed is also halved, so you should use this
-option only where necessary. The relevant flag is @code{-s}.
-
-In general, try and use the largest block size memory constraints allow,
-since that maximises the compression achieved. Compression and
-decompression speed are virtually unaffected by block size.
-
-Another significant point applies to files which fit in a single block
--- that means most files you'd encounter using a large block size. The
-amount of real memory touched is proportional to the size of the file,
-since the file is smaller than a block. For example, compressing a file
-20,000 bytes long with the flag @code{-9} will cause the compressor to
-allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
-kbytes of it. Similarly, the decompressor will allocate 3700k but only
-touch 100k + 20000 * 4 = 180 kbytes.
-
-Here is a table which summarises the maximum memory usage for different
-block sizes. Also recorded is the total compressed size for 14 files of
-the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
-column gives some feel for how compression varies with block size.
-These figures tend to understate the advantage of larger block sizes for
-larger files, since the Corpus is dominated by smaller files.
-@example
- Compress Decompress Decompress Corpus
- Flag usage usage -s usage Size
-
- -1 1200k 500k 350k 914704
- -2 2000k 900k 600k 877703
- -3 2800k 1300k 850k 860338
- -4 3600k 1700k 1100k 846899
- -5 4400k 2100k 1350k 845160
- -6 5200k 2500k 1600k 838626
- -7 6100k 2900k 1850k 834096
- -8 6800k 3300k 2100k 828642
- -9 7600k 3700k 2350k 828642
-@end example
-
-@unnumberedsubsubsec RECOVERING DATA FROM DAMAGED FILES
-
-@code{bzip2} compresses files in blocks, usually 900kbytes long. Each
-block is handled independently. If a media or transmission error causes
-a multi-block @code{.bz2} file to become damaged, it may be possible to
-recover data from the undamaged blocks in the file.
-
-The compressed representation of each block is delimited by a 48-bit
-pattern, which makes it possible to find the block boundaries with
-reasonable certainty. Each block also carries its own 32-bit CRC, so
-damaged blocks can be distinguished from undamaged ones.
-
-@code{bzip2recover} is a simple program whose purpose is to search for
-blocks in @code{.bz2} files, and write each block out into its own
-@code{.bz2} file. You can then use @code{bzip2 -t} to test the
-integrity of the resulting files, and decompress those which are
-undamaged.
-
-@code{bzip2recover}
-takes a single argument, the name of the damaged file, and writes a
-number of files @code{rec00001file.bz2}, @code{rec00002file.bz2}, etc,
-containing the extracted blocks. The output filenames are designed so
-that the use of wildcards in subsequent processing -- for example,
-@code{bzip2 -dc rec*file.bz2 > recovered_data} -- processes the files in
-the correct order.
-
-@code{bzip2recover} should be of most use dealing with large @code{.bz2}
-files, as these will contain many blocks. It is clearly futile to use
-it on damaged single-block files, since a damaged block cannot be
-recovered. If you wish to minimise any potential data loss through
-media or transmission errors, you might consider compressing with a
-smaller block size.
-
-
-@unnumberedsubsubsec PERFORMANCE NOTES
-
-The sorting phase of compression gathers together similar strings in the
-file. Because of this, files containing very long runs of repeated
-symbols, like "aabaabaabaab ..." (repeated several hundred times) may
-compress more slowly than normal. Versions 0.9.5 and above fare much
-better than previous versions in this respect. The ratio between
-worst-case and average-case compression time is in the region of 10:1.
-For previous versions, this figure was more like 100:1. You can use the
-@code{-vvvv} option to monitor progress in great detail, if you want.
-
-Decompression speed is unaffected by these phenomena.
-
-@code{bzip2} usually allocates several megabytes of memory to operate
-in, and then charges all over it in a fairly random fashion. This means
-that performance, both for compressing and decompressing, is largely
-determined by the speed at which your machine can service cache misses.
-Because of this, small changes to the code to reduce the miss rate have
-been observed to give disproportionately large performance improvements.
-I imagine @code{bzip2} will perform best on machines with very large
-caches.
-
-
-@unnumberedsubsubsec CAVEATS
-
-I/O error messages are not as helpful as they could be. @code{bzip2}
-tries hard to detect I/O errors and exit cleanly, but the details of
-what the problem is sometimes seem rather misleading.
-
-This manual page pertains to version 1.0.2 of @code{bzip2}. Compressed
-data created by this version is entirely forwards and backwards
-compatible with the previous public releases, versions 0.1pl2, 0.9.0,
-0.9.5, 1.0.0 and 1.0.1, but with the following exception: 0.9.0 and
-above can correctly decompress multiple concatenated compressed files.
-0.1pl2 cannot do this; it will stop after decompressing just the first
-file in the stream.
-
-@code{bzip2recover} versions prior to this one, 1.0.2, used 32-bit
-integers to represent bit positions in compressed files, so it could not
-handle compressed files more than 512 megabytes long. Version 1.0.2 and
-above uses 64-bit ints on some platforms which support them (GNU
-supported targets, and Windows). To establish whether or not
-@code{bzip2recover} was built with such a limitation, run it without
-arguments. In any event you can build yourself an unlimited version if
-you can recompile it with @code{MaybeUInt64} set to be an unsigned
-64-bit integer.
-
-
-
-@unnumberedsubsubsec AUTHOR
-Julian Seward, @code{jseward@@acm.org}.
-
-@code{http://sources.redhat.com/bzip2}
-
-The ideas embodied in @code{bzip2} are due to (at least) the following
-people: Michael Burrows and David Wheeler (for the block sorting
-transformation), David Wheeler (again, for the Huffman coder), Peter
-Fenwick (for the structured coding model in the original @code{bzip},
-and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
-(for the arithmetic coder in the original @code{bzip}). I am much
-indebted for their help, support and advice. See the manual in the
-source distribution for pointers to sources of documentation. Christian
-von Roques encouraged me to look for faster sorting algorithms, so as to
-speed up compression. Bela Lubkin encouraged me to improve the
-worst-case compression performance. The @code{bz*} scripts are derived
-from those of GNU @code{gzip}. Many people sent patches, helped with
-portability problems, lent machines, gave advice and were generally
-helpful.
-
-@end quotation
-
-
-
-
-@chapter Programming with @code{libbzip2}
-
-This chapter describes the programming interface to @code{libbzip2}.
-
-For general background information, particularly about memory
-use and performance aspects, you'd be well advised to read Chapter 2
-as well.
-
-@section Top-level structure
-
-@code{libbzip2} is a flexible library for compressing and decompressing
-data in the @code{bzip2} data format. Although packaged as a single
-entity, it helps to regard the library as three separate parts: the low
-level interface, and the high level interface, and some utility
-functions.
-
-The structure of @code{libbzip2}'s interfaces is similar to
-that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib}
-library.
-
-All externally visible symbols have names beginning @code{BZ2_}.
-This is new in version 1.0. The intention is to minimise pollution
-of the namespaces of library clients.
-
-@subsection Low-level summary
-
-This interface provides services for compressing and decompressing
-data in memory. There's no provision for dealing with files, streams
-or any other I/O mechanisms, just straight memory-to-memory work.
-In fact, this part of the library can be compiled without inclusion
-of @code{stdio.h}, which may be helpful for embedded applications.
-
-The low-level part of the library has no global variables and
-is therefore thread-safe.
-
-Six routines make up the low level interface:
-@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, and @* @code{BZ2_bzCompressEnd}
-for compression,
-and a corresponding trio @code{BZ2_bzDecompressInit}, @* @code{BZ2_bzDecompress}
-and @code{BZ2_bzDecompressEnd} for decompression.
-The @code{*Init} functions allocate
-memory for compression/decompression and do other
-initialisations, whilst the @code{*End} functions close down operations
-and release memory.
-
-The real work is done by @code{BZ2_bzCompress} and @code{BZ2_bzDecompress}.
-These compress and decompress data from a user-supplied input buffer
-to a user-supplied output buffer. These buffers can be any size;
-arbitrary quantities of data are handled by making repeated calls
-to these functions. This is a flexible mechanism allowing a
-consumer-pull style of activity, or producer-push, or a mixture of
-both.
-
-
-
-@subsection High-level summary
-
-This interface provides some handy wrappers around the low-level
-interface to facilitate reading and writing @code{bzip2} format
-files (@code{.bz2} files). The routines provide hooks to facilitate
-reading files in which the @code{bzip2} data stream is embedded
-within some larger-scale file structure, or where there are
-multiple @code{bzip2} data streams concatenated end-to-end.
-
-For reading files, @code{BZ2_bzReadOpen}, @code{BZ2_bzRead},
-@code{BZ2_bzReadClose} and @* @code{BZ2_bzReadGetUnused} are supplied. For
-writing files, @code{BZ2_bzWriteOpen}, @code{BZ2_bzWrite} and
-@code{BZ2_bzWriteFinish} are available.
-
-As with the low-level library, no global variables are used
-so the library is per se thread-safe. However, if I/O errors
-occur whilst reading or writing the underlying compressed files,
-you may have to consult @code{errno} to determine the cause of
-the error. In that case, you'd need a C library which correctly
-supports @code{errno} in a multithreaded environment.
-
-To make the library a little simpler and more portable,
-@code{BZ2_bzReadOpen} and @code{BZ2_bzWriteOpen} require you to pass them file
-handles (@code{FILE*}s) which have previously been opened for reading or
-writing respectively. That avoids portability problems associated with
-file operations and file attributes, whilst not being much of an
-imposition on the programmer.
-
-
-
-@subsection Utility functions summary
-For very simple needs, @code{BZ2_bzBuffToBuffCompress} and
-@code{BZ2_bzBuffToBuffDecompress} are provided. These compress
-data in memory from one buffer to another buffer in a single
-function call. You should assess whether these functions
-fulfill your memory-to-memory compression/decompression
-requirements before investing effort in understanding the more
-general but more complex low-level interface.
-
-Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} /
-@code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to
-give better @code{zlib} compatibility. These functions are
-@code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
-@code{BZ2_bzclose},
-@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}. You may find these functions
-more convenient for simple file reading and writing, than those in the
-high-level interface. These functions are not (yet) officially part of
-the library, and are minimally documented here. If they break, you
-get to keep all the pieces. I hope to document them properly when time
-permits.
-
-Yoshioka also contributed modifications to allow the library to be
-built as a Windows DLL.
-
-
-@section Error handling
-
-The library is designed to recover cleanly in all situations, including
-the worst-case situation of decompressing random data. I'm not
-100% sure that it can always do this, so you might want to add
-a signal handler to catch segmentation violations during decompression
-if you are feeling especially paranoid. I would be interested in
-hearing more about the robustness of the library to corrupted
-compressed data.
-
-Version 1.0 is much more robust in this respect than
-0.9.0 or 0.9.5. Investigations with Checker (a tool for
-detecting problems with memory management, similar to Purify)
-indicate that, at least for the few files I tested, all single-bit
-errors in the decompressed data are caught properly, with no
-segmentation faults, no reads of uninitialised data and no
-out of range reads or writes. So it's certainly much improved,
-although I wouldn't claim it to be totally bombproof.
-
-The file @code{bzlib.h} contains all definitions needed to use
-the library. In particular, you should definitely not include
-@code{bzlib_private.h}.
-
-In @code{bzlib.h}, the various return values are defined. The following
-list is not intended as an exhaustive description of the circumstances
-in which a given value may be returned -- those descriptions are given
-later. Rather, it is intended to convey the rough meaning of each
-return value. The first five actions are normal and not intended to
-denote an error situation.
-@table @code
-@item BZ_OK
-The requested action was completed successfully.
-@item BZ_RUN_OK
-@itemx BZ_FLUSH_OK
-@itemx BZ_FINISH_OK
-In @code{BZ2_bzCompress}, the requested flush/finish/nothing-special action
-was completed successfully.
-@item BZ_STREAM_END
-Compression of data was completed, or the logical stream end was
-detected during decompression.
-@end table
-
-The following return values indicate an error of some kind.
-@table @code
-@item BZ_CONFIG_ERROR
-Indicates that the library has been improperly compiled on your
-platform -- a major configuration error. Specifically, it means
-that @code{sizeof(char)}, @code{sizeof(short)} and @code{sizeof(int)}
-are not 1, 2 and 4 respectively, as they should be. Note that the
-library should still work properly on 64-bit platforms which follow
-the LP64 programming model -- that is, where @code{sizeof(long)}
-and @code{sizeof(void*)} are 8. Under LP64, @code{sizeof(int)} is
-still 4, so @code{libbzip2}, which doesn't use the @code{long} type,
-is OK.
-@item BZ_SEQUENCE_ERROR
-When using the library, it is important to call the functions in the
-correct sequence and with data structures (buffers etc) in the correct
-states. @code{libbzip2} checks as much as it can to ensure this is
-happening, and returns @code{BZ_SEQUENCE_ERROR} if not. Code which
-complies precisely with the function semantics, as detailed below,
-should never receive this value; such an event denotes buggy code
-which you should investigate.
-@item BZ_PARAM_ERROR
-Returned when a parameter to a function call is out of range
-or otherwise manifestly incorrect. As with @code{BZ_SEQUENCE_ERROR},
-this denotes a bug in the client code. The distinction between
-@code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth
-making.
-@item BZ_MEM_ERROR
-Returned when a request to allocate memory failed. Note that the
-quantity of memory needed to decompress a stream cannot be determined
-until the stream's header has been read. So @code{BZ2_bzDecompress} and
-@code{BZ2_bzRead} may return @code{BZ_MEM_ERROR} even though some of
-the compressed data has been read. The same is not true for
-compression; once @code{BZ2_bzCompressInit} or @code{BZ2_bzWriteOpen} have
-successfully completed, @code{BZ_MEM_ERROR} cannot occur.
-@item BZ_DATA_ERROR
-Returned when a data integrity error is detected during decompression.
-Most importantly, this means when stored and computed CRCs for the
-data do not match. This value is also returned upon detection of any
-other anomaly in the compressed data.
-@item BZ_DATA_ERROR_MAGIC
-As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to
-know when the compressed stream does not start with the correct
-magic bytes (@code{'B' 'Z' 'h'}).
-@item BZ_IO_ERROR
-Returned by @code{BZ2_bzRead} and @code{BZ2_bzWrite} when there is an error
-reading or writing in the compressed file, and by @code{BZ2_bzReadOpen}
-and @code{BZ2_bzWriteOpen} for attempts to use a file for which the
-error indicator (viz, @code{ferror(f)}) is set.
-On receipt of @code{BZ_IO_ERROR}, the caller should consult
-@code{errno} and/or @code{perror} to acquire operating-system
-specific information about the problem.
-@item BZ_UNEXPECTED_EOF
-Returned by @code{BZ2_bzRead} when the compressed file finishes
-before the logical end of stream is detected.
-@item BZ_OUTBUFF_FULL
-Returned by @code{BZ2_bzBuffToBuffCompress} and
-@code{BZ2_bzBuffToBuffDecompress} to indicate that the output data
-will not fit into the output buffer provided.
-@end table
-
-
-
-@section Low-level interface
-
-@subsection @code{BZ2_bzCompressInit}
-@example
-typedef
- struct @{
- char *next_in;
- unsigned int avail_in;
- unsigned int total_in_lo32;
- unsigned int total_in_hi32;
-
- char *next_out;
- unsigned int avail_out;
- unsigned int total_out_lo32;
- unsigned int total_out_hi32;
-
- void *state;
-
- void *(*bzalloc)(void *,int,int);
- void (*bzfree)(void *,void *);
- void *opaque;
- @}
- bz_stream;
-
-int BZ2_bzCompressInit ( bz_stream *strm,
- int blockSize100k,
- int verbosity,
- int workFactor );
-
-@end example
-
-Prepares for compression. The @code{bz_stream} structure
-holds all data pertaining to the compression activity.
-A @code{bz_stream} structure should be allocated and initialised
-prior to the call.
-The fields of @code{bz_stream}
-comprise the entirety of the user-visible data. @code{state}
-is a pointer to the private data structures required for compression.
-
-Custom memory allocators are supported, via fields @code{bzalloc},
-@code{bzfree},
-and @code{opaque}. The value
-@code{opaque} is passed to as the first argument to
-all calls to @code{bzalloc} and @code{bzfree}, but is
-otherwise ignored by the library.
-The call @code{bzalloc ( opaque, n, m )} is expected to return a
-pointer @code{p} to
-@code{n * m} bytes of memory, and @code{bzfree ( opaque, p )}
-should free
-that memory.
-
-If you don't want to use a custom memory allocator, set @code{bzalloc},
-@code{bzfree} and
-@code{opaque} to @code{NULL},
-and the library will then use the standard @code{malloc}/@code{free}
-routines.
-
-Before calling @code{BZ2_bzCompressInit}, fields @code{bzalloc},
-@code{bzfree} and @code{opaque} should
-be filled appropriately, as just described. Upon return, the internal
-state will have been allocated and initialised, and @code{total_in_lo32},
-@code{total_in_hi32}, @code{total_out_lo32} and
-@code{total_out_hi32} will have been set to zero.
-These four fields are used by the library
-to inform the caller of the total amount of data passed into and out of
-the library, respectively. You should not try to change them.
-As of version 1.0, 64-bit counts are maintained, even on 32-bit
-platforms, using the @code{_hi32} fields to store the upper 32 bits
-of the count. So, for example, the total amount of data in
-is @code{(total_in_hi32 << 32) + total_in_lo32}.
-
-Parameter @code{blockSize100k} specifies the block size to be used for
-compression. It should be a value between 1 and 9 inclusive, and the
-actual block size used is 100000 x this figure. 9 gives the best
-compression but takes most memory.
-
-Parameter @code{verbosity} should be set to a number between 0 and 4
-inclusive. 0 is silent, and greater numbers give increasingly verbose
-monitoring/debugging output. If the library has been compiled with
-@code{-DBZ_NO_STDIO}, no such output will appear for any verbosity
-setting.
-
-Parameter @code{workFactor} controls how the compression phase behaves
-when presented with worst case, highly repetitive, input data. If
-compression runs into difficulties caused by repetitive data, the
-library switches from the standard sorting algorithm to a fallback
-algorithm. The fallback is slower than the standard algorithm by
-perhaps a factor of three, but always behaves reasonably, no matter how
-bad the input.
-
-Lower values of @code{workFactor} reduce the amount of effort the
-standard algorithm will expend before resorting to the fallback. You
-should set this parameter carefully; too low, and many inputs will be
-handled by the fallback algorithm and so compress rather slowly, too
-high, and your average-to-worst case compression times can become very
-large. The default value of 30 gives reasonable behaviour over a wide
-range of circumstances.
-
-Allowable values range from 0 to 250 inclusive. 0 is a special case,
-equivalent to using the default value of 30.
-
-Note that the compressed output generated is the same regardless of
-whether or not the fallback algorithm is used.
-
-Be aware also that this parameter may disappear entirely in future
-versions of the library. In principle it should be possible to devise a
-good way to automatically choose which algorithm to use. Such a
-mechanism would render the parameter obsolete.
-
-Possible return values:
-@display
- @code{BZ_CONFIG_ERROR}
- if the library has been mis-compiled
- @code{BZ_PARAM_ERROR}
- if @code{strm} is @code{NULL}
- or @code{blockSize} < 1 or @code{blockSize} > 9
- or @code{verbosity} < 0 or @code{verbosity} > 4
- or @code{workFactor} < 0 or @code{workFactor} > 250
- @code{BZ_MEM_ERROR}
- if not enough memory is available
- @code{BZ_OK}
- otherwise
-@end display
-Allowable next actions:
-@display
- @code{BZ2_bzCompress}
- if @code{BZ_OK} is returned
- no specific action needed in case of error
-@end display
-
-@subsection @code{BZ2_bzCompress}
-@example
- int BZ2_bzCompress ( bz_stream *strm, int action );
-@end example
-Provides more input and/or output buffer space for the library. The
-caller maintains input and output buffers, and calls @code{BZ2_bzCompress} to
-transfer data between them.
-
-Before each call to @code{BZ2_bzCompress}, @code{next_in} should point at
-the data to be compressed, and @code{avail_in} should indicate how many
-bytes the library may read. @code{BZ2_bzCompress} updates @code{next_in},
-@code{avail_in} and @code{total_in} to reflect the number of bytes it
-has read.
-
-Similarly, @code{next_out} should point to a buffer in which the
-compressed data is to be placed, with @code{avail_out} indicating how
-much output space is available. @code{BZ2_bzCompress} updates
-@code{next_out}, @code{avail_out} and @code{total_out} to reflect the
-number of bytes output.
-
-You may provide and remove as little or as much data as you like on each
-call of @code{BZ2_bzCompress}. In the limit, it is acceptable to supply and
-remove data one byte at a time, although this would be terribly
-inefficient. You should always ensure that at least one byte of output
-space is available at each call.
-
-A second purpose of @code{BZ2_bzCompress} is to request a change of mode of the
-compressed stream.
-
-Conceptually, a compressed stream can be in one of four states: IDLE,
-RUNNING, FLUSHING and FINISHING. Before initialisation
-(@code{BZ2_bzCompressInit}) and after termination (@code{BZ2_bzCompressEnd}), a
-stream is regarded as IDLE.
-
-Upon initialisation (@code{BZ2_bzCompressInit}), the stream is placed in the
-RUNNING state. Subsequent calls to @code{BZ2_bzCompress} should pass
-@code{BZ_RUN} as the requested action; other actions are illegal and
-will result in @code{BZ_SEQUENCE_ERROR}.
-
-At some point, the calling program will have provided all the input data
-it wants to. It will then want to finish up -- in effect, asking the
-library to process any data it might have buffered internally. In this
-state, @code{BZ2_bzCompress} will no longer attempt to read data from
-@code{next_in}, but it will want to write data to @code{next_out}.
-Because the output buffer supplied by the user can be arbitrarily small,
-the finishing-up operation cannot necessarily be done with a single call
-of @code{BZ2_bzCompress}.
-
-Instead, the calling program passes @code{BZ_FINISH} as an action to
-@code{BZ2_bzCompress}. This changes the stream's state to FINISHING. Any
-remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and
-transferred to the output buffer. To do this, @code{BZ2_bzCompress} must be
-called repeatedly until all the output has been consumed. At that
-point, @code{BZ2_bzCompress} returns @code{BZ_STREAM_END}, and the stream's
-state is set back to IDLE. @code{BZ2_bzCompressEnd} should then be
-called.
-
-Just to make sure the calling program does not cheat, the library makes
-a note of @code{avail_in} at the time of the first call to
-@code{BZ2_bzCompress} which has @code{BZ_FINISH} as an action (ie, at the
-time the program has announced its intention to not supply any more
-input). By comparing this value with that of @code{avail_in} over
-subsequent calls to @code{BZ2_bzCompress}, the library can detect any
-attempts to slip in more data to compress. Any calls for which this is
-detected will return @code{BZ_SEQUENCE_ERROR}. This indicates a
-programming mistake which should be corrected.
-
-Instead of asking to finish, the calling program may ask
-@code{BZ2_bzCompress} to take all the remaining input, compress it and
-terminate the current (Burrows-Wheeler) compression block. This could
-be useful for error control purposes. The mechanism is analogous to
-that for finishing: call @code{BZ2_bzCompress} with an action of
-@code{BZ_FLUSH}, remove output data, and persist with the
-@code{BZ_FLUSH} action until the value @code{BZ_RUN} is returned. As
-with finishing, @code{BZ2_bzCompress} detects any attempt to provide more
-input data once the flush has begun.
-
-Once the flush is complete, the stream returns to the normal RUNNING
-state.
-
-This all sounds pretty complex, but isn't really. Here's a table
-which shows which actions are allowable in each state, what action
-will be taken, what the next state is, and what the non-error return
-values are. Note that you can't explicitly ask what state the
-stream is in, but nor do you need to -- it can be inferred from the
-values returned by @code{BZ2_bzCompress}.
-@display
-IDLE/@code{any}
- Illegal. IDLE state only exists after @code{BZ2_bzCompressEnd} or
- before @code{BZ2_bzCompressInit}.
- Return value = @code{BZ_SEQUENCE_ERROR}
-
-RUNNING/@code{BZ_RUN}
- Compress from @code{next_in} to @code{next_out} as much as possible.
- Next state = RUNNING
- Return value = @code{BZ_RUN_OK}
-
-RUNNING/@code{BZ_FLUSH}
- Remember current value of @code{next_in}. Compress from @code{next_in}
- to @code{next_out} as much as possible, but do not accept any more input.
- Next state = FLUSHING
- Return value = @code{BZ_FLUSH_OK}
-
-RUNNING/@code{BZ_FINISH}
- Remember current value of @code{next_in}. Compress from @code{next_in}
- to @code{next_out} as much as possible, but do not accept any more input.
- Next state = FINISHING
- Return value = @code{BZ_FINISH_OK}
-
-FLUSHING/@code{BZ_FLUSH}
- Compress from @code{next_in} to @code{next_out} as much as possible,
- but do not accept any more input.
- If all the existing input has been used up and all compressed
- output has been removed
- Next state = RUNNING; Return value = @code{BZ_RUN_OK}
- else
- Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK}
-
-FLUSHING/other
- Illegal.
- Return value = @code{BZ_SEQUENCE_ERROR}
-
-FINISHING/@code{BZ_FINISH}
- Compress from @code{next_in} to @code{next_out} as much as possible,
- but to not accept any more input.
- If all the existing input has been used up and all compressed
- output has been removed
- Next state = IDLE; Return value = @code{BZ_STREAM_END}
- else
- Next state = FINISHING; Return value = @code{BZ_FINISHING}
-
-FINISHING/other
- Illegal.
- Return value = @code{BZ_SEQUENCE_ERROR}
-@end display
-
-That still looks complicated? Well, fair enough. The usual sequence
-of calls for compressing a load of data is:
-@itemize @bullet
-@item Get started with @code{BZ2_bzCompressInit}.
-@item Shovel data in and shlurp out its compressed form using zero or more
-calls of @code{BZ2_bzCompress} with action = @code{BZ_RUN}.
-@item Finish up.
-Repeatedly call @code{BZ2_bzCompress} with action = @code{BZ_FINISH},
-copying out the compressed output, until @code{BZ_STREAM_END} is returned.
-@item Close up and go home. Call @code{BZ2_bzCompressEnd}.
-@end itemize
-If the data you want to compress fits into your input buffer all
-at once, you can skip the calls of @code{BZ2_bzCompress ( ..., BZ_RUN )} and
-just do the @code{BZ2_bzCompress ( ..., BZ_FINISH )} calls.
-
-All required memory is allocated by @code{BZ2_bzCompressInit}. The
-compression library can accept any data at all (obviously). So you
-shouldn't get any error return values from the @code{BZ2_bzCompress} calls.
-If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in
-your programming.
-
-Trivial other possible return values:
-@display
- @code{BZ_PARAM_ERROR}
- if @code{strm} is @code{NULL}, or @code{strm->s} is @code{NULL}
-@end display
-
-@subsection @code{BZ2_bzCompressEnd}
-@example
-int BZ2_bzCompressEnd ( bz_stream *strm );
-@end example
-Releases all memory associated with a compression stream.
-
-Possible return values:
-@display
- @code{BZ_PARAM_ERROR} if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
- @code{BZ_OK} otherwise
-@end display
-
-
-@subsection @code{BZ2_bzDecompressInit}
-@example
-int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
-@end example
-Prepares for decompression. As with @code{BZ2_bzCompressInit}, a
-@code{bz_stream} record should be allocated and initialised before the
-call. Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be
-set if a custom memory allocator is required, or made @code{NULL} for
-the normal @code{malloc}/@code{free} routines. Upon return, the internal
-state will have been initialised, and @code{total_in} and
-@code{total_out} will be zero.
-
-For the meaning of parameter @code{verbosity}, see @code{BZ2_bzCompressInit}.
-
-If @code{small} is nonzero, the library will use an alternative
-decompression algorithm which uses less memory but at the cost of
-decompressing more slowly (roughly speaking, half the speed, but the
-maximum memory requirement drops to around 2300k). See Chapter 2 for
-more information on memory management.
-
-Note that the amount of memory needed to decompress
-a stream cannot be determined until the stream's header has been read,
-so even if @code{BZ2_bzDecompressInit} succeeds, a subsequent
-@code{BZ2_bzDecompress} could fail with @code{BZ_MEM_ERROR}.
-
-Possible return values:
-@display
- @code{BZ_CONFIG_ERROR}
- if the library has been mis-compiled
- @code{BZ_PARAM_ERROR}
- if @code{(small != 0 && small != 1)}
- or @code{(verbosity < 0 || verbosity > 4)}
- @code{BZ_MEM_ERROR}
- if insufficient memory is available
-@end display
-
-Allowable next actions:
-@display
- @code{BZ2_bzDecompress}
- if @code{BZ_OK} was returned
- no specific action required in case of error
-@end display
-
-
-
-@subsection @code{BZ2_bzDecompress}
-@example
-int BZ2_bzDecompress ( bz_stream *strm );
-@end example
-Provides more input and/out output buffer space for the library. The
-caller maintains input and output buffers, and uses @code{BZ2_bzDecompress}
-to transfer data between them.
-
-Before each call to @code{BZ2_bzDecompress}, @code{next_in}
-should point at the compressed data,
-and @code{avail_in} should indicate how many bytes the library
-may read. @code{BZ2_bzDecompress} updates @code{next_in}, @code{avail_in}
-and @code{total_in}
-to reflect the number of bytes it has read.
-
-Similarly, @code{next_out} should point to a buffer in which the uncompressed
-output is to be placed, with @code{avail_out} indicating how much output space
-is available. @code{BZ2_bzCompress} updates @code{next_out},
-@code{avail_out} and @code{total_out} to reflect
-the number of bytes output.
-
-You may provide and remove as little or as much data as you like on
-each call of @code{BZ2_bzDecompress}.
-In the limit, it is acceptable to
-supply and remove data one byte at a time, although this would be
-terribly inefficient. You should always ensure that at least one
-byte of output space is available at each call.
-
-Use of @code{BZ2_bzDecompress} is simpler than @code{BZ2_bzCompress}.
-
-You should provide input and remove output as described above, and
-repeatedly call @code{BZ2_bzDecompress} until @code{BZ_STREAM_END} is
-returned. Appearance of @code{BZ_STREAM_END} denotes that
-@code{BZ2_bzDecompress} has detected the logical end of the compressed
-stream. @code{BZ2_bzDecompress} will not produce @code{BZ_STREAM_END} until
-all output data has been placed into the output buffer, so once
-@code{BZ_STREAM_END} appears, you are guaranteed to have available all
-the decompressed output, and @code{BZ2_bzDecompressEnd} can safely be
-called.
-
-If case of an error return value, you should call @code{BZ2_bzDecompressEnd}
-to clean up and release memory.
-
-Possible return values:
-@display
- @code{BZ_PARAM_ERROR}
- if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
- or @code{strm->avail_out < 1}
- @code{BZ_DATA_ERROR}
- if a data integrity error is detected in the compressed stream
- @code{BZ_DATA_ERROR_MAGIC}
- if the compressed stream doesn't begin with the right magic bytes
- @code{BZ_MEM_ERROR}
- if there wasn't enough memory available
- @code{BZ_STREAM_END}
- if the logical end of the data stream was detected and all
- output in has been consumed, eg @code{s->avail_out > 0}
- @code{BZ_OK}
- otherwise
-@end display
-Allowable next actions:
-@display
- @code{BZ2_bzDecompress}
- if @code{BZ_OK} was returned
- @code{BZ2_bzDecompressEnd}
- otherwise
-@end display
-
-
-@subsection @code{BZ2_bzDecompressEnd}
-@example
-int BZ2_bzDecompressEnd ( bz_stream *strm );
-@end example
-Releases all memory associated with a decompression stream.
-
-Possible return values:
-@display
- @code{BZ_PARAM_ERROR}
- if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
- @code{BZ_OK}
- otherwise
-@end display
-
-Allowable next actions:
-@display
- None.
-@end display
-
-
-@section High-level interface
-
-This interface provides functions for reading and writing
-@code{bzip2} format files. First, some general points.
-
-@itemize @bullet
-@item All of the functions take an @code{int*} first argument,
- @code{bzerror}.
- After each call, @code{bzerror} should be consulted first to determine
- the outcome of the call. If @code{bzerror} is @code{BZ_OK},
- the call completed
- successfully, and only then should the return value of the function
- (if any) be consulted. If @code{bzerror} is @code{BZ_IO_ERROR},
- there was an error
- reading/writing the underlying compressed file, and you should
- then consult @code{errno}/@code{perror} to determine the
- cause of the difficulty.
- @code{bzerror} may also be set to various other values; precise details are
- given on a per-function basis below.
-@item If @code{bzerror} indicates an error
- (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}),
- you should immediately call @code{BZ2_bzReadClose} (or @code{BZ2_bzWriteClose},
- depending on whether you are attempting to read or to write)
- to free up all resources associated
- with the stream. Once an error has been indicated, behaviour of all calls
- except @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) is undefined.
- The implication is that (1) @code{bzerror} should
- be checked after each call, and (2) if @code{bzerror} indicates an error,
- @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) should then be called to clean up.
-@item The @code{FILE*} arguments passed to
- @code{BZ2_bzReadOpen}/@code{BZ2_bzWriteOpen}
- should be set to binary mode.
- Most Unix systems will do this by default, but other platforms,
- including Windows and Mac, will not. If you omit this, you may
- encounter problems when moving code to new platforms.
-@item Memory allocation requests are handled by
- @code{malloc}/@code{free}.
- At present
- there is no facility for user-defined memory allocators in the file I/O
- functions (could easily be added, though).
-@end itemize
-
-
-
-@subsection @code{BZ2_bzReadOpen}
-@example
- typedef void BZFILE;
-
- BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f,
- int small, int verbosity,
- void *unused, int nUnused );
-@end example
-Prepare to read compressed data from file handle @code{f}. @code{f}
-should refer to a file which has been opened for reading, and for which
-the error indicator (@code{ferror(f)})is not set. If @code{small} is 1,
-the library will try to decompress using less memory, at the expense of
-speed.
-
-For reasons explained below, @code{BZ2_bzRead} will decompress the
-@code{nUnused} bytes starting at @code{unused}, before starting to read
-from the file @code{f}. At most @code{BZ_MAX_UNUSED} bytes may be
-supplied like this. If this facility is not required, you should pass
-@code{NULL} and @code{0} for @code{unused} and n@code{Unused}
-respectively.
-
-For the meaning of parameters @code{small} and @code{verbosity},
-see @code{BZ2_bzDecompressInit}.
-
-The amount of memory needed to decompress a file cannot be determined
-until the file's header has been read. So it is possible that
-@code{BZ2_bzReadOpen} returns @code{BZ_OK} but a subsequent call of
-@code{BZ2_bzRead} will return @code{BZ_MEM_ERROR}.
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_CONFIG_ERROR}
- if the library has been mis-compiled
- @code{BZ_PARAM_ERROR}
- if @code{f} is @code{NULL}
- or @code{small} is neither @code{0} nor @code{1}
- or @code{(unused == NULL && nUnused != 0)}
- or @code{(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))}
- @code{BZ_IO_ERROR}
- if @code{ferror(f)} is nonzero
- @code{BZ_MEM_ERROR}
- if insufficient memory is available
- @code{BZ_OK}
- otherwise.
-@end display
-
-Possible return values:
-@display
- Pointer to an abstract @code{BZFILE}
- if @code{bzerror} is @code{BZ_OK}
- @code{NULL}
- otherwise
-@end display
-
-Allowable next actions:
-@display
- @code{BZ2_bzRead}
- if @code{bzerror} is @code{BZ_OK}
- @code{BZ2_bzClose}
- otherwise
-@end display
-
-
-@subsection @code{BZ2_bzRead}
-@example
- int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
-@end example
-Reads up to @code{len} (uncompressed) bytes from the compressed file
-@code{b} into
-the buffer @code{buf}. If the read was successful,
-@code{bzerror} is set to @code{BZ_OK}
-and the number of bytes read is returned. If the logical end-of-stream
-was detected, @code{bzerror} will be set to @code{BZ_STREAM_END},
-and the number
-of bytes read is returned. All other @code{bzerror} values denote an error.
-
-@code{BZ2_bzRead} will supply @code{len} bytes,
-unless the logical stream end is detected
-or an error occurs. Because of this, it is possible to detect the
-stream end by observing when the number of bytes returned is
-less than the number
-requested. Nevertheless, this is regarded as inadvisable; you should
-instead check @code{bzerror} after every call and watch out for
-@code{BZ_STREAM_END}.
-
-Internally, @code{BZ2_bzRead} copies data from the compressed file in chunks
-of size @code{BZ_MAX_UNUSED} bytes
-before decompressing it. If the file contains more bytes than strictly
-needed to reach the logical end-of-stream, @code{BZ2_bzRead} will almost certainly
-read some of the trailing data before signalling @code{BZ_SEQUENCE_END}.
-To collect the read but unused data once @code{BZ_SEQUENCE_END} has
-appeared, call @code{BZ2_bzReadGetUnused} immediately before @code{BZ2_bzReadClose}.
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_PARAM_ERROR}
- if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
- @code{BZ_SEQUENCE_ERROR}
- if @code{b} was opened with @code{BZ2_bzWriteOpen}
- @code{BZ_IO_ERROR}
- if there is an error reading from the compressed file
- @code{BZ_UNEXPECTED_EOF}
- if the compressed file ended before the logical end-of-stream was detected
- @code{BZ_DATA_ERROR}
- if a data integrity error was detected in the compressed stream
- @code{BZ_DATA_ERROR_MAGIC}
- if the stream does not begin with the requisite header bytes (ie, is not
- a @code{bzip2} data file). This is really a special case of @code{BZ_DATA_ERROR}.
- @code{BZ_MEM_ERROR}
- if insufficient memory was available
- @code{BZ_STREAM_END}
- if the logical end of stream was detected.
- @code{BZ_OK}
- otherwise.
-@end display
-
-Possible return values:
-@display
- number of bytes read
- if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END}
- undefined
- otherwise
-@end display
-
-Allowable next actions:
-@display
- collect data from @code{buf}, then @code{BZ2_bzRead} or @code{BZ2_bzReadClose}
- if @code{bzerror} is @code{BZ_OK}
- collect data from @code{buf}, then @code{BZ2_bzReadClose} or @code{BZ2_bzReadGetUnused}
- if @code{bzerror} is @code{BZ_SEQUENCE_END}
- @code{BZ2_bzReadClose}
- otherwise
-@end display
-
-
-
-@subsection @code{BZ2_bzReadGetUnused}
-@example
- void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b,
- void** unused, int* nUnused );
-@end example
-Returns data which was read from the compressed file but was not needed
-to get to the logical end-of-stream. @code{*unused} is set to the address
-of the data, and @code{*nUnused} to the number of bytes. @code{*nUnused} will
-be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive.
-
-This function may only be called once @code{BZ2_bzRead} has signalled
-@code{BZ_STREAM_END} but before @code{BZ2_bzReadClose}.
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_PARAM_ERROR}
- if @code{b} is @code{NULL}
- or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL}
- @code{BZ_SEQUENCE_ERROR}
- if @code{BZ_STREAM_END} has not been signalled
- or if @code{b} was opened with @code{BZ2_bzWriteOpen}
- @code{BZ_OK}
- otherwise
-@end display
-
-Allowable next actions:
-@display
- @code{BZ2_bzReadClose}
-@end display
-
-
-@subsection @code{BZ2_bzReadClose}
-@example
- void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
-@end example
-Releases all memory pertaining to the compressed file @code{b}.
-@code{BZ2_bzReadClose} does not call @code{fclose} on the underlying file
-handle, so you should do that yourself if appropriate.
-@code{BZ2_bzReadClose} should be called to clean up after all error
-situations.
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_SEQUENCE_ERROR}
- if @code{b} was opened with @code{BZ2_bzOpenWrite}
- @code{BZ_OK}
- otherwise
-@end display
-
-Allowable next actions:
-@display
- none
-@end display
-
-
-
-@subsection @code{BZ2_bzWriteOpen}
-@example
- BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f,
- int blockSize100k, int verbosity,
- int workFactor );
-@end example
-Prepare to write compressed data to file handle @code{f}.
-@code{f} should refer to
-a file which has been opened for writing, and for which the error
-indicator (@code{ferror(f)})is not set.
-
-For the meaning of parameters @code{blockSize100k},
-@code{verbosity} and @code{workFactor}, see
-@* @code{BZ2_bzCompressInit}.
-
-All required memory is allocated at this stage, so if the call
-completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a
-subsequent call to @code{BZ2_bzWrite}.
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_CONFIG_ERROR}
- if the library has been mis-compiled
- @code{BZ_PARAM_ERROR}
- if @code{f} is @code{NULL}
- or @code{blockSize100k < 1} or @code{blockSize100k > 9}
- @code{BZ_IO_ERROR}
- if @code{ferror(f)} is nonzero
- @code{BZ_MEM_ERROR}
- if insufficient memory is available
- @code{BZ_OK}
- otherwise
-@end display
-
-Possible return values:
-@display
- Pointer to an abstract @code{BZFILE}
- if @code{bzerror} is @code{BZ_OK}
- @code{NULL}
- otherwise
-@end display
-
-Allowable next actions:
-@display
- @code{BZ2_bzWrite}
- if @code{bzerror} is @code{BZ_OK}
- (you could go directly to @code{BZ2_bzWriteClose}, but this would be pretty pointless)
- @code{BZ2_bzWriteClose}
- otherwise
-@end display
-
-
-
-@subsection @code{BZ2_bzWrite}
-@example
- void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
-@end example
-Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be
-compressed and written to the file.
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_PARAM_ERROR}
- if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
- @code{BZ_SEQUENCE_ERROR}
- if b was opened with @code{BZ2_bzReadOpen}
- @code{BZ_IO_ERROR}
- if there is an error writing the compressed file.
- @code{BZ_OK}
- otherwise
-@end display
-
-
-
-
-@subsection @code{BZ2_bzWriteClose}
-@example
- void BZ2_bzWriteClose ( int *bzerror, BZFILE* f,
- int abandon,
- unsigned int* nbytes_in,
- unsigned int* nbytes_out );
-
- void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* f,
- int abandon,
- unsigned int* nbytes_in_lo32,
- unsigned int* nbytes_in_hi32,
- unsigned int* nbytes_out_lo32,
- unsigned int* nbytes_out_hi32 );
-@end example
-
-Compresses and flushes to the compressed file all data so far supplied
-by @code{BZ2_bzWrite}. The logical end-of-stream markers are also written, so
-subsequent calls to @code{BZ2_bzWrite} are illegal. All memory associated
-with the compressed file @code{b} is released.
-@code{fflush} is called on the
-compressed file, but it is not @code{fclose}'d.
-
-If @code{BZ2_bzWriteClose} is called to clean up after an error, the only
-action is to release the memory. The library records the error codes
-issued by previous calls, so this situation will be detected
-automatically. There is no attempt to complete the compression
-operation, nor to @code{fflush} the compressed file. You can force this
-behaviour to happen even in the case of no error, by passing a nonzero
-value to @code{abandon}.
-
-If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to be the
-total volume of uncompressed data handled. Similarly, @code{nbytes_out}
-will be set to the total volume of compressed data written. For
-compatibility with older versions of the library, @code{BZ2_bzWriteClose}
-only yields the lower 32 bits of these counts. Use
-@code{BZ2_bzWriteClose64} if you want the full 64 bit counts. These
-two functions are otherwise absolutely identical.
-
-
-Possible assignments to @code{bzerror}:
-@display
- @code{BZ_SEQUENCE_ERROR}
- if @code{b} was opened with @code{BZ2_bzReadOpen}
- @code{BZ_IO_ERROR}
- if there is an error writing the compressed file
- @code{BZ_OK}
- otherwise
-@end display
-
-@subsection Handling embedded compressed data streams
-
-The high-level library facilitates use of
-@code{bzip2} data streams which form some part of a surrounding, larger
-data stream.
-@itemize @bullet
-@item For writing, the library takes an open file handle, writes
-compressed data to it, @code{fflush}es it but does not @code{fclose} it.
-The calling application can write its own data before and after the
-compressed data stream, using that same file handle.
-@item Reading is more complex, and the facilities are not as general
-as they could be since generality is hard to reconcile with efficiency.
-@code{BZ2_bzRead} reads from the compressed file in blocks of size
-@code{BZ_MAX_UNUSED} bytes, and in doing so probably will overshoot
-the logical end of compressed stream.
-To recover this data once decompression has
-ended, call @code{BZ2_bzReadGetUnused} after the last call of @code{BZ2_bzRead}
-(the one returning @code{BZ_STREAM_END}) but before calling
-@code{BZ2_bzReadClose}.
-@end itemize
-
-This mechanism makes it easy to decompress multiple @code{bzip2}
-streams placed end-to-end. As the end of one stream, when @code{BZ2_bzRead}
-returns @code{BZ_STREAM_END}, call @code{BZ2_bzReadGetUnused} to collect the
-unused data (copy it into your own buffer somewhere).
-That data forms the start of the next compressed stream.
-To start uncompressing that next stream, call @code{BZ2_bzReadOpen} again,
-feeding in the unused data via the @code{unused}/@code{nUnused}
-parameters.
-Keep doing this until @code{BZ_STREAM_END} return coincides with the
-physical end of file (@code{feof(f)}). In this situation
-@code{BZ2_bzReadGetUnused}
-will of course return no data.
-
-This should give some feel for how the high-level interface can be used.
-If you require extra flexibility, you'll have to bite the bullet and get
-to grips with the low-level interface.
-
-@subsection Standard file-reading/writing code
-Here's how you'd write data to a compressed file:
-@example @code
-FILE* f;
-BZFILE* b;
-int nBuf;
-char buf[ /* whatever size you like */ ];
-int bzerror;
-int nWritten;
-
-f = fopen ( "myfile.bz2", "w" );
-if (!f) @{
- /* handle error */
-@}
-b = BZ2_bzWriteOpen ( &bzerror, f, 9 );
-if (bzerror != BZ_OK) @{
- BZ2_bzWriteClose ( b );
- /* handle error */
-@}
-
-while ( /* condition */ ) @{
- /* get data to write into buf, and set nBuf appropriately */
- nWritten = BZ2_bzWrite ( &bzerror, b, buf, nBuf );
- if (bzerror == BZ_IO_ERROR) @{
- BZ2_bzWriteClose ( &bzerror, b );
- /* handle error */
- @}
-@}
-
-BZ2_bzWriteClose ( &bzerror, b );
-if (bzerror == BZ_IO_ERROR) @{
- /* handle error */
-@}
-@end example
-And to read from a compressed file:
-@example
-FILE* f;
-BZFILE* b;
-int nBuf;
-char buf[ /* whatever size you like */ ];
-int bzerror;
-int nWritten;
-
-f = fopen ( "myfile.bz2", "r" );
-if (!f) @{
- /* handle error */
-@}
-b = BZ2_bzReadOpen ( &bzerror, f, 0, NULL, 0 );
-if (bzerror != BZ_OK) @{
- BZ2_bzReadClose ( &bzerror, b );
- /* handle error */
-@}
-
-bzerror = BZ_OK;
-while (bzerror == BZ_OK && /* arbitrary other conditions */) @{
- nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
- if (bzerror == BZ_OK) @{
- /* do something with buf[0 .. nBuf-1] */
- @}
-@}
-if (bzerror != BZ_STREAM_END) @{
- BZ2_bzReadClose ( &bzerror, b );
- /* handle error */
-@} else @{
- BZ2_bzReadClose ( &bzerror );
-@}
-@end example
-
-
-
-@section Utility functions
-@subsection @code{BZ2_bzBuffToBuffCompress}
-@example
- int BZ2_bzBuffToBuffCompress( char* dest,
- unsigned int* destLen,
- char* source,
- unsigned int sourceLen,
- int blockSize100k,
- int verbosity,
- int workFactor );
-@end example
-Attempts to compress the data in @code{source[0 .. sourceLen-1]}
-into the destination buffer, @code{dest[0 .. *destLen-1]}.
-If the destination buffer is big enough, @code{*destLen} is
-set to the size of the compressed data, and @code{BZ_OK} is
-returned. If the compressed data won't fit, @code{*destLen}
-is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
-
-Compression in this manner is a one-shot event, done with a single call
-to this function. The resulting compressed data is a complete
-@code{bzip2} format data stream. There is no mechanism for making
-additional calls to provide extra input data. If you want that kind of
-mechanism, use the low-level interface.
-
-For the meaning of parameters @code{blockSize100k}, @code{verbosity}
-and @code{workFactor}, @* see @code{BZ2_bzCompressInit}.
-
-To guarantee that the compressed data will fit in its buffer, allocate
-an output buffer of size 1% larger than the uncompressed data, plus
-six hundred extra bytes.
-
-@code{BZ2_bzBuffToBuffDecompress} will not write data at or
-beyond @code{dest[*destLen]}, even in case of buffer overflow.
-
-Possible return values:
-@display
- @code{BZ_CONFIG_ERROR}
- if the library has been mis-compiled
- @code{BZ_PARAM_ERROR}
- if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
- or @code{blockSize100k < 1} or @code{blockSize100k > 9}
- or @code{verbosity < 0} or @code{verbosity > 4}
- or @code{workFactor < 0} or @code{workFactor > 250}
- @code{BZ_MEM_ERROR}
- if insufficient memory is available
- @code{BZ_OUTBUFF_FULL}
- if the size of the compressed data exceeds @code{*destLen}
- @code{BZ_OK}
- otherwise
-@end display
-
-
-
-@subsection @code{BZ2_bzBuffToBuffDecompress}
-@example
- int BZ2_bzBuffToBuffDecompress ( char* dest,
- unsigned int* destLen,
- char* source,
- unsigned int sourceLen,
- int small,
- int verbosity );
-@end example
-Attempts to decompress the data in @code{source[0 .. sourceLen-1]}
-into the destination buffer, @code{dest[0 .. *destLen-1]}.
-If the destination buffer is big enough, @code{*destLen} is
-set to the size of the uncompressed data, and @code{BZ_OK} is
-returned. If the compressed data won't fit, @code{*destLen}
-is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
-
-@code{source} is assumed to hold a complete @code{bzip2} format
-data stream. @* @code{BZ2_bzBuffToBuffDecompress} tries to decompress
-the entirety of the stream into the output buffer.
-
-For the meaning of parameters @code{small} and @code{verbosity},
-see @code{BZ2_bzDecompressInit}.
-
-Because the compression ratio of the compressed data cannot be known in
-advance, there is no easy way to guarantee that the output buffer will
-be big enough. You may of course make arrangements in your code to
-record the size of the uncompressed data, but such a mechanism is beyond
-the scope of this library.
-
-@code{BZ2_bzBuffToBuffDecompress} will not write data at or
-beyond @code{dest[*destLen]}, even in case of buffer overflow.
-
-Possible return values:
-@display
- @code{BZ_CONFIG_ERROR}
- if the library has been mis-compiled
- @code{BZ_PARAM_ERROR}
- if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
- or @code{small != 0 && small != 1}
- or @code{verbosity < 0} or @code{verbosity > 4}
- @code{BZ_MEM_ERROR}
- if insufficient memory is available
- @code{BZ_OUTBUFF_FULL}
- if the size of the compressed data exceeds @code{*destLen}
- @code{BZ_DATA_ERROR}
- if a data integrity error was detected in the compressed data
- @code{BZ_DATA_ERROR_MAGIC}
- if the compressed data doesn't begin with the right magic bytes
- @code{BZ_UNEXPECTED_EOF}
- if the compressed data ends unexpectedly
- @code{BZ_OK}
- otherwise
-@end display
-
-
-
-@section @code{zlib} compatibility functions
-Yoshioka Tsuneo has contributed some functions to
-give better @code{zlib} compatibility. These functions are
-@code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
-@code{BZ2_bzclose},
-@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}.
-These functions are not (yet) officially part of
-the library. If they break, you get to keep all the pieces.
-Nevertheless, I think they work ok.
-@example
-typedef void BZFILE;
-
-const char * BZ2_bzlibVersion ( void );
-@end example
-Returns a string indicating the library version.
-@example
-BZFILE * BZ2_bzopen ( const char *path, const char *mode );
-BZFILE * BZ2_bzdopen ( int fd, const char *mode );
-@end example
-Opens a @code{.bz2} file for reading or writing, using either its name
-or a pre-existing file descriptor.
-Analogous to @code{fopen} and @code{fdopen}.
-@example
-int BZ2_bzread ( BZFILE* b, void* buf, int len );
-int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
-@end example
-Reads/writes data from/to a previously opened @code{BZFILE}.
-Analogous to @code{fread} and @code{fwrite}.
-@example
-int BZ2_bzflush ( BZFILE* b );
-void BZ2_bzclose ( BZFILE* b );
-@end example
-Flushes/closes a @code{BZFILE}. @code{BZ2_bzflush} doesn't actually do
-anything. Analogous to @code{fflush} and @code{fclose}.
-
-@example
-const char * BZ2_bzerror ( BZFILE *b, int *errnum )
-@end example
-Returns a string describing the more recent error status of
-@code{b}, and also sets @code{*errnum} to its numerical value.
-
-
-@section Using the library in a @code{stdio}-free environment
-
-@subsection Getting rid of @code{stdio}
-
-In a deeply embedded application, you might want to use just
-the memory-to-memory functions. You can do this conveniently
-by compiling the library with preprocessor symbol @code{BZ_NO_STDIO}
-defined. Doing this gives you a library containing only the following
-eight functions:
-
-@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, @code{BZ2_bzCompressEnd} @*
-@code{BZ2_bzDecompressInit}, @code{BZ2_bzDecompress}, @code{BZ2_bzDecompressEnd} @*
-@code{BZ2_bzBuffToBuffCompress}, @code{BZ2_bzBuffToBuffDecompress}
-
-When compiled like this, all functions will ignore @code{verbosity}
-settings.
-
-@subsection Critical error handling
-@code{libbzip2} contains a number of internal assertion checks which
-should, needless to say, never be activated. Nevertheless, if an
-assertion should fail, behaviour depends on whether or not the library
-was compiled with @code{BZ_NO_STDIO} set.
-
-For a normal compile, an assertion failure yields the message
-@example
- bzip2/libbzip2: internal error number N.
- This is a bug in bzip2/libbzip2, 1.0.2, 30-Dec-2001.
- Please report it to me at: jseward@@acm.org. If this happened
- when you were using some program which uses libbzip2 as a
- component, you should also report this bug to the author(s)
- of that program. Please make an effort to report this bug;
- timely and accurate bug reports eventually lead to higher
- quality software. Thanks. Julian Seward, 30 December 2001.
-@end example
-where @code{N} is some error code number. If @code{N == 1007}, it also
-prints some extra text advising the reader that unreliable memory is
-often associated with internal error 1007. (This is a
-frequently-observed-phenomenon with versions 1.0.0/1.0.1).
-
-@code{exit(3)} is then called.
-
-For a @code{stdio}-free library, assertion failures result
-in a call to a function declared as:
-@example
- extern void bz_internal_error ( int errcode );
-@end example
-The relevant code is passed as a parameter. You should supply
-such a function.
-
-In either case, once an assertion failure has occurred, any
-@code{bz_stream} records involved can be regarded as invalid.
-You should not attempt to resume normal operation with them.
-
-You may, of course, change critical error handling to suit
-your needs. As I said above, critical errors indicate bugs
-in the library and should not occur. All "normal" error
-situations are indicated via error return codes from functions,
-and can be recovered from.
-
-
-@section Making a Windows DLL
-Everything related to Windows has been contributed by Yoshioka Tsuneo
-@* (@code{QWF00133@@niftyserve.or.jp} /
-@code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to
-him (but perhaps Cc: me, @code{jseward@@acm.org}).
-
-My vague understanding of what to do is: using Visual C++ 5.0,
-open the project file @code{libbz2.dsp}, and build. That's all.
-
-If you can't
-open the project file for some reason, make a new one, naming these files:
-@code{blocksort.c}, @code{bzlib.c}, @code{compress.c},
-@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @*
-@code{randtable.c} and @code{libbz2.def}. You will also need
-to name the header files @code{bzlib.h} and @code{bzlib_private.h}.
-
-If you don't use VC++, you may need to define the proprocessor symbol
-@code{_WIN32}.
-
-Finally, @code{dlltest.c} is a sample program using the DLL. It has a
-project file, @code{dlltest.dsp}.
-
-If you just want a makefile for Visual C, have a look at
-@code{makefile.msc}.
-
-Be aware that if you compile @code{bzip2} itself on Win32, you must set
-@code{BZ_UNIX} to 0 and @code{BZ_LCCWIN32} to 1, in the file
-@code{bzip2.c}, before compiling. Otherwise the resulting binary won't
-work correctly.
-
-I haven't tried any of this stuff myself, but it all looks plausible.
-
-
-
-@chapter Miscellanea
-
-These are just some random thoughts of mine. Your mileage may
-vary.
-
-@section Limitations of the compressed file format
-@code{bzip2-1.0}, @code{0.9.5} and @code{0.9.0}
-use exactly the same file format as the previous
-version, @code{bzip2-0.1}. This decision was made in the interests of
-stability. Creating yet another incompatible compressed file format
-would create further confusion and disruption for users.
-
-Nevertheless, this is not a painless decision. Development
-work since the release of @code{bzip2-0.1} in August 1997
-has shown complexities in the file format which slow down
-decompression and, in retrospect, are unnecessary. These are:
-@itemize @bullet
-@item The run-length encoder, which is the first of the
- compression transformations, is entirely irrelevant.
- The original purpose was to protect the sorting algorithm
- from the very worst case input: a string of repeated
- symbols. But algorithm steps Q6a and Q6b in the original
- Burrows-Wheeler technical report (SRC-124) show how
- repeats can be handled without difficulty in block
- sorting.
-@item The randomisation mechanism doesn't really need to be
- there. Udi Manber and Gene Myers published a suffix
- array construction algorithm a few years back, which
- can be employed to sort any block, no matter how
- repetitive, in O(N log N) time. Subsequent work by
- Kunihiko Sadakane has produced a derivative O(N (log N)^2)
- algorithm which usually outperforms the Manber-Myers
- algorithm.
-
- I could have changed to Sadakane's algorithm, but I find
- it to be slower than @code{bzip2}'s existing algorithm for
- most inputs, and the randomisation mechanism protects
- adequately against bad cases. I didn't think it was
- a good tradeoff to make. Partly this is due to the fact
- that I was not flooded with email complaints about
- @code{bzip2-0.1}'s performance on repetitive data, so
- perhaps it isn't a problem for real inputs.
-
- Probably the best long-term solution,
- and the one I have incorporated into 0.9.5 and above,
- is to use the existing sorting
- algorithm initially, and fall back to a O(N (log N)^2)
- algorithm if the standard algorithm gets into difficulties.
-@item The compressed file format was never designed to be
- handled by a library, and I have had to jump though
- some hoops to produce an efficient implementation of
- decompression. It's a bit hairy. Try passing
- @code{decompress.c} through the C preprocessor
- and you'll see what I mean. Much of this complexity
- could have been avoided if the compressed size of
- each block of data was recorded in the data stream.
-@item An Adler-32 checksum, rather than a CRC32 checksum,
- would be faster to compute.
-@end itemize
-It would be fair to say that the @code{bzip2} format was frozen
-before I properly and fully understood the performance
-consequences of doing so.
-
-Improvements which I was able to incorporate into
-0.9.0, despite using the same file format, are:
-@itemize @bullet
-@item Single array implementation of the inverse BWT. This
- significantly speeds up decompression, presumably
- because it reduces the number of cache misses.
-@item Faster inverse MTF transform for large MTF values. The
- new implementation is based on the notion of sliding blocks
- of values.
-@item @code{bzip2-0.9.0} now reads and writes files with @code{fread}
- and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}.
- Duh! Well, you live and learn.
-
-@end itemize
-Further ahead, it would be nice
-to be able to do random access into files. This will
-require some careful design of compressed file formats.
-
-
-
-@section Portability issues
-After some consideration, I have decided not to use
-GNU @code{autoconf} to configure 0.9.5 or 1.0.
-
-@code{autoconf}, admirable and wonderful though it is,
-mainly assists with portability problems between Unix-like
-platforms. But @code{bzip2} doesn't have much in the way
-of portability problems on Unix; most of the difficulties appear
-when porting to the Mac, or to Microsoft's operating systems.
-@code{autoconf} doesn't help in those cases, and brings in a
-whole load of new complexity.
-
-Most people should be able to compile the library and program
-under Unix straight out-of-the-box, so to speak, especially
-if you have a version of GNU C available.
-
-There are a couple of @code{__inline__} directives in the code. GNU C
-(@code{gcc}) should be able to handle them. If you're not using
-GNU C, your C compiler shouldn't see them at all.
-If your compiler does, for some reason, see them and doesn't
-like them, just @code{#define} @code{__inline__} to be @code{/* */}. One
-easy way to do this is to compile with the flag @code{-D__inline__=},
-which should be understood by most Unix compilers.
-
-If you still have difficulties, try compiling with the macro
-@code{BZ_STRICT_ANSI} defined. This should enable you to build the
-library in a strictly ANSI compliant environment. Building the program
-itself like this is dangerous and not supported, since you remove
-@code{bzip2}'s checks against compressing directories, symbolic links,
-devices, and other not-really-a-file entities. This could cause
-filesystem corruption!
-
-One other thing: if you create a @code{bzip2} binary for public
-distribution, please try and link it statically (@code{gcc -s}). This
-avoids all sorts of library-version issues that others may encounter
-later on.
-
-If you build @code{bzip2} on Win32, you must set @code{BZ_UNIX} to 0 and
-@code{BZ_LCCWIN32} to 1, in the file @code{bzip2.c}, before compiling.
-Otherwise the resulting binary won't work correctly.
-
-
-
-@section Reporting bugs
-I tried pretty hard to make sure @code{bzip2} is
-bug free, both by design and by testing. Hopefully
-you'll never need to read this section for real.
-
-Nevertheless, if @code{bzip2} dies with a segmentation
-fault, a bus error or an internal assertion failure, it
-will ask you to email me a bug report. Experience with
-version 0.1 shows that almost all these problems can
-be traced to either compiler bugs or hardware problems.
-@itemize @bullet
-@item
-Recompile the program with no optimisation, and see if it
-works. And/or try a different compiler.
-I heard all sorts of stories about various flavours
-of GNU C (and other compilers) generating bad code for
-@code{bzip2}, and I've run across two such examples myself.
-
-2.7.X versions of GNU C are known to generate bad code from
-time to time, at high optimisation levels.
-If you get problems, try using the flags
-@code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}.
-You should specifically @emph{not} use @code{-funroll-loops}.
-
-You may notice that the Makefile runs six tests as part of
-the build process. If the program passes all of these, it's
-a pretty good (but not 100%) indication that the compiler has
-done its job correctly.
-@item
-If @code{bzip2} crashes randomly, and the crashes are not
-repeatable, you may have a flaky memory subsystem. @code{bzip2}
-really hammers your memory hierarchy, and if it's a bit marginal,
-you may get these problems. Ditto if your disk or I/O subsystem
-is slowly failing. Yup, this really does happen.
-
-Try using a different machine of the same type, and see if
-you can repeat the problem.
-@item This isn't really a bug, but ... If @code{bzip2} tells
-you your file is corrupted on decompression, and you
-obtained the file via FTP, there is a possibility that you
-forgot to tell FTP to do a binary mode transfer. That absolutely
-will cause the file to be non-decompressible. You'll have to transfer
-it again.
-@end itemize
-
-If you've incorporated @code{libbzip2} into your own program
-and are getting problems, please, please, please, check that the
-parameters you are passing in calls to the library, are
-correct, and in accordance with what the documentation says
-is allowable. I have tried to make the library robust against
-such problems, but I'm sure I haven't succeeded.
-
-Finally, if the above comments don't help, you'll have to send
-me a bug report. Now, it's just amazing how many people will
-send me a bug report saying something like
-@display
- bzip2 crashed with segmentation fault on my machine
-@end display
-and absolutely nothing else. Needless to say, a such a report
-is @emph{totally, utterly, completely and comprehensively 100% useless;
-a waste of your time, my time, and net bandwidth}.
-With no details at all, there's no way I can possibly begin
-to figure out what the problem is.
-
-The rules of the game are: facts, facts, facts. Don't omit
-them because "oh, they won't be relevant". At the bare
-minimum:
-@display
- Machine type. Operating system version.
- Exact version of @code{bzip2} (do @code{bzip2 -V}).
- Exact version of the compiler used.
- Flags passed to the compiler.
-@end display
-However, the most important single thing that will help me is
-the file that you were trying to compress or decompress at the
-time the problem happened. Without that, my ability to do anything
-more than speculate about the cause, is limited.
-
-Please remember that I connect to the Internet with a modem, so
-you should contact me before mailing me huge files.
-
-
-@section Did you get the right package?
-
-@code{bzip2} is a resource hog. It soaks up large amounts of CPU cycles
-and memory. Also, it gives very large latencies. In the worst case, you
-can feed many megabytes of uncompressed data into the library before
-getting any compressed output, so this probably rules out applications
-requiring interactive behaviour.
-
-These aren't faults of my implementation, I hope, but more
-an intrinsic property of the Burrows-Wheeler transform (unfortunately).
-Maybe this isn't what you want.
-
-If you want a compressor and/or library which is faster, uses less
-memory but gets pretty good compression, and has minimal latency,
-consider Jean-loup
-Gailly's and Mark Adler's work, @code{zlib-1.1.3} and
-@code{gzip-1.2.4}. Look for them at
-
-@code{http://www.zlib.org} and
-@code{http://www.gzip.org} respectively.
-
-For something faster and lighter still, you might try Markus F X J
-Oberhumer's @code{LZO} real-time compression/decompression library, at
-@* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}.
-
-If you want to use the @code{bzip2} algorithms to compress small blocks
-of data, 64k bytes or smaller, for example on an on-the-fly disk
-compressor, you'd be well advised not to use this library. Instead,
-I've made a special library tuned for that kind of use. It's part of
-@code{e2compr-0.40}, an on-the-fly disk compressor for the Linux
-@code{ext2} filesystem. Look at
-@code{http://www.netspace.net.au/~reiter/e2compr}.
-
-
-
-@section Testing
-
-A record of the tests I've done.
-
-First, some data sets:
-@itemize @bullet
-@item B: a directory containing 6001 files, one for every length in the
- range 0 to 6000 bytes. The files contain random lowercase
- letters. 18.7 megabytes.
-@item H: my home directory tree. Documents, source code, mail files,
- compressed data. H contains B, and also a directory of
- files designed as boundary cases for the sorting; mostly very
- repetitive, nasty files. 565 megabytes.
-@item A: directory tree holding various applications built from source:
- @code{egcs}, @code{gcc-2.8.1}, KDE, GTK, Octave, etc.
- 2200 megabytes.
-@end itemize
-The tests conducted are as follows. Each test means compressing
-(a copy of) each file in the data set, decompressing it and
-comparing it against the original.
-
-First, a bunch of tests with block sizes and internal buffer
-sizes set very small,
-to detect any problems with the
-blocking and buffering mechanisms.
-This required modifying the source code so as to try to
-break it.
-@enumerate
-@item Data set H, with
- buffer size of 1 byte, and block size of 23 bytes.
-@item Data set B, buffer sizes 1 byte, block size 1 byte.
-@item As (2) but small-mode decompression.
-@item As (2) with block size 2 bytes.
-@item As (2) with block size 3 bytes.
-@item As (2) with block size 4 bytes.
-@item As (2) with block size 5 bytes.
-@item As (2) with block size 6 bytes and small-mode decompression.
-@item H with buffer size of 1 byte, but normal block
- size (up to 900000 bytes).
-@end enumerate
-Then some tests with unmodified source code.
-@enumerate
-@item H, all settings normal.
-@item As (1), with small-mode decompress.
-@item H, compress with flag @code{-1}.
-@item H, compress with flag @code{-s}, decompress with flag @code{-s}.
-@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing,
- @code{bzip2-0.9.5} decompressing, all settings normal.
-@item Backwards compatibility: H, @code{bzip2-0.9.5} compressing,
- @code{bzip2-0.1pl2} decompressing, all settings normal.
-@item Bigger tests: A, all settings normal.
-@item As (7), using the fallback (Sadakane-like) sorting algorithm.
-@item As (8), compress with flag @code{-1}, decompress with flag
- @code{-s}.
-@item H, using the fallback sorting algorithm.
-@item Forwards compatibility: A, @code{bzip2-0.1pl2} compressing,
- @code{bzip2-0.9.5} decompressing, all settings normal.
-@item Backwards compatibility: A, @code{bzip2-0.9.5} compressing,
- @code{bzip2-0.1pl2} decompressing, all settings normal.
-@item Misc test: about 400 megabytes of @code{.tar} files with
- @code{bzip2} compiled with Checker (a memory access error
- detector, like Purify).
-@item Misc tests to make sure it builds and runs ok on non-Linux/x86
- platforms.
-@end enumerate
-These tests were conducted on a 225 MHz IDT WinChip machine, running
-Linux 2.0.36. They represent nearly a week of continuous computation.
-All tests completed successfully.
-
-
-@section Further reading
-@code{bzip2} is not research work, in the sense that it doesn't present
-any new ideas. Rather, it's an engineering exercise based on existing
-ideas.
-
-Four documents describe essentially all the ideas behind @code{bzip2}:
-@example
-Michael Burrows and D. J. Wheeler:
- "A block-sorting lossless data compression algorithm"
- 10th May 1994.
- Digital SRC Research Report 124.
- ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
- If you have trouble finding it, try searching at the
- New Zealand Digital Library, http://www.nzdl.org.
-
-Daniel S. Hirschberg and Debra A. LeLewer
- "Efficient Decoding of Prefix Codes"
- Communications of the ACM, April 1990, Vol 33, Number 4.
- You might be able to get an electronic copy of this
- from the ACM Digital Library.
-
-David J. Wheeler
- Program bred3.c and accompanying document bred3.ps.
- This contains the idea behind the multi-table Huffman
- coding scheme.
- ftp://ftp.cl.cam.ac.uk/users/djw3/
-
-Jon L. Bentley and Robert Sedgewick
- "Fast Algorithms for Sorting and Searching Strings"
- Available from Sedgewick's web page,
- www.cs.princeton.edu/~rs
-@end example
-The following paper gives valuable additional insights into the
-algorithm, but is not immediately the basis of any code
-used in bzip2.
-@example
-Peter Fenwick:
- Block Sorting Text Compression
- Proceedings of the 19th Australasian Computer Science Conference,
- Melbourne, Australia. Jan 31 - Feb 2, 1996.
- ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
-@end example
-Kunihiko Sadakane's sorting algorithm, mentioned above,
-is available from:
-@example
-http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
-@end example
-The Manber-Myers suffix array construction
-algorithm is described in a paper
-available from:
-@example
-http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
-@end example
-Finally, the following paper documents some recent investigations
-I made into the performance of sorting algorithms:
-@example
-Julian Seward:
- On the Performance of BWT Sorting Algorithms
- Proceedings of the IEEE Data Compression Conference 2000
- Snowbird, Utah. 28-30 March 2000.
-@end example
-
-
-@contents
-
-@bye
-