--------------------- Arithmetic Coding Example Routines ------------------
Copyright 1990 Massachusetts Institute of Technology
All rights reserved.
Permission to use, copy, or modify this software and its documentation
for educational and research purposes only and without fee is hereby
granted, provided that this copyright notice appear on all copies and
supporting documentation. For any other uses of this software, in
original or modified form, including but not limited to distribution
in whole or in part, specific prior permission must be obtained from
M.I.T. and the author. These programs shall not be used, rewritten,
or adapted as the basis of a commercial software or hardware product
without first obtaining appropriate licenses from M.I.T. M.I.T. and
the author make no representations about the suitability of this
software for any purpose. It is provided "as is" without express or
implied warranty.
-----------------------------------------------------------------------
For a demo, type:
make acms_example
acms_example 100 4.0
The file "acms.c" contains C language subroutines to perform K-ary
arithmetic coding, an efficient form of data compression. The details
of the technique are provided in chapter 3 of:
Popat, Ashok C., "Scalar Quantization with Arithmetic Coding,"
SM thesis, MIT Dept. of Elec. Eng. and Comp. Sci., Cambridge,
Massachusetts, June 1990.
The thesis, in postscript form, is available by anonymous ftp from
media-lab.media.mit.edu (18.85.0.2):
pub/k-arith-code/doc?.ps.Z
If you can't ftp, let me know and I'll email it to you in small
chunks.
The algorithm doesn't require any division. Encoding requires two
multiplications, two additions, and one barrel-shift per source letter.
Decoding requires one barrel-shift and not more than ceiling(log_2(K))+1
multiplications and additions per source letter, where K is the
alphabet size.
Currently, the routines "encode" and "decode" are set up to allow
efficient encoding of a sequence of independent identically
distributed (iid) discrete random variables. It is not set up for
adaptive coding, although extension to the adaptive case is
straightforward. Basically, you'd set up a probability matrix rather
than a probability vector. You'd switch from row to row of the
matrix, according to some state variable in your model. Every
row would have to have been rounded to admissible values using,
for example, the supplied routine optrnd().
Usage:
Here, the source random variables will be called "letters," and a
particular sequence of them will be called a "message". N will denote
the number of letters in the message; that is, its length. The
"alphabet" for the source is the set of integers between 0 to K-1,
inclusive. Thus, the alphabet size is K. The letters, being
identically distributed, are described by a single probability mass
function p[k], k=0,...,K-1.
To encode a message:
encode(K,p,message,N,16,15,codebuf,&nbits);
where
Inputs:
- K: the alphabet size, type "int".
- p: the probability mass function array, type "double *"
- message: an array of letters to be encoded, type = "int *"
- N: the number of letters in the message, type = "int"
- (the values 16 and 15 shouldn't be touched for now)
Outputs:
- codebuf: the resulting array of code bits, type = "int *"
- nbits: set to the number of bits in codebuf, type = "int *"
To recover the message from codebuf, the following call to "decode"
can be used:
decode(K,p,recovered_message,nlet,16,15,codebuf,nbits);
where
Inputs:
- K: the alphabet size, type "int".
- p: the probability mass function array, type "double *"
- N: the number of letters in the message, type = "int"
- (the values 16 and 15 shouldn't be touched for now)
- codebuf: the array of code bits, type = "int *"
- nbits: the number of bits in codebuf, type = "int"
Outputs:
- recovered_message: the decoded array of letters, type = "int *"
EXAMPLE: See the file "acms_example.c" for an example of how to use
the routines.
See my thesis for an explanation of what's really being done.
Good luck. If you find mistakes in the thesis or code, or if you make
any improvements to the code, I'd like to hear about it.
Kris Popat
popat@media-lab.mit.edu
MIT Rm. E15-355
Cambridge, MA 02139