This project explores entropy-based compression techniques using Huffman Coding, Fano Coding, and an innovative Second-order Adaptive Markov Encoding (AME) scheme. Implementations and evaluations are performed using MATLAB and Python.
Data compression is essential for efficient storage and transmission. The project focuses on comparing Huffman and Fano coding, widely used lossless entropy-based methods, and introduces AME coding, which enhances compression by leveraging predictive modeling.
- Huffman Coding:
- Assigns shorter codes to frequently occurring symbols.
- MATLAB implementation with tree visualization.
- Fano Coding:
- Similar to Huffman but uses frequency balancing.
- MATLAB implementation with tree visualization.
- Second-order Adaptive Markov Encoding (AME):
- Pre-processes text to predict and compress based on structural patterns.
- Python implementation with adaptive tree modeling.
- Analyze character frequencies in
original.txt. - Perform Huffman and Fano coding.
- Compute performance metrics:
- Average Code Length (L̄)
- Code Rate (R)
- Efficiency (η)
- Compression Ratio (ξ).
- Pre-process
original.txtusing AME. - Apply Huffman and Fano coding to the AME-processed text.
- Evaluate binary file size improvements.
| Metric | Huffman | Fano |
|---|---|---|
| Avg. Code Len (L̄) | 4.5023 | 8.4244 |
| Code Rate (R) | 0.9909 | 0.5282 |
| Efficiency (η) | 0.9909 | 0.5282 |
| Compression Ratio (ξ) | 0.5628 | 1.0530 |
- AME improves binary file size:
- Huffman: 5.25% reduction.
- Fano: 4.37% reduction.
- Efficiency increases with longer texts.
- Visualized for
original.txt.
- Visualized for
processed.txt.
original.txt: Source text (first 3 chapters of Game of Thrones).
binary_huffman.txt,binary_fano.txt: Encoded binary files.processed.txt: AME-encoded file.received_huffman.txt,received_fano.txt: Decoded files.
- Python:
adaptive_markov_encode.py: AME encoding.adaptive_markov_decode.py: AME decoding.
- MATLAB:
huffman_encode.m,fano_encode.m: Encoding procedures.huffman_decode.m,fano_decode.m: Decoding procedures.
- Huffman and Fano trees, character frequency distributions.
Please contact us if you need help:
- Jinming Ren (marcobisky@outlook.com)
- Yuhao Liu