Foreword

Recently, I have been learning how to perform JPEG encoding. I searched many articles online and found that few of them explain every detail clearly, which led me into many pitfalls while programming. This article therefore covers as many details as possible, accompanied by Python code. The complete program is available in my open-source project on GitHub.

Of course, this introduction and the code are not perfect, and may even contain some errors. It can only serve as an introductory guide. Please forgive me.

Various Markers in JPEG Files

Many articles have introduced the markers in JPEG files. I have also uploaded a document that annotates an actual image for reference.

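All markers start with 0xff (255 in decimal), followed by a byte that identifies the marker type. Apart from standalone markers such as SOI and EOI, each marker is then followed by a two-byte length (which counts the length field itself but not the marker) and by the data describing that segment, as the write_head function in this article shows.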
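
To make the segment layout concrete, here is a minimal sketch that walks the header segments of a JPEG byte string. The function name list_segments and the hand-built demo stream are illustrative assumptions, not part of the original program. The sketch stops at SOS because the entropy-coded data that follows has no length field.

```python
import struct

def list_segments(data: bytes):
    """Return (marker, payload_length) pairs for the header segments of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = [(0xFFD8, 0)]          # SOI is a standalone marker without a length
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker >> 8 != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        # The length field counts its own two bytes but not the marker.
        segments.append((marker, length - 2))
        pos += 2 + length
        if marker == 0xFFDA:          # SOS: entropy-coded data follows, stop here
            break
    return segments

# Hand-built demo stream: SOI, a 16-byte APP0 segment, a 12-byte SOS header, fake data, EOI.
app0 = struct.pack(">HH", 0xFFE0, 16) + b"JFIF\x00" + bytes(9)
sos = struct.pack(">HH", 0xFFDA, 12) + bytes(10)
demo = b"\xff\xd8" + app0 + sos + b"\x12\x34\xff\xd9"
for marker, size in list_segments(demo):
    print(hex(marker), size)
```

Running the same function on a file produced by write_head should list APP0, two DQT segments, SOF0, four DHT segments, and SOS in that order.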

At this point, we only have the image data part left to write. But how exactly is the image data encoded, and how are the quantization and Huffman encoding mentioned above implemented? Please see the introduction in the next section.

JPEG Encoding Process

Because this encoder processes the image in complete 8x8 blocks, the height and width of the image must be multiples of 8. We can therefore use image interpolation or subsampling to resize the image slightly so that its height and width become multiples of 8. For an image that is thousands of pixels wide, this does not noticeably change the aspect ratio.
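
Resizing is the approach this article takes. A commonly used alternative is to pad the image on the right and bottom by repeating the edge samples; a decoder crops the result back to the height and width stored in SOF0. The sketch below shows that alternative on a plain 2-D list. The function name pad_to_multiple_of_8 is hypothetical, and this is not what the original program does.

```python
def pad_to_multiple_of_8(plane):
    """Pad a 2-D list (rows of samples) on the right and bottom by repeating edge samples."""
    h, w = len(plane), len(plane[0])
    new_h = (h + 7) // 8 * 8          # round up to the next multiple of 8
    new_w = (w + 7) // 8 * 8
    # Extend every row with copies of its last sample.
    padded = [row + [row[-1]] * (new_w - w) for row in plane]
    # Append copies of the last row until the height fits.
    padded += [list(padded[-1]) for _ in range(new_h - h)]
    return padded

plane = [[x + 10 * y for x in range(10)] for y in range(5)]   # 5 rows, 10 columns
padded = pad_to_multiple_of_8(plane)
print(len(padded), len(padded[0]))
```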

Color Space Conversion

JPEG (JFIF) images use the YCbCr color space because the human eye is more sensitive to luminance than to chrominance. We can therefore compress the Cb and Cr components more strongly, which reduces the file size further while maintaining the visual quality. After converting to the YCbCr space, we can subsample the Cb and Cr components to reduce their number of samples and achieve greater compression. Common sampling formats are 4:4:4, 4:2:2, and 4:2:0, which correspond to the horizontal and vertical sampling factors in the SOF0 marker. For simplicity, all sampling factors in this article are 1, meaning no subsampling: one Y sample corresponds to one Cb and one Cr sample (4:4:4). In 4:2:2, two Y samples share one Cb and one Cr sample; in 4:2:0, four Y samples share one Cb and one Cr sample. (In the figure of the original post, each cell is a Y sample, and the blue squares mark the pixels at which Cb and Cr are sampled.)
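
The encoder in this article uses 4:4:4 only, so the following is just an illustration of what 4:2:0 means for a chroma plane. It is a minimal sketch that averages each 2x2 group of samples; averaging is an assumption made here (picking one pixel of each group, as the figure suggests, is another option), and the name subsample_420 is hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 group of chroma samples (height and width assumed even)."""
    return [
        [
            # +2 rounds to the nearest integer before the integer division by 4
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]

cb = [[100, 102, 50, 50],
      [104, 106, 50, 54]]
print(subsample_420(cb))   # one output sample per 2x2 group of input samples
```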

The formula for color space conversion is:

Y = 0.299*R + 0.587*G + 0.114*B

Cb = -0.1687*R - 0.3313*G + 0.5*B + 128

Cr = 0.5*R - 0.4187*G - 0.0813*B + 128

The above operations are all rounded to integers. In a 24-bit RGB BMP image, the range of the R, G, B components is 0-255. Through simple mathematical relationships, we can easily find that the range of the Y, Cb, Cr components is also 0-255. In JPEG images, we usually need to subtract 128 from each component to make the range include both positive and negative values.
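
The formulas above can be checked directly with a few lines of plain Python. This is a minimal sketch, not the code path the article uses (which is OpenCV); the function name rgb_to_ycbcr is hypothetical, and the clamp is an added safeguard because rounding can push an extreme value such as 255.5 just outside 0-255.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (0-255 each) with the formulas from this section."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    # Round to integers, then clamp to 0..255 as a safeguard.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

for rgb in [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]:
    print(rgb, "->", rgb_to_ycbcr(*rgb))
```

Black and white both map to Cb = Cr = 128, which is why subtracting 128 centers the chrominance components around zero.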

In Python, the OpenCV function cv2.cvtColor performs this conversion. Note that OpenCV orders the channels as Y, Cr, Cb, so channel 1 is Cr and channel 2 is Cb.

8x8 Block Division

In JPEG encoding, the 8x8 blocks are processed row by row: from left to right within a row of blocks, and the rows from top to bottom. The data of all blocks is finally concatenated. Within each block, the same operations are performed on the color components in the order Y, Cb, Cr (the quantization tables and Huffman tables used may differ between components).

DCT Transform

DCT (Discrete Cosine Transform) converts spatial domain data to the frequency domain for computation. This allows us to selectively reduce high-frequency component data in the frequency domain without significantly affecting the visual quality. Compared to the Discrete Fourier Transform, the Discrete Cosine Transform operates entirely in the real number domain, which is more conducive to computer computation. The formula for the Discrete Cosine Transform is as follows:

$$F(u,v)=\frac{2}{\sqrt{MN}}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}f(x,y)C(u)C(v)\cos\frac{(2x+1)u\pi}{2M}\cos\frac{(2y+1)v\pi}{2N}$$

where $C(u)=\begin{cases}\frac{1}{\sqrt{2}}&u=0\\1&u\neq0\end{cases}$ . In JPEG, $M=N=8$ .
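
The formula can be transcribed almost literally into plain Python. The sketch below is deliberately slow and only meant for checking small cases; the name dct_8x8 is hypothetical and the article itself relies on OpenCV for this step.

```python
import math

def dct_8x8(block):
    """Apply the 2-D DCT formula above with M = N = 8 to an 8x8 list of lists."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 2 / math.sqrt(8 * 8) * c(u) * c(v) * total
    return out

# A constant block (sample value 100, level-shifted by 128) has only a DC coefficient.
flat = [[100 - 128] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 3))
```

For a constant block the DC coefficient is 8 times the sample value (here 8 * -28 = -224) and all AC coefficients vanish up to floating-point error, which is a quick sanity check for any DCT implementation.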

Of course, you can also use the OpenCV function cv2.dct instead of implementing the formula yourself.

Quantization

After the DCT transform, the DC component is the first element of the 8x8 block, low-frequency components are concentrated in the upper-left corner, and high-frequency components are concentrated in the lower-right corner. To selectively remove high-frequency components, we perform quantization: each element of the 8x8 block is divided by the corresponding entry of a quantization table and the result is rounded to an integer. In the quantization table, the entries in the upper-left corner are smaller, while those in the lower-right corner are larger. The tables used in this article are std_luminance_quant_tbl for the Y component and std_chrominance_quant_tbl for the Cb and Cr components.
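
A small worked example shows why quantization is the lossy step. The sketch below quantizes one row of made-up DCT coefficients with the first row of the luminance table and then multiplies back, as a decoder would. The names quantize_block and dequantize_block and the coefficient values are illustrative assumptions.

```python
def quantize_block(coeffs, table):
    """Divide each coefficient by its table entry and round to the nearest integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

def dequantize_block(levels, table):
    """What a decoder does: multiply back. The rounding loss is not recovered."""
    return [[v * q for v, q in zip(v_row, q_row)]
            for v_row, q_row in zip(levels, table)]

table = [[16, 11, 10, 16, 24, 40, 51, 61]]                      # first row of the luminance table
coeffs = [[-224.0, 35.2, -4.9, 7.0, 11.0, -19.0, 3.0, 30.0]]    # made-up DCT coefficients
levels = quantize_block(coeffs, table)
print(levels)
print(dequantize_block(levels, table))
```

Only the two low-frequency values survive; every coefficient smaller than half its table entry becomes zero, which is what makes the later run-length encoding effective.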

After quantization, many zeros appear in the lower-right corner of the 8x8 block. To concentrate these zeros and reduce the amount of data for run-length encoding, we next perform zigzag scanning.

Zigzag Scanning

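Zigzag scanning converts the 8x8 block into a list of 64 items by walking the anti-diagonals in alternating directions, starting at the upper-left corner. In (row, column) terms the order begins (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), and so on until (7,7).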

Finally, we obtain a list of length 64 like this: (41, -8, -6, -5, 13, 11, -1, 1, 2, -2, -3, -5, 1, 1, -5, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0). The following operations will use this list as an example.

Note that the quantization table must also be zigzag scanned before it is stored in the file. Image viewers expect it in this form and decode a wrong image otherwise (I spent a lot of debugging time on this detail initially). This can be seen in the write_head function, which passes both tables through block2zz.
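
As a cross-check for a hand-written scan, the zigzag order can also be generated by sorting the 64 positions by anti-diagonal. This is a minimal sketch of that alternative formulation, not the method used in the article; the name zigzag_order is hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Even diagonals run from bottom-left to top-right (column increasing),
        # odd diagonals run from top-right to bottom-left (row increasing).
        return (diagonal, col if diagonal % 2 == 0 else row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

order = zigzag_order()
block = [[r * 8 + c for c in range(8)] for r in range(8)]   # each cell holds its own raster index
scanned = [block[r][c] for r, c in order]
print(scanned[:10])
```

The scanned list starts with the raster indices 0, 1, 8, 16, 9, 2, which matches the order described above.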

Differential Encoding (DC Component)

The value of the DC component is often large, and the DC components of adjacent 8x8 blocks are often very similar, so differential encoding saves a considerable amount of space. Differential encoding stores the difference between the DC component of the current block and that of the previous block; the first block stores its own value. Note that differential encoding is performed separately for the Y, Cb, and Cr components: each component is subtracted from the previous value of the same component. How the resulting value now_block_dc is encoded and stored is introduced later.
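
The following sketch shows differential encoding and its inverse for the DC values of one component across several blocks. The function names and the sample values are made up for illustration.

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block (0 before the first)."""
    previous = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

def dc_restore(diffs):
    """Inverse operation, as a decoder would perform it."""
    previous = 0
    values = []
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values

y_dc = [41, 43, 40, 40]            # made-up DC values of four consecutive Y blocks
diffs = dc_differences(y_dc)
print(diffs)
print(dc_restore(diffs) == y_dc)
```

In a real encoder one such running value is kept per component, which is what the three variables last_block_ydc, last_block_cbdc, and last_block_crdc in the article's code are for.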

Run-Length Encoding of Zeros (AC Component)

After zigzag scanning, many zeros are concentrated together. The list of AC components is: (-8, -6, -5, 13, 11, -1, 1, 2, -2, -3, -5, 1, 1, -5, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0).

Run-length encoding of zeros stores two numbers at a time: the second is a non-zero value, and the first is the count of zeros preceding it. Finally, the pair (0, 0) is appended as an end marker. Note that when the input does not end with a zero, the end marker must be omitted; this bug took me a long time to find (see the final check on AClist[-1] in the RLE function). For the above list, run-length encoding gives: (0, -8), (0, -6), (0, -5), (0, 13), (0, 11), (0, -1), (0, 1), (0, 2), (0, -2), (0, -3), (0, -5), (0, 1), (0, 1), (0, -5), (0, 1), (3, -1), (6, 1), (0, 1), (0, -1), (27, 1), (0, 0). This is 42 numbers, a slight reduction from the original 63. Of course, this is a special data set; actual data usually has more zeros, or even consists entirely of zeros, and then the encoded size is much smaller.

Look at the pair (27, 1), which was marked in red in the original post. In the binary encoding described in the JPEG Special Binary Encoding section, the first number is stored as a 4-bit value, so its range is 0 to 15. 27 exceeds that range, so the pair is split into (15, 0), (11, 1), where (15, 0) represents 16 zeros and (11, 1) represents 11 zeros followed by a 1.
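
The example above can be reproduced with a compact variant of the run-length encoder. This sketch produces a list of pairs rather than the flat array the article's RLE function returns, and the name rle_pairs is hypothetical; the rules (split runs longer than 15, end marker only when trailing zeros exist) are the ones described in the text.

```python
def rle_pairs(ac):
    """Encode AC values as (zero_run, value) pairs; (15, 0) means 16 zeros, (0, 0) ends the block."""
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1
            continue
        while run > 15:               # a run field only holds 0..15
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, value))
        run = 0
    if run > 0:                       # trailing zeros are replaced by one end marker
        pairs.append((0, 0))
    return pairs

# The 63 AC values of the example block from this article.
ac = ([-8, -6, -5, 13, 11, -1, 1, 2, -2, -3, -5, 1, 1, -5, 1, 0, 0, 0, -1]
      + [0] * 6 + [1, 1, -1] + [0] * 27 + [1] + [0] * 7)
pairs = rle_pairs(ac)
print(pairs[15:])
```

The tail of the result is (3, -1), (6, 1), (0, 1), (0, -1), (15, 0), (11, 1), (0, 0), matching the split described above.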

JPEG Special Binary Encoding

After the above groundwork, this section finally explains how the encoded DC and AC components are written to the file as a data stream.

JPEG stores each number in a special variable-length binary format: the magnitude of the number determines its bit length, as listed in the table that accompanies the code for this section.

For a number to be stored, we need its bit length and the actual binary value to store. The pattern is easy to see: for a positive number, the stored value is its ordinary binary representation and the bit length is the length of that representation. For the corresponding negative number, the bit length is the same and the stored value is the bitwise negation of the positive value. Zero has bit length 0, and no value bits are stored.
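
The rule can be written with integer bit operations instead of string manipulation. The sketch below is an alternative formulation of what the article's tobin function does, plus the inverse that a decoder needs; the function names are hypothetical.

```python
def magnitude_bits(value):
    """Return (bit_length, bit_string) for a DC difference or an AC value."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""                      # zero stores no value bits
    if value < 0:
        value += (1 << size) - 1          # same as inverting the bits of abs(value)
    return size, format(value, "0{}b".format(size))

def magnitude_value(size, bits):
    """Inverse: recover the number from its bit length and bit string."""
    if size == 0:
        return 0
    value = int(bits, 2)
    if bits[0] == "0":                    # a leading 0 marks a negative number
        value -= (1 << size) - 1
    return value

for number in (-41, -8, 1, -1, 0):
    print(number, magnitude_bits(number))
```

For -41 this gives bit length 6 and the bits 010110, and for -8 it gives bit length 4 and the bits 0111.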

For the DC component, suppose the value after differential encoding is -41. Following the rule above, its bit length is 6 and the stored bits are 010110 (41 is 101001 in binary, and negating every bit gives 010110). The bit length 6 is itself stored using canonical Huffman encoding, which is introduced in the Canonical Huffman Encoding section. Assume the Huffman code for 6 is 100. The DC component of this color component of this 8x8 block is then stored as 100010110.

After writing the binary data stream of the DC component to the file, we then encode the AC components of this color component of the 8x8 block. The values obtained after run-length encoding are: (0, -8), (0, -6), (0, -5), (0, 13), (0, 11), (0, -1), (0, 1), (0, 2), (0, -2), (0, -3), (0, -5), (0, 1), (0, 1), (0, -5), (0, 1), (3, -1), (6, 1), (0, 1), (0, -1), (15, 0), (11, 1), (0, 0).

First, store (0, -8). For the second number, perform the same operation to get 4 bits and 0111. However, unlike the DC component, we need to perform canonical Huffman encoding on 0x04, where the upper four bits are the first number of (0, -8) and the lower four bits are the bit length of the second number. Assuming the canonical Huffman encoding of 0x04 is stored as 1011, then (0, -8) is stored as 10110111. Next, perform the same operation for (0, -6) etc., and write the resulting data stream to the file sequentially.

Another example: (6, 1). Here, 1 is stored as 1, 1 bit, so we perform canonical Huffman encoding on 0x61. Assuming it is 1111011, then (6, 1) is stored as 11110111. For (15, 0), only the canonical Huffman encoding value of 0xf0 is stored.
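
The two AC examples can be reproduced in a few lines. In this sketch the Huffman codes are the assumed values from the text (1011 for 0x04 and 1111011 for 0x61), not entries of a real table, and the function names are hypothetical.

```python
def magnitude_bits(value):
    """Return (bit_length, bit_string) in JPEG's special binary format."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""
    if value < 0:
        value += (1 << size) - 1          # inverted bits for negative numbers
    return size, format(value, "0{}b".format(size))

def encode_ac_pair(run, value, huffman_codes):
    """Return (symbol, bit_string) for one (zero_run, value) pair."""
    size, bits = magnitude_bits(value)
    symbol = (run << 4) | size            # upper 4 bits: zero run, lower 4 bits: bit length
    return symbol, huffman_codes[symbol] + bits

assumed_codes = {0x04: "1011", 0x61: "1111011"}   # assumed codes taken from the text
print(encode_ac_pair(0, -8, assumed_codes))
print(encode_ac_pair(6, 1, assumed_codes))
```

The pairs (15, 0) and (0, 0) give the symbols 0xf0 and 0x00 with no value bits, so only their Huffman codes are written.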

After writing the data of one color component (say Y) following the above process, we then write the data of the Cb color component of this 8x8 block, and then the Cr component. After writing the data of each 8x8 block in the same way from left to right and top to bottom, we write the EOI marker (0xffd9) as the end of the image.

Note: while writing the data, we need to check whether each byte written is 0xff. To prevent it from being mistaken for a marker, a 0x00 byte must be appended after it.
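
Byte stuffing can be isolated in a small helper that turns a bit string into bytes. This is a sketch under two assumptions: the helper name bits_to_bytes is hypothetical, and the final partial byte is padded with 1 bits, which is the convention of the JPEG specification but is not discussed in this article.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes with 0xff byte stuffing."""
    bits += "1" * (-len(bits) % 8)        # pad the last byte with 1 bits
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = int(bits[i:i + 8], 2)
        out.append(byte)
        if byte == 0xFF:                  # keep data bytes from looking like a marker
            out.append(0x00)
    return bytes(out)

print(bits_to_bytes("111111110101").hex())
```

In the example, the first eight bits form 0xff and are followed by a stuffed 0x00; the remaining four bits 0101 are padded to 01011111, which is 0x5f.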

Canonical Huffman Encoding

The canonical Huffman encoding introduced in this article has four encoding tables, used for luminance DC component, chrominance DC component, luminance AC component, and chrominance AC component, respectively.

In the code for this section, std_huffman_DC0 and the other three arrays hold the values that are actually stored in the DHT segments, as can be seen in the write_head function. The first 16 numbers (0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) give the number of codes of each length from 1 to 16 bits. The remaining 12 numbers are the symbols, listed in order of increasing code length; their count equals the sum of the first 16 numbers. std_huffman_DC0 therefore assigns 3-bit codes to the seven symbols 4, 5, 3, 2, 6, 1, 0, and codes of 4, 5, 6, 7, and 8 bits to the symbols 7, 8, 9, 10, and 11 respectively.

So far we know only the code length of each symbol, not the code itself.

Canonical Huffman encoding has its own set of rules:

1. The code of the first symbol with the minimum code length is 0;
2. Codes with the same code length are consecutive;
3. The code a of the first symbol of the next code length (assume j) depends on the code b of the last symbol of the previous code length (assume i), i.e., a=(b+1)<<(j-i).

From rule 1, the code for 4 is 000. From rule 2, the code for 5 is 001, for 3 is 010, for 2 is 011, and so on up to 110 for 0. From rule 3, the code for 7 is (110+1)<<1 = 1110, the code for 8 is 11110, and so on.
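
The three rules can be applied literally in code. The following sketch does so for the counts and symbols of std_huffman_DC0; the function name canonical_codes is hypothetical, and the article's own DHT2tbl reaches the same codes with an equivalent shift-per-length formulation.

```python
def canonical_codes(counts, symbols):
    """Map each symbol to its code string, given 16 per-length counts and the symbol list."""
    codes = {}
    code = 0
    previous_length = None
    index = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            if previous_length is None:
                code = 0                                         # rule 1
            else:
                # Rules 2 and 3: a = (b + 1) << (j - i); the shift is 0 within one length.
                code = (code + 1) << (length - previous_length)
            codes[symbols[index]] = format(code, "b").zfill(length)
            previous_length = length
            index += 1
    return codes

counts = [0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = [4, 5, 3, 2, 6, 1, 0, 7, 8, 9, 10, 11]
dc0 = canonical_codes(counts, symbols)
for symbol in sorted(dc0):
    print(symbol, dc0[symbol])
```

With this table the code for 6 is 100, which is the value assumed in the DC example earlier.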

The final Huffman dictionary is quite long and can be viewed in my GitHub project. DHT2tbl returns the dictionary sorted by symbol, and studying the pattern of the symbols shows why write_num can compute the dictionary index directly from the run length and the bit length.

JPEG-encode10

JPEG codec implemented from scratch (python JPEG codec)

Python

Foreword

This blog was originally published on 2021-08-22 on CSDN. It is copied here and some formatting issues have been corrected along the way.

Various Markers in JPEG Files

PYTHON

from struct import pack

# Write the JPEG header segments (the information a decoder needs)
# filename: output file name
# h: image height
# w: image width
def write_head(filename, h, w):
    # Open the file for binary writing (overwrites an existing file)
    fp = open(filename, "wb")

    # SOI
    fp.write(pack(">H", 0xffd8))
    # APP0
    fp.write(pack(">H", 0xffe0))
    fp.write(pack(">H", 16))            # APP0 segment length in bytes
    fp.write(pack(">L", 0x4a464946))    # "JFIF"
    fp.write(pack(">B", 0))                # terminating 0
    fp.write(pack(">H", 0x0101))        # version: 1.1
    fp.write(pack(">B", 0x01))            # density unit: pixels per inch
    fp.write(pack(">L", 0x00480048))    # X and Y pixel density (72 each)
    fp.write(pack(">H", 0x0000))        # no thumbnail
    # DQT_0
    fp.write(pack(">H", 0xffdb))
    fp.write(pack(">H", 64+3))            # DQT segment length in bytes
    fp.write(pack(">B", 0x00))            # precision: 8 bit (0), table ID: 0
    tbl = block2zz(std_luminance_quant_tbl)
    for item in tbl:
        fp.write(pack(">B", item))    # contents of quantization table 0 (zigzag order)
    # DQT_1
    fp.write(pack(">H", 0xffdb))
    fp.write(pack(">H", 64+3))            # DQT segment length in bytes
    fp.write(pack(">B", 0x01))            # precision: 8 bit (0), table ID: 1
    tbl = block2zz(std_chrominance_quant_tbl)
    for item in tbl:
        fp.write(pack(">B", item))    # contents of quantization table 1 (zigzag order)
    # SOF0
    fp.write(pack(">H", 0xffc0))
    fp.write(pack(">H", 17))            # SOF0 segment length in bytes
    fp.write(pack(">B", 8))                # sample precision: 8 bit
    fp.write(pack(">H", h))                # image height
    fp.write(pack(">H", w))                # image width
    fp.write(pack(">B", 3))                # number of color components: 3 (YCbCr)
    fp.write(pack(">B", 1))                # component ID: 1
    fp.write(pack(">H", 0x1100))        # horizontal and vertical sampling factors: 1, quantization table ID: 0
    fp.write(pack(">B", 2))                # component ID: 2
    fp.write(pack(">H", 0x1101))        # horizontal and vertical sampling factors: 1, quantization table ID: 1
    fp.write(pack(">B", 3))                # component ID: 3
    fp.write(pack(">H", 0x1101))        # horizontal and vertical sampling factors: 1, quantization table ID: 1
    # DHT_DC0
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_DC0)+3))    # DHT segment length in bytes
    fp.write(pack(">B", 0x00))                        # DC0
    for item in std_huffman_DC0:
        fp.write(pack(">B", item))                    # Huffman table contents
    # DHT_AC0
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_AC0)+3))    # DHT segment length in bytes
    fp.write(pack(">B", 0x10))                        # AC0
    for item in std_huffman_AC0:
        fp.write(pack(">B", item))                    # Huffman table contents
    # DHT_DC1
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_DC1)+3))    # DHT segment length in bytes
    fp.write(pack(">B", 0x01))                        # DC1
    for item in std_huffman_DC1:
        fp.write(pack(">B", item))                    # Huffman table contents
    # DHT_AC1
    fp.write(pack(">H", 0xffc4))
    fp.write(pack(">H", len(std_huffman_AC1)+3))    # DHT segment length in bytes
    fp.write(pack(">B", 0x11))                        # AC1
    for item in std_huffman_AC1:
        fp.write(pack(">B", item))                    # Huffman table contents
    # SOS
    fp.write(pack(">H", 0xffda))
    fp.write(pack(">H", 12))            # SOS header length in bytes
    fp.write(pack(">B", 3))                # number of color components: 3
    fp.write(pack(">H", 0x0100))        # component 1: Huffman table IDs for DC and AC
    fp.write(pack(">H", 0x0211))        # component 2: Huffman table IDs for DC and AC
    fp.write(pack(">H", 0x0311))        # component 3: Huffman table IDs for DC and AC
    fp.write(pack(">B", 0x00))            # start of spectral selection
    fp.write(pack(">B", 0x3f))            # end of spectral selection (63)
    fp.write(pack(">B", 0x00))            # successive approximation; these three bytes are fixed values
    fp.close()

JPEG Encoding Process

PYTHON

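# Resize the image so that it can be split into 8x8 blocks
if (h % 8 != 0) or (w % 8 != 0):
    h = h // 8 * 8
    w = w // 8 * 8
    # cv2.resize expects the target size as (width, height), and the
    # interpolation method must be passed by keyword
    YCrCb = cv2.resize(YCrCb, (w, h), interpolation=cv2.INTER_CUBIC)
nblock = h * w // 64    # number of 8x8 blocks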

Color Space Conversion

In Python, you can use functions from the OpenCV library to perform color space conversion:

PYTHON

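import cv2
import numpy as np

# BGR: the input image in OpenCV's B, G, R channel order (for example as returned by cv2.imread)
YCrCb = cv2.cvtColor(BGR, cv2.COLOR_BGR2YCrCb)
# int16 leaves room for the negative values that appear after subtracting 128
npdata = np.array(YCrCb, np.int16)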

8x8 Block Division

PYTHON

# Visit the blocks row by row: i is the top row and j the left column of each block
for i in range(0, h, 8):
    for j in range(0, w, 8):
        ...

DCT Transform

Of course, you can also use functions from the OpenCV library:

PYTHON

# Take one 8x8 block and subtract 128. One of the three lines is used per
# component (OpenCV channel order is Y, Cr, Cb).
now_block = npdata[i:i+8, j:j+8, 0] - 128        # Y component
now_block = npdata[i:i+8, j:j+8, 2] - 128        # Cb component
now_block = npdata[i:i+8, j:j+8, 1] - 128        # Cr component
now_block_dct = cv2.dct(np.float32(now_block))    # DCT transform

Quantization

PYTHON

import numpy as np

# Luminance quantization table
std_luminance_quant_tbl = np.array(
    [
        [16, 11, 10, 16, 24, 40, 51, 61],
        [12, 12, 14, 19, 26, 58, 60, 55],
        [14, 13, 16, 24, 40, 57, 69, 56],
        [14, 17, 22, 29, 51, 87, 80, 62],
        [18, 22, 37, 56, 68,109,103, 77],
        [24, 35, 55, 64, 81,104,113, 92],
        [49, 64, 78, 87,103,121,120,101],
        [72, 92, 95, 98,112,100,103, 99]
    ],
    np.uint8
)
# Chrominance quantization table
std_chrominance_quant_tbl = np.array(
    [
        [17, 18, 24, 47, 99, 99, 99, 99],
        [18, 21, 26, 66, 99, 99, 99, 99],
        [24, 26, 56, 99, 99, 99, 99, 99],
        [47, 66, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99],
        [99, 99, 99, 99, 99, 99, 99, 99]
    ],
    np.uint8
)

Quantization process code:

PYTHON

# Call sites inside the block loop (one per component)
now_block_qut = quantize(now_block_dct, 0)        # quantize the Y component
now_block_qut = quantize(now_block_dct, 2)        # quantize the Cb component
now_block_qut = quantize(now_block_dct, 1)        # quantize the Cr component

# Quantization
# block: data of the current 8x8 block
# dim: channel index  0:Y  1:Cr  2:Cb
def quantize(block, dim):
    if(dim == 0):
        # Use the luminance quantization table
        qarr = std_luminance_quant_tbl
    else:
        # Use the chrominance quantization table
        qarr = std_chrominance_quant_tbl
    return (block / qarr).round().astype(np.int16)

Zigzag Scanning

PYTHON

now_block_zz = block2zz(now_block_qut)            # zigzag scan

# Zigzag scan
# block: data of the current 8x8 block
def block2zz(block):
    re = np.empty(64, np.int16)
    # Current position in the block as [row, column]
    pos = np.array([0, 0])
    # The four scan directions: right, left-down, down, right-up
    R = np.array([0, 1])
    LD = np.array([1, -1])
    D = np.array([1, 0])
    RU = np.array([-1, 1])
    for i in range(0, 64):
        re[i] = block[pos[0], pos[1]]
        if(((pos[0] == 0) or (pos[0] == 7)) and (pos[1] % 2 == 0)):
            # Top or bottom edge at an even column: step right
            pos = pos + R
        elif(((pos[1] == 0) or (pos[1] == 7)) and (pos[0] % 2 == 1)):
            # Left or right edge at an odd row: step down
            pos = pos + D
        elif((pos[0] + pos[1]) % 2 == 0):
            # Even diagonal: move right-up
            pos = pos + RU
        else:
            # Odd diagonal: move left-down
            pos = pos + LD
    return re

Differential Encoding (DC Component)

PYTHON

# Previous DC value of each component (0 before the first block)
last_block_ydc = 0
last_block_cbdc = 0
last_block_crdc = 0

# Y: store the DC component as the difference from the previous block
now_block_dc = now_block_zz[0] - last_block_ydc
last_block_ydc = now_block_zz[0]                # remember the current value

# Cb
now_block_dc = now_block_zz[0] - last_block_cbdc
last_block_cbdc = now_block_zz[0]

# Cr
now_block_dc = now_block_zz[0] - last_block_crdc
last_block_crdc = now_block_zz[0]

Run-Length Encoding of Zeros (AC Component)

PYTHON

now_block_ac = RLE(now_block_zz[1:])

# Run-length encoding of zeros
# AClist: the 63 AC values to encode
def RLE(AClist: np.ndarray) -> np.ndarray:
    re = []
    cnt = 0
    for i in range(0, 63):
        if(AClist[i] == 0 and cnt != 15):
            cnt += 1
        else:
            # Either a non-zero value, or the 16th zero in a row, which gives [15, 0]
            re.append(cnt)
            re.append(AClist[i])
            cnt = 0
    # Remove all trailing [15, 0] pairs; the end marker covers them
    while(re and re[-1] == 0):
        re.pop()
        re.pop()
    # Append two zeros as the end marker, but only if the block ends with a zero
    if(AClist[-1] == 0):
        re.extend([0, 0])
    return np.array(re, np.int16)

JPEG Special Binary Encoding

In JPEG encoding, there is the following binary encoding format:

             Value               Bit Length        Actual Stored Value
              0                   0                    None
            -1,1                  1                  0,1
         -3,-2,2,3                2              00,01,10,11
   -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
     -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
    -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
    -63,..,-32,32,..,63           6                  ...
   -127,..,-64,64,..,127          7                  ...
  -255,..,-128,128,..,255         8                  ...
  -511,..,-256,256,..,511         9                  ...
 -1023,..,-512,512,..,1023       10                  ...
-2047,..,-1024,1024,..,2047      11                  ...

PYTHON

# JPEG's special binary encoding
# num: the number to encode
def tobin(num):
    s = ""
    if(num > 0):
        while(num != 0):
            s += '0' if(num % 2 == 0) else '1'
            num = int(num / 2)
        s = s[::-1]    # reverse: the bits were collected least significant first
    elif(num < 0):
        num = -num
        while(num != 0):
            s += '1' if(num % 2 == 0) else '0'    # inverted bits for a negative number
            num = int(num / 2)
        s = s[::-1]
    return s

PYTHON

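s = write_num(s, -1, now_block_dc, DC0)            # write the DC value
for l in range(0, len(now_block_ac), 2):        # write the AC pairs
    s = write_num(s, now_block_ac[l], now_block_ac[l+1], AC0)
    # Flush complete bytes as we go: a long pending string wastes memory
    # and slows the program down
    while(len(s) >= 8):
        num = int(s[0:8], 2)
        fp.write(pack(">B", num))
        if(num == 0xff):                        # avoid a conflict with markers:
            fp.write(pack(">B", 0))                # a data byte 0xff is followed by one 0x00 byte
        s = s[8:len(s)]

# Append one encoded value to the pending bit string
# s: binary data (string of '0'/'1') not yet written to the file
# n: number of zeros preceding the value (-1 means DC)
# num: the value to write
# tbl: canonical Huffman dictionary
def write_num(s, n, num, tbl):
    # Bit length of num (int() truncates toward zero, so this also works for negatives)
    bit = 0
    tnum = num
    while(tnum != 0):
        bit += 1
        tnum = int(tnum / 2)
    if(n == -1):                    # DC
        tnum = bit
        if(tnum > 11):
            raise ValueError("Write DC data Error")
    else:                            # AC
        if((n > 15) or (bit > 11) or (((n != 0) and (n != 15)) and (bit == 0))):
            raise ValueError("Write AC data Error")
        # Position of the symbol (n in the upper 4 bits, bit in the lower 4 bits)
        # in the dictionary, which is sorted by symbol
        tnum = n * 10 + bit + (0 if(n != 15) else 1)
    # Huffman code for the zero count (AC only) and the bit length of num
    s += tbl[tnum].str_code
    # The value itself in the special binary format
    s += tobin(num)
    return s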

Canonical Huffman Encoding

PYTHON

# Canonical Huffman table for the luminance DC component
std_huffman_DC0 = np.array(
    [0, 0, 7, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
     4, 5, 3, 2, 6, 1, 0, 7, 8, 9, 10, 11],
    np.uint8
)
...
# Convert the tables into Huffman dictionaries
DC0 = DHT2tbl(std_huffman_DC0)    # luminance DC component
DC1 = DHT2tbl(std_huffman_DC1)    # chrominance DC component
AC0 = DHT2tbl(std_huffman_AC0)    # luminance AC component
AC1 = DHT2tbl(std_huffman_AC1)    # chrominance AC component

PYTHON

# One entry of the Huffman dictionary
# symbol: the original data
# code: the corresponding code value
# n_bit: number of bits in the code
# str_code: the code as a string of '0'/'1'
class Sym_Code():
    def __init__(self, symbol, code, n_bit):
        self.symbol = symbol
        self.code = code
        str_code=''
        mask = 1 << (n_bit - 1)
        for i in range(0, n_bit):
            if(mask & code):
                str_code += '1'
            else:
                str_code += '0'
            mask >>= 1
        self.str_code = str_code
    # Output format
    def __str__(self):
        return "0x{:0>2x}    |  {}".format(self.symbol, self.str_code)
    # Ordering by symbol, used by sorted()
    def __eq__(self, other):
        return self.symbol == other.symbol
    def __lt__(self, other):
        return self.symbol < other.symbol
    def __gt__(self, other):
        return self.symbol > other.symbol


# Convert a canonical Huffman table into a Huffman dictionary
# data: the canonical Huffman table as stored in the DHT segment
def DHT2tbl(data):
    # Number of codes of each length from 1 to 16 bits
    # (plain ints keep the sums below out of uint8 arithmetic)
    numbers = [int(n) for n in data[0:16]]
    symbols = data[16:len(data)]        # the symbols
    if(sum(numbers) != len(symbols)):    # check that the table is consistent
        raise ValueError("Wrong DHT!")
    code = 0
    SC = []                                # list that holds the dictionary
    for n_bit in range(1, 17):
        # Assign consecutive codes to the symbols of this code length
        for symbol in symbols[sum(numbers[0:n_bit-1]):sum(numbers[0:n_bit])]:
            SC.append(Sym_Code(symbol, code, n_bit))
            code += 1
        # Moving to the next code length appends one 0 bit
        code <<= 1
    return sorted(SC)
