TDM 40100: Project 9 — 2022
Motivation: Images are everywhere, and images are data! We will take some time to dig more into working with images as data in this series of projects.
Context: In the previous project, we worked with images and implemented edge detection, with some pretty cool results! In these next couple of projects, we will continue to work with images as we learn how to compress images from scratch. This is the first in a series of 2 projects where we will implement a variation of jpeg image compression!
Scope: Python, images, JAX
Dataset(s)
The following questions will use the following dataset(s):
- /anvil/projects/tdm/data/images/drward.jpg
- /anvil/projects/tdm/data/images/barn.jpg
Questions
Question 1
In the previous project, we were able to isolate and display the various YCbCr components of our barn.jpg image. In addition, we were able to use the discrete cosine transformation to convert each of the channels of our image (Y, Cb, and Cr) to signal data.
Per MathWorks, the discrete cosine transform has the property that visually significant information about an image is concentrated in just a few coefficients of the resulting signal data. In other words, if the majority of the visually important information is captured by just a few coefficients, there is a lot of opportunity to reduce the amount of data we need to keep!
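To see this concentration concretely, here is a minimal, self-contained sketch (the 8x8 block is synthetic, built to be smooth like a patch of a real photo, and the scipy-based 2D DCT mirrors the dct2 you wrote in the previous project; this is an illustration, not part of the assignment):

import numpy as np
import scipy.fftpack

rng = np.random.default_rng(0)
# a smooth 8x8 block: neighboring values are similar, like pixels in a photo
block = np.outer(np.linspace(100, 120, 8), np.linspace(1.0, 1.1, 8)) + rng.normal(0, 1, (8, 8))

# 2D DCT built from two passes of the 1D DCT (one per axis)
coeffs = scipy.fftpack.dct(scipy.fftpack.dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

energy = coeffs**2
# nearly all of the energy lands in the low-frequency (top-left) corner
print(energy[:4, :4].sum() / energy.sum())

A smooth block puts almost all of its energy into a handful of low-frequency coefficients, which is exactly what the quantization step in question (3) exploits.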
Start from the end of the previous project. Load up some libraries.
from skimage import io, color, viewer
from matplotlib.pyplot import imshow
import numpy as np
import jax
import jax.numpy as jnp
import hashlib
from IPython import display
import scipy
img = io.imread("/anvil/projects/tdm/data/images/barn.jpg")
In addition, load up your dct2 function, and create a numpy ndarray called freq that holds the image data (for barn.jpg) converted using the discrete cosine transform.
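If you no longer have dct2 handy, a reasonable sketch (assuming, as in the previous project, it is built on scipy.fftpack, and looping over 8x8 blocks the same way the skeleton in question (5) does) looks like this:

import numpy as np
import scipy.fftpack
from skimage import io, color

def dct2(a):
    # 2D DCT: apply the 1D DCT along each axis in turn
    return scipy.fftpack.dct(scipy.fftpack.dct(a, axis=0, norm='ortho'), axis=1, norm='ortho')

img = io.imread("/anvil/projects/tdm/data/images/barn.jpg")
ycbcr = color.rgb2ycbcr(img)

s = ycbcr.shape
freq = np.zeros(s)
for channel in range(3):
    for i in np.r_[:s[0]:8]:
        for j in np.r_[:s[1]:8]:
            # transform each 8x8 block of the channel to frequency data
            freq[i:(i+8), j:(j+8), channel] = dct2(ycbcr[i:(i+8), j:(j+8), channel])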
Let’s take a step back, and clarify a couple of things.
- We will not actually be compressing our image, but we will demonstrate how we can store the image’s data in less space, with very little loss of image detail.
- We will still use a simple method to estimate roughly how much space we would save if we did compress our image.
- We will display a "compressed" version of the image. What this means is that you will be able to view the jpeg after it has lost the data it would normally lose during the compression process.
Okay, begin by taking the original RGB img and displaying the first 8x8 block of data for each of the R, G, and B channels. Next, display the first 8x8 block of data for the Y, Cb, and Cr channels. Finally, use dct2 to create the freq ndarray (like you did in the previous project), and display the first 8x8 block of the Y, Cb, and Cr channels after the transformation.
When we say "display 8x8 blocks" we do not mean show an image; we mean show the numeric data in the form of a numpy array. An 8x8 numpy array printed out using np.array_str is what we are looking for.
By default, numpy arrays don’t print very nicely. Use np.array_str with precision=2 and suppress_small=True (the same way the quantization matrix is printed in question (3)) to get cleaner output.
To get you started, this is how you could print the first 8x8 block of each of the R, G, and B channels.
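The code snippet itself appears to have been lost from this copy of the page; a reasonable reconstruction, assuming img is the RGB array loaded above, would be:

print(np.array_str(img[:8, :8, 0], precision=2, suppress_small=True))  # R channel
print(np.array_str(img[:8, :8, 1], precision=2, suppress_small=True))  # G channel
print(np.array_str(img[:8, :8, 2], precision=2, suppress_small=True))  # B channel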
If you did not complete the previous project, no worries; please check out question (5). It provides code that lets you efficiently loop through 8x8 blocks for each channel. This is important for creating the freq ndarray.
- Code used to solve this problem.
- Output from running the code.
- The output should be 3 sets of 3 8x8 numeric numpy matrices.
- The first set should be the R, G, and B channels.
- The second set should be the Y, Cb, and Cr channels.
- The third set should be the Y, Cb, and Cr channels after being converted to frequency data using dct2.
Question 2
Take a close look at the final set of 8x8 blocks in the previous question, the blocks after the DCT was applied. You’ll notice the top left corner value is much different from the rest. This is the DC coefficient; the rest are called AC coefficients.
We forgot an important step. Before we apply dct2, we need to shift our data to be centered around 0 instead of 127. We can do this by subtracting 127 from every value before applying the DCT.
Re-print the first 8x8 block of freq after centering. Do the results look much different? According to Wikipedia, this step reduces the dynamic range requirements in the DCT processing stage.
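One possible sketch of the centering step, assuming img and dct2 from question (1) are already defined:

s = img.shape
shifted = color.rgb2ycbcr(img).astype(np.int16) - 127  # center values around 0 instead of 127
freq = np.zeros(s)
for channel in range(3):
    for i in np.r_[:s[0]:8]:
        for j in np.r_[:s[1]:8]:
            freq[i:(i+8), j:(j+8), channel] = dct2(shifted[i:(i+8), j:(j+8), channel])

print(np.array_str(freq[:8, :8, 0], precision=2, suppress_small=True))  # Y channel, first block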
- Code used to solve this problem.
- Output from running the code.
- The output should be 1 set of 3 8x8 numeric numpy matrices.
- The output should be very close to the third set from question (1), but with the data centered before applying the dct.
Question 3
Okay, great! The next step in this process is to quantize our freq signal data. You can read more about quantization here. Apparently, the human visual system is not very good at distinguishing changes in the high-frequency parts of our data, but is good at distinguishing low-frequency changes.
We can use a quantization matrix to filter out the higher frequency data and maintain the lower frequency data. One of the more common quantization matrices is the following.
q1 = np.array([[16,11,10,16,24,40,51,61],
[12,12,14,19,26,28,60,55],
[14,13,16,24,40,57,69,56],
[14,17,22,29,51,87,80,62],
[18,22,37,56,68,109,103,77],
[24,35,55,64,81,104,113,92],
[49,64,78,87,103,121,120,101],
[72,92,95,98,112,100,103,99]])
print(np.array_str(q1, precision=2, suppress_small=True))
The quantization matrix is designed to provide more resolution to more perceivable frequency components (usually the lower frequencies) than to less perceivable ones, while also sending as many components as possible to 0, since zeros can be encoded with the greatest efficiency.
Take the freq signal data and divide the first 8x8 block by the quantization matrix. Use np.round to immediately round the values to the nearest integer. Use np.array_str to once again display the resulting quantized 8x8 block for each of the 3 channels.
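A minimal sketch, assuming freq (centered, from question (2)) and q1 are already defined:

for channel in range(3):
    # divide the first 8x8 block by the quantization matrix, then round to the nearest integer
    quantized = np.round(freq[:8, :8, channel] / q1)
    print(np.array_str(quantized, precision=2, suppress_small=True))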
Wow! The results are interesting, and this is where the majority of the actual data loss (and compression) takes place. Let’s take a minute to explain what would happen next.
- First, the data would be encoded using run-length encoding.
- Then, the data would be encoded using Huffman coding.

The details are beyond this course; however, it is not too inaccurate to say that the zeros essentially don’t need to be stored anymore. So for our first 8x8 block, we went from needing to store about 64 values to only 1 per channel, or about 192 values down to 3 in total.

- The encoded data, and all of the information (Huffman tables, quantization tables, etc.) needed to reverse the process and restore the image, would be structured carefully and stored as a jpeg file.
Then, when someone opens the image, the jpeg file contains all of the information needed to reverse the process, and the image is displayed!
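As a rough illustration of why those runs of zeros are nearly free to store, here is a hypothetical, simplified run-length encoder applied to one made-up row of a quantized block (real jpeg walks the block in a zigzag order and uses a more involved scheme):

def run_length(values):
    # collapse runs of repeated values into [value, count] pairs
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

print(run_length([-26, -3, 0, 0, 0, 0, 0, 0]))  # [[-26, 1], [-3, 1], [0, 6]]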
You may be wondering: wait, are you saying we can take those super sparse matrices we just printed and get back to our original RGB values? Nope! But we can recover the "important stuff", producing an image that looks visually identical to our original image. This is, in effect, the same image we would see if we implemented the entire algorithm and displayed the resulting image!
- Code used to solve this problem.
- Output from running the code.
- Output should be 1 set of 3 8x8 matrices produced by applying the quantization matrix and rounding after the dct.
Question 4
Use the following idct2 function (the inverse of dct2) and print out the first 8x8 block for each channel after the process has been reversed. Starting with the quantized freq data from the previous question, the inverse process would be the following.
- Multiply by the quantization table.
- Use the idct2 function to reverse the dct.
- Add 127 to the final result to undo the shift highlighted in question (2).
Use np.array_str to print the first 8x8 block for each channel. Do the results look fairly close to the original YCbCr channel values? Impressive!
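The idct2 function itself appears to be missing from this copy of the page. A common definition (assuming the scipy.fftpack-based dct2 from earlier) is simply the mirror image of dct2, and the inverse steps chain together as shown below, where quantized and q1 are from question (3):

import scipy.fftpack

def idct2(a):
    # 2D inverse DCT: apply the 1D inverse DCT along each axis in turn
    return scipy.fftpack.idct(scipy.fftpack.idct(a, axis=0, norm='ortho'), axis=1, norm='ortho')

# de-quantize, reverse the dct, then undo the -127 shift
block = idct2(quantized * q1) + 127
print(np.array_str(block, precision=2, suppress_small=True))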
- Code used to solve this problem.
- Output from running the code.
- Output should be 1 set of 3 8x8 matrices that apply the quantization matrix, round, de-apply the quantization matrix, perform the inverse dct, and de-shift the values. These matrices should be nearly the same as the original YCbCr values from question (1).
Question 5
Let’s put it all together! While we aren’t fully implementing the compression algorithm, we are implementing the parts that cause loss (hence jpeg is a lossy algorithm). Since we implement those parts, we should also be able to view the lossy version of the image to see if we can perceive a difference! In addition, we can count the number of non-zero values in our image data before we process anything, and re-count immediately after the quantization and rounding, where many zeros appear in our matrices. This gives us a very rough estimate of the savings we would get if we implemented the entire algorithm!
You can use np.count_nonzero to count the non-zero values of an array.
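For example, counting before any processing might look like this one-liner, assuming img is the freshly loaded RGB array:

original_nonzero = np.count_nonzero(img)  # counts across all three channels at once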
For our barn.jpg image, walk through the entire algorithm (excluding the encoding parts). Reverse the process after quantization and rounding, all the way back to saving and displaying the lossy image. Since this has been a bit of a roller coaster of a project, we will provide some skeleton code for you to complete.
img = io.imread("/anvil/projects/tdm/data/images/barn.jpg")
# TODO: count the nonzero values before anything
original_nonzero =
q1 = np.array([[16,11,10,16,24,40,51,61],
[12,12,14,19,26,28,60,55],
[14,13,16,24,40,57,69,56],
[14,17,22,29,51,87,80,62],
[18,22,37,56,68,109,103,77],
[24,35,55,64,81,104,113,92],
[49,64,78,87,103,121,120,101],
[72,92,95,98,112,100,103,99]]).astype(np.int16)
# convert to YCbCr
img = color.rgb2ycbcr(img)
img = img.astype(np.int16)
# TODO: shift values to center around 0, for each channel
s = img.shape
freq = np.zeros(s)
# downsample <- from previous project
img[:,:,1] = 2*np.round(img[:,:,1]/2)
img[:,:,2] = 2*np.round(img[:,:,2]/2)
# variable to store number of non-zero values
nonzero = 0
for channel in range(3):
    for i in np.r_[:s[0]:8]:
        for j in np.r_[:s[1]:8]:
            # Example: printing a 8x8 block
            # Note: this can (and should) be deleted
            print(freq[i:(i+8), j:(j+8), channel])

            # TODO: apply dct to current 8x8 block
            # TODO: apply quantization to current 8x8 block
            # TODO: round values of the current 8x8 block
            # TODO: increment our count of non-zero values
            # TODO: de-quantize the current 8x8 block
            # TODO: apply inverse dct to current 8x8 block

# TODO: de-shift the values that were previously shifted, for each channel
# convert back to RGB
img = color.ycbcr2rgb(freq)
# print the number of nonzero values immediately post-quantization
print(f"Non-zero values: {nonzero}")
# print the _very_ approximate reduction of space for this image
print(f"Reduction: {nonzero/original_nonzero}")
# multiply image by 255 to rescale values to be between 0 and 255 instead of 0 and 1
img = img*255
# TODO: clip values greater than 255 and set those values equal to 255
# TODO: clip values less than 0 and set those values equal to 0
# save the "compressed" image so we can display it
# NOTE: The file won't _actually_ be compressed, but it will be visually identical to a compressed image
# since the lossy parts of the algorithm (the parts of the algorithm where we lose "unimportant" pieces of data)
# have already taken place.
io.imsave("compressed.jpg", img.astype(np.uint8), quality=100)  # cast to uint8 so imsave treats values as 0-255 pixels
with open("compressed.jpg", "rb") as f:
    my_bytes = f.read()

m = hashlib.sha256()
m.update(my_bytes)
print(m.hexdigest())
display.Image("compressed.jpg")
# display the original image, for comparison
display.Image("/anvil/projects/tdm/data/images/barn.jpg")
The hash I got was the following.
bc004579948c5b699b0df52eb69ce168147481a2430d828939cfa791f59783e7
- Code used to solve this problem.
- Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output, before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted was what you actually submitted. In addition, please review our submission guidelines before submitting your project.