Video

Video is images in motion.

Definitions

Images

An image is a 3D matrix where one dimension is for color and the other two are for position.

For example, in the Red dimension, we have a rectangular plane of points. At each point, or pixel, we store a number for how intensely Red the point should be drawn. The same concept extends to the Green and Blue dimensions.

The pixels must be stored. Bit depth is the number of bits required to store one pixel of an image. If we have 3 planes of color, and 8 bits to represent an intensity, it takes 3 * 8 or 24 bits to store one pixel. Bit depth is also referred to as color depth, because it hints at how many colors we can create. A bit depth of 24 implies 2^24 or ~16.7 million color possibilities at each pixel.
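
A minimal sketch in Python (assuming NumPy is available) tying the 3D matrix idea to bit depth:

```python
import numpy as np

# An 8-bit RGB image as a 3D array: two position dimensions, one color dimension.
height, width, planes = 720, 1280, 3
image = np.zeros((height, width, planes), dtype=np.uint8)

bits_per_sample = 8                   # bits per intensity value
bit_depth = planes * bits_per_sample  # 3 * 8 = 24 bits per pixel
colors = 2 ** bit_depth               # 2^24 = 16,777,216 possible colors

print(image.shape, bit_depth, colors)  # (720, 1280, 3) 24 16777216
```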

The resolution of an image is how many pixels fit in the position dimensions, usually stated as Width times Height. For example, 1280x720 implies 1280 pixels in the horizontal dimension and 720 pixels in the vertical dimension.

Following from resolution, we have the aspect ratio, which is the ratio of width to height in an image or video. The 1280x720 resolution has an aspect ratio of 16:9.
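
Reducing a resolution to its aspect ratio is just dividing out the greatest common divisor, sketched here:

```python
from math import gcd

# Divide width and height by their greatest common divisor.
width, height = 1280, 720
divisor = gcd(width, height)  # 80
print(f"{width // divisor}:{height // divisor}")  # 16:9
```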

Video

Video is images, or frames, over time. We extend the matrix concept to 4D to store video.

Frame rate is the number of frames per second (FPS) shown in a video. Common rates include 24 FPS, 30 FPS, and 60 FPS.

Because each frame is an image, we can use bit depth to understand storing videos. The bit rate is how many bits per second are needed to represent a video. For example, a video with 60 frames per second, 24 bit depth, and 1280x720 resolution will need 60 * 24 * 1280 * 720 = 1,327,104,000 bits per second, or 1327.104 Mbps, to be stored without compression. This example assumes a constant bit rate, or CBR. Some videos are stored with a variable bit rate, or VBR, which can save space.
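
The same bit rate arithmetic as a short sketch:

```python
# Uncompressed bit rate = frame rate * bit depth * width * height.
fps, bit_depth = 60, 24
width, height = 1280, 720

bit_rate = fps * bit_depth * width * height
print(bit_rate)              # 1327104000 bits per second
print(bit_rate / 1_000_000)  # 1327.104 Mbps
```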

Compression

Video takes up a lot of space. If we had a 15-minute video with a constant bit rate of 1327.104 Mbps, it would take up 1327.104 * 15 * 60 = 1,194,393.6 Megabits, or 149.2992 GB.
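
And the size calculation, using decimal units (1 GB = 8,000 megabits):

```python
# Uncompressed size = bit rate * duration.
bit_rate_mbps = 1327.104
duration_seconds = 15 * 60

size_megabits = bit_rate_mbps * duration_seconds  # 1194393.6 Mb
size_gigabytes = size_megabits / 8000             # 149.2992 GB
print(size_megabits, size_gigabytes)
```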

Space can be saved in a few ways:

  • Chroma subsampling
  • Luma compression
  • I frames, P frames, and B frames
  • Block motion compensation
  • Intra prediction

Chroma subsampling

Counterintuitively, the human eye is better at perceiving changes in brightness than changes in color. Rod cells in the eye, responsible for sensing brightness, outnumber cone cells, responsible for color, roughly 20 to 1. Read more about Photoreceptor cells on Wikipedia.

Brightness and color are also called luma and chroma, respectively.

To model luma and chroma, we take a different approach than RGB. One of the most popular is YCbCr, which has a luma component Y, a chroma blue component Cb, and a chroma red component Cr. Converting from RGB uses these formulas:

Y = 0.299R + 0.587G + 0.114B

Cb = 0.564(B - Y)

Cr = 0.713(R - Y)

Converting back:

R = Y + 1.402Cr

G = Y - 0.344Cb - 0.714Cr

B = Y + 1.772Cb
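
A minimal sketch of both conversions, operating on a single pixel with values in the 0-255 range:

```python
# Convert one RGB pixel to YCbCr using the formulas above.
def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr

# Convert back from YCbCr to RGB.
def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * cr
    g = y - 0.344 * cb - 0.714 * cr
    b = y + 1.772 * cb
    return r, g, b

# Round-trip a pure red pixel; a small rounding error is expected
# because the coefficients are truncated.
y, cb, cr = rgb_to_ycbcr(255, 0, 0)
print(ycbcr_to_rgb(y, cb, cr))  # approximately (255, 0, 0)
```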

Because we are more sensitive to brightness than to color, we can keep full luma data while storing less chroma data, which is known as chroma subsampling.

Ratios for subsampling are often expressed in 3 parts, J:a:b, where J is the horizontal luma sampling reference (usually 4 pixels), a is the number of chroma samples in the first row of J pixels, and b is the number of changes of chroma samples between the first and second rows of J pixels. Common ratios include:

  • 4:4:4 (no subsampling)
  • 4:2:2
  • 4:1:1
  • 4:2:0
  • 4:1:0
  • 3:1:1

Another way to read the ratio: for every J pixels of luma, take a samples of chroma in the first row, then take b new samples of chroma in the next row.
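
As a rough sketch, assuming 8 bits per sample and that each chroma sample stores a Cb and Cr pair, we can estimate the average bits per pixel for a J:a:b ratio:

```python
# Average bits per pixel over a 2-row block of J pixels.
def bits_per_pixel(j, a, b, bits=8):
    pixels = 2 * j                    # two rows of J pixels
    luma_bits = pixels * bits         # every pixel keeps its luma
    chroma_bits = (a + b) * 2 * bits  # a Cb and Cr pair per chroma sample
    return (luma_bits + chroma_bits) / pixels

print(bits_per_pixel(4, 4, 4))  # 24.0 (no subsampling)
print(bits_per_pixel(4, 2, 2))  # 16.0
print(bits_per_pixel(4, 2, 0))  # 12.0 (half the chroma data of 4:4:4)
```

So 4:2:0 keeps a quarter of the chroma samples of 4:4:4, cutting the earlier 24-bit pixels down to an average of 12 bits each.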

Codecs & Containers

A video codec is software that compresses and decompresses video using a bundle of strategies. Common codecs like H.264/AVC and AV1 are different ways to reduce the size of videos.

A video codec is different from a video container, which is a wrapper format holding metadata, audio, and the compressed video as its payload. The container is usually seen in the file extension, e.g. .mp4 for MPEG-4 Part 14 and .mkv for Matroska. Containers describe how to play back the contained video and audio.
