Video Processing
- Video Frames: Common types include I-frames (keyframes, containing a complete picture and therefore the largest amount of data), P-frames (predictive frames, encoded as differences relative to a preceding I- or P-frame), and B-frames (bi-predictive frames, referencing both preceding and subsequent I- or P-frames). Have you ever noticed a slight jump back of 1-2 seconds when dragging the progress bar while watching videos online? This happens because the frame at that position is not an I-frame and lacks a complete image, so playback has to resume from the nearest preceding I-frame.
- Resolution: Refers to the size (dimensions) of the image, such as 720p, 1080p, 2K, or 4K; 720 and 1080 count vertical pixel rows, while 2K and 4K refer to the approximate horizontal pixel count. The suffix "p" stands for progressive scan, while "i" stands for interlaced scan.
- Bitrate (Data Rate): The number of bits of media data (video plus audio) played back per second. File size follows directly: File Size (bits) = Bitrate (bits per second) × Duration (seconds); see the worked example after this list.
- Frame Rate (FPS): The number of frames displayed per second, measured in fps (frames per second) or sometimes in Hertz (Hz). The range perceptible to the human eye is typically 15–75 fps.
- Refresh Rate: The number of times the screen refreshes (redraws the image) per second, measured in Hertz (Hz).
- Bit Depth: The number of bits used to represent each sample, which affects the quality and detail of the image or audio. For example, if RGB uses one byte per color channel, the bit depth is 8 bits.
- DPI (Dots Per Inch): The number of dots (pixels) per inch. Not strictly a video parameter, it is mainly used in printing; 300 dpi, for example, is standard for high-quality posters.
- PTS (Presentation Time Stamp): Indicates when a particular frame or sample should be presented or rendered to the user.
- DTS (Decoding Time Stamp): Represents the time at which a frame or sample should be decoded.
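As a worked example of the file-size formula above (and of how much raw data a single uncompressed frame at a given resolution and bit depth represents), here is a minimal sketch; the bitrate, duration, and resolution values are arbitrary illustrative numbers.

```python
# Rough size estimates for a video stream.
# The numbers below are arbitrary example values.

def file_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """File size in decimal megabytes: bitrate (bits/s) x duration (s), converted to bytes."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000

def raw_frame_size_mb(width: int, height: int, bits_per_pixel: int = 24) -> float:
    """Uncompressed size of one frame (e.g., 8-bit RGB = 24 bits per pixel)."""
    return width * height * bits_per_pixel / 8 / 1_000_000

# A 10-minute 1080p video encoded at 5000 kbps:
print(f"Encoded file: {file_size_mb(5000, 10 * 60):.0f} MB")            # ~375 MB
print(f"One raw 1080p frame: {raw_frame_size_mb(1920, 1080):.1f} MB")   # ~6.2 MB
```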
Limiting the bitrate is essentially limiting the amount of data. For live streaming and other streaming media, a maximum bitrate is usually set to prevent stuttering caused by insufficient bandwidth on the client side; the encoder uses this maximum-bitrate setting to apply lossy compression to the video.
I-frames, P-frames, and B-frames are compression techniques used by codecs such as H.264/H.265 and may not exist in other encoding formats. B-frames in particular are why decode order (DTS) can differ from presentation order (PTS); see the sketch below.
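Because a B-frame depends on a frame that is displayed after it, the decoder must receive that later frame first, so decode order (DTS) and display order (PTS) diverge. A minimal sketch with a hypothetical frame sequence:

```python
# Hypothetical GOP in display (PTS) order: I B B P B B P
# A B-frame depends on the following I/P frame, so that frame must be
# decoded first; decode (DTS) order therefore differs from display order.
display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]

# Reorder for decoding: each forward reference (I/P) moves ahead of the
# B-frames that depend on it.
decode_order = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]

for dts, frame in enumerate(decode_order):
    pts = display_order.index(frame)
    print(f"frame {frame}: DTS={dts}, PTS={pts}")
```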
- GOP (Group of Pictures): A group of video frames that starts with an I-frame (other frames within the group may also be I-frames). GOP size is typically configured for live streaming and other streaming media because, after data is lost or corrupted, the picture can only fully recover at the next I-frame, so shorter GOPs limit how long visual artifacts persist.
- GOP is generally set to 1–2 times the frame rate, i.e., one I-frame every 1–2 seconds (see the sketch below).
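As a concrete illustration, with a 30 fps source and a GOP of twice the frame rate, a keyframe appears every two seconds. Here is a minimal sketch of configuring this when encoding with ffmpeg's libx264 (assuming ffmpeg is installed; the file names, frame rate, and bitrate are placeholder values):

```python
import subprocess

fps = 30
gop = 2 * fps  # one I-frame every 2 seconds

# -g sets the GOP size (keyframe interval in frames) for libx264.
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-c:v", "libx264",
    "-g", str(gop),
    "-b:v", "2500k",   # target video bitrate, suitable for streaming
    "output.mp4",
], check=True)
```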
Audio Processing
- Sampling Rate: The number of samples taken from the audio signal per second. Common sampling rates include 44.1 kHz (CD quality) and 48 kHz (commonly used in video production).
- Bit Depth: The number of bits per audio sample, which determines the dynamic range of the audio signal. Common bit depths are 16-bit and 24-bit.
- Channels: Independent paths of audio signal transmission. Mono audio has a single sound path, while stereo audio has left and right channels.
- Codec: An algorithm or device that encodes audio signals into a digital format or decodes them back into audio signals. Common audio codecs include MP3, AAC, and FLAC.
- Frequency: The vibration frequency of sound waves, usually measured in Hertz (Hz). Humans can hear frequencies from approximately 20 Hz to 20,000 Hz.
- Waveform: The graphical shape of a sound, used to visualize audio signals. Common waveforms include sine waves, square waves, and sawtooth waves.
- Acoustic Model: In speech recognition, a statistical model that associates sounds with speech units (phonemes).
- Mixing: The process of combining multiple audio signals into a single output, typically involving balancing volume and channels.
- Echo Cancellation: Technology that reduces or eliminates echo during communication, commonly used in voice calls and audio-video conferencing.
- Audio Effects: Processes that modify or enhance audio signals, such as equalization, reverb, and chorus effects.
- Real-time Audio Processing: Techniques for processing audio in real-time applications, such as live audio stream processing or real-time audio effects.
- MIDI (Musical Instrument Digital Interface): A digital communication protocol for controlling audio equipment, instruments, and computers, widely used in music production.
- Audio Frames: Since storing a timestamp for every individual sample would be inefficient, samples are grouped into audio frames. Each audio frame is a collection of consecutive samples, and its playback time is given by its PTS, which marks when the first sample in the frame begins playback.
- Bitrate: Audio also has a bitrate, commonly 128 kbps (see the sketch below).
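To tie these parameters together, here is a minimal sketch that computes the raw PCM data rate from sampling rate, bit depth, and channel count, and the playback duration covered by one audio frame. The 1024-samples-per-frame figure matches AAC; other codecs use different frame sizes, and the remaining values are illustrative.

```python
# Raw PCM data rate and per-frame duration.
# Illustrative values: 48 kHz, 16-bit, stereo, 1024 samples per frame.
sample_rate = 48_000   # samples per second
bit_depth = 16         # bits per sample
channels = 2           # stereo

pcm_bitrate = sample_rate * bit_depth * channels
print(f"Raw PCM bitrate: {pcm_bitrate / 1000:.0f} kbps")     # 1536 kbps

samples_per_frame = 1024  # AAC frame size
frame_duration_ms = samples_per_frame / sample_rate * 1000
print(f"One audio frame covers {frame_duration_ms:.2f} ms")  # ~21.33 ms
# The PTS of each successive frame advances by this duration.
```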
It is generally accepted that smooth, distortion-free audio requires a sampling rate of at least 40 kHz: human hearing extends to roughly 20 kHz, and by the Nyquist sampling theorem the sampling rate must be at least twice the highest frequency to be reproduced.
Common Audio Sampling Rates:
- 8 kHz: Audio calls and surveillance recordings
- 22.05 kHz, 24 kHz: FM radio broadcasts
- 44.1 kHz: CD quality
- 48 kHz: Common for online videos and movies
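Putting sampling rate, bit depth, channels, frequency, and waveform together, here is a minimal sketch that generates one second of a 440 Hz sine tone as 16-bit mono PCM at 44.1 kHz and writes it to a WAV file using Python's standard library (the file name and tone parameters are arbitrary):

```python
import math
import struct
import wave

sample_rate = 44_100      # CD-quality sampling rate
frequency = 440.0         # Hz (concert A)
duration_s = 1.0
amplitude = 0.8 * 32767   # stay within the 16-bit signed range

# Generate the samples of a sine waveform.
n_samples = int(sample_rate * duration_s)
samples = [
    int(amplitude * math.sin(2 * math.pi * frequency * n / sample_rate))
    for n in range(n_samples)
]

with wave.open("tone_440hz.wav", "wb") as wav:
    wav.setnchannels(1)      # mono: a single channel
    wav.setsampwidth(2)      # 16-bit depth = 2 bytes per sample
    wav.setframerate(sample_rate)
    wav.writeframes(struct.pack(f"<{n_samples}h", *samples))
```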