Progress
Currently, I'm building the basic video player to practice media processing. At this stage, the player now could handle video pausing and resizing (though resizing may still need some work).
Here is a video resizing issue I encountered.
Additionally, I've switched from using GLFW
to SDL2
for managing interface.
This change is necessary because Apple has deprecated some OpenGL functions, forcing user
to use the freking shader. Since I'm not ready to dive into that right now,
I opted for SDL2
to have a consistent developer experience on both Windows and macOS.
Here's what I've learned during this week's development:
Concepts
PTS, DTS, and AVRational
If you don't control the decoding and display speed in your code, the playback will be extremely fast,
which is undesirable. To solve this, we need to
introduce it to two key concepts: the PTS
and DTS
.
PTS(Presentation Timestamp)
PTS represents the timestamp when this single frame should be displayed.
It is important to note that the time unit of PTS and DTS is NOT
in real-world seconds or milliseconds. Instead, it uses the time unit called
TimeScale
, aka Tick
, which can vary between videos.
But we can always get a tick's length by using this formula:
Here, timebase
is a AVRational
object which defines how many ticks are there in a second.
This allows us to determine the TimeScale
for a video. The timebase
can be obtained from the AVStream
object.
DTS(Decode Timestamp)
DTS indicates the timestamp when the frame was decoded.
Sampling and Audio Playback
Just like video, audio in the media is also encoded. Typically,
it is compressed using an AAC
encoder, which can reduce the audio file
size up to 10 times.
Sampling
When recoding audio, we divide 1 second into many ticks, and we capture the audio data at specific time points. The number of times we capture data per second is called Sample Rate. For example, an audio file with a sample rate of 44.1 kHz means it captures 44,100 distinct sound samples per second.
Besides, the audio data can be stored in different formats, like float32
and int16
,
this is called Format.
Audio Playback
When we configured the SDL_AudioSpec
in our code, the system will periodcally request some audio data to play. So we could write a callback function,
when it needs data, we decode and give it.
void audioCallback(void *userdata, Uint8 *stream, int len) {
auto vr = static_cast<VideoReader *>(userdata);
if (pauseVideo) {
memset(stream, 0, len);
return;
}
while (vr->audioFrameBuffer_.empty()) {
if (vr->decodeAudioPacket()) {
vr->resampleAudioFrame();
break;
}
}
auto &bufPos = vr->audioCurBufferPos_;
int bytesCopied = 0;
while (len > 0) {
if (bufPos >= vr->audioFrameBuffer_.size()) {
bufPos -= vr->audioFrameBuffer_.size();
vr->audioFrameBuffer_.clear();
}
while (vr->audioFrameBuffer_.empty()) {
if (vr->decodeAudioPacket()) {
vr->resampleAudioFrame();
break;
}
}
auto &curBuf = vr->audioFrameBuffer_;
bytesCopied = std::min(curBuf.size() - bufPos, (std::size_t)len);
memcpy(stream, curBuf.data() + bufPos, bytesCopied);
len -= bytesCopied;
stream += bytesCopied;
bufPos += bytesCopied;
}
}
Miscellaneous
Bugs
At this point, the program feels like one giant bug. When it works as expected, it's pretty cool, but the functionality is still far from perfect.
On top of that, there's a memory leak issue, where about 0.4MB of memory is leaked per second. I suspect I may have missed unref some packets or frames, but I'm still investigating…