<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<!-- Mirrored from dranger.com/ffmpeg/tutorial03.html by HTTrack Website Copier/3.x [XR&CO'2014], Fri, 05 Jun 2020 09:18:26 GMT -->
<head>
<title>ffmpeg tutorial</title>
<link href="ffmpeg.css" rel="stylesheet" type="text/css">
</head>
<body>
<h2 class="center">An ffmpeg and SDL Tutorial</h2><div class="nav">
<span class="left">Page
<a href="tutorial01.html">1</a>
<a href="tutorial02.html">2</a>
3
<a href="tutorial04.html">4</a>
<a href="tutorial05.html">5</a>
<a href="tutorial06.html">6</a>
<a href="tutorial07.html">7</a>
<a href="end.html">End</a>
</span>
<span class="right">
<a href="tutorial02.html">Prev</a>
<a href="ffmpeg.html">Home</a>
<a href="tutorial04.html">Next</a>
</span>
<span class="clear"> </span>
</div><div>
<a href="tutorial03.txt">Text version</a></div>
<h2>Tutorial 03: Playing Sound</h2>
<span class="codelink">Code: <a href="tutorial03.c">tutorial03.c</a></span>
<h3>Audio</h3>
<p>
So now we want to play sound. SDL also gives us methods for outputting sound. The <tt><a href="functions.html#SDL_OpenAudio">SDL_OpenAudio</a>()</tt> function is used to open the audio device itself. It takes as arguments an <tt><a href="data.html#SDL_AudioSpec">SDL_AudioSpec</a></tt> struct, which contains all the information about the audio we are going to output.
</p>
<p>
Before we show how you set this up, let's explain first about how audio is handled by computers. Digital audio consists of a long stream of <b>samples</b>. Each sample represents a value of the audio <a href="http://en.wikipedia.org/wiki/Waveform">waveform</a>. Sounds are recorded at a certain <b>sample rate</b>, which simply says how fast to play each sample, and is measured in number of samples per second. Example sample rates are 22,050 and 44,100 samples per second, which are the rates used for radio and CD respectively. In addition, most audio can have more than one channel for stereo or surround, so for example, if the sample is in stereo, the samples will come 2 at a time. When we get data from a movie file, we don't know how many samples we will get, but ffmpeg will not give us partial samples - that also means that it will not split a stereo sample up, either.
</p>
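<p>
To make the arithmetic concrete, here is a minimal sketch in plain C (no SDL or ffmpeg required). The function names are our own invention, and the comments assume 16-bit samples, which is what the rest of this tutorial uses.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one second of uncompressed audio.
 * bytes_per_sample is 2 for the 16-bit audio assumed in this tutorial. */
size_t audio_bytes_per_second(int sample_rate, int channels, int bytes_per_sample) {
  assert(sample_rate > 0 && channels > 0 && bytes_per_sample > 0);
  return (size_t)sample_rate * (size_t)channels * (size_t)bytes_per_sample;
}

/* Size of one complete sample across all channels (for stereo, the
 * left and right values together). Decoded data is never split inside
 * this unit, so buffer sizes are a multiple of it. */
size_t audio_frame_bytes(int channels, int bytes_per_sample) {
  assert(channels > 0 && bytes_per_sample > 0);
  return (size_t)channels * (size_t)bytes_per_sample;
}
```

<p>
For CD-style audio (44,100 samples per second, stereo, 16 bits), <tt>audio_bytes_per_second(44100, 2, 2)</tt> comes to 176,400 bytes for every second of sound.
</p>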
<p>
SDL's method for playing audio is this: we set up our audio options, namely the sample rate (called "freq" for <b>frequency</b> in the SDL struct), the number of channels, and so forth, and we also set a callback function and userdata. When we begin playing audio, SDL will continually call this callback function and ask it to fill the audio buffer with a certain number of bytes. After we put this information in the <tt><a href="data.html#SDL_AudioSpec">SDL_AudioSpec</a></tt> struct, we call <tt><a href="functions.html#SDL_OpenAudio">SDL_OpenAudio</a>()</tt>, which will open the audio device and give us back <i>another</i> AudioSpec struct. These are the specs we will <i>actually</i> be using: we are not guaranteed to get what we asked for!
</p>
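<p>
The "pull" model can be hard to picture at first, so here is a rough simulation of it in plain C. The <tt>fake_device_run</tt> function is a hypothetical stand-in for SDL's audio thread and is not part of SDL; only the callback signature matches what SDL expects.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the callback SDL expects. */
typedef void (*audio_cb)(void *userdata, uint8_t *stream, int len);

/* Hypothetical stand-in for the audio device: it "pulls" audio by
 * calling the callback once per buffer, the way SDL's audio thread does. */
void fake_device_run(audio_cb cb, void *userdata, int buf_len, int n_buffers) {
  uint8_t stream[64];
  int i;
  assert(buf_len > 0 && buf_len <= (int)sizeof(stream));
  for (i = 0; i < n_buffers; i++) {
    cb(userdata, stream, buf_len);   /* "fill this many bytes, please" */
  }
}

/* Example callback: writes silence and counts the bytes it was asked for. */
void count_bytes_cb(void *userdata, uint8_t *stream, int len) {
  int *total = (int *)userdata;
  memset(stream, 0, (size_t)len);
  *total += len;
}
```

<p>
Notice that the callback never decides when it runs or how much it writes. The device decides both, and the callback's only job is to fill the buffer it is handed.
</p>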
<h3>Setting Up the Audio</h3>
<p>Keep that all in your head for the moment, because we don't actually have any information about the audio streams yet! Let's go back to the place in our code where we found the video stream and find which stream is the audio stream.
<pre>
// Find the first video stream and the first audio stream
videoStream=-1;
audioStream=-1;
for(i=0; i < pFormatCtx->nb_streams; i++) {
  if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO &&
     videoStream < 0) {
    videoStream=i;
  }
  if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO &&
     audioStream < 0) {
    audioStream=i;
  }
}
if(videoStream==-1)
  return -1; // Didn't find a video stream
if(audioStream==-1)
  return -1; // Didn't find an audio stream
</pre>
From here we can get all the info we want from the <tt><a href="data.html#AVCodecContext">AVCodecContext</a></tt> from the stream, just like we did with the video stream:
<pre>
<a href="data.html#AVCodecContext">AVCodecContext</a> *aCodecCtxOrig;
<a href="data.html#AVCodecContext">AVCodecContext</a> *aCodecCtx;
aCodecCtxOrig=pFormatCtx->streams[audioStream]->codec;
</pre>
</p>
<p>
If you remember from the previous tutorials, we still need to open the audio codec itself. This is straightforward:
<pre>
AVCodec *aCodec;
// Look the decoder up through the original context; aCodecCtx is not allocated yet
aCodec = <a href="functions.html#avcodec_find_decoder">avcodec_find_decoder</a>(aCodecCtxOrig->codec_id);
if(!aCodec) {
  fprintf(stderr, "Unsupported codec!\n");
  return -1;
}
// Copy context
aCodecCtx = avcodec_alloc_context3(aCodec);
if(avcodec_copy_context(aCodecCtx, aCodecCtxOrig) != 0) {
  fprintf(stderr, "Couldn't copy codec context\n");
  return -1; // Error copying codec context
}
/* set up SDL Audio here */
<a href="functions.html#avcodec_open">avcodec_open2</a>(aCodecCtx, aCodec, NULL);
</pre>
</p>
<p>
Contained within the codec context is all the information we need to set up our audio:
<pre>
wanted_spec.freq = aCodecCtx->sample_rate;
wanted_spec.format = AUDIO_S16SYS;
wanted_spec.channels = aCodecCtx->channels;
wanted_spec.silence = 0;
wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
wanted_spec.callback = audio_callback;
wanted_spec.userdata = aCodecCtx;
if(<a href="functions.html#SDL_OpenAudio">SDL_OpenAudio</a>(&wanted_spec, &spec) < 0) {
  fprintf(stderr, "<a href="functions.html#SDL_OpenAudio">SDL_OpenAudio</a>: %s\n", SDL_GetError());
  return -1;
}
</pre>
Let's go through these:
<ul>
<li><tt>freq</tt>: The sample rate, as explained earlier.</li>
<li><tt>format</tt>: This tells SDL what format we will be giving it. The "S" in "S16SYS" stands for "signed", the 16 says that each sample is 16 bits long, and "SYS" means that the endian-order will depend on the system you are on. This is the format that this tutorial assumes <tt><a href="functions.html#avcodec_decode_audio4">avcodec_decode_audio4</a></tt> will give us the audio in.</li>
<li><tt>channels</tt>: Number of audio channels.</li>
<li><tt>silence</tt>: This is the value that indicates silence. Since the audio is signed, 0 is of course the usual value.</li>
<li><tt>samples</tt>: This is the size of the audio buffer that we would like SDL to give us when it asks for more audio. A good value here is between 512 and 8192; ffplay uses 1024.</li>
<li><tt>callback</tt>: Here's where we pass the actual callback function. We'll talk more about the callback function later.</li>
<li><tt>userdata</tt>: SDL will give our callback a void pointer to any user data that we want our callback function to have. We want to let it know about our codec context; you'll see why.</li>
</ul>
Finally, we open the audio with <tt><a href="functions.html#SDL_OpenAudio">SDL_OpenAudio</a></tt>.
</p>
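<p>
To get a feel for what the <tt>samples</tt> value means in practice, here is a small sketch that works out how many bytes each callback will be asked for and how much playing time that covers. The helper names are hypothetical, and we assume that <tt>samples</tt> counts one sample per channel and that the format is 16-bit, as set above.
</p>

```c
#include <assert.h>

/* Bytes requested per callback: samples * channels * bytes per sample.
 * We assume 16-bit samples (AUDIO_S16SYS), so bytes per sample is 2. */
int callback_buffer_bytes(int samples, int channels) {
  assert(samples > 0 && channels > 0);
  return samples * channels * 2;
}

/* How long one such buffer lasts, in milliseconds (rounded down). */
int callback_buffer_ms(int samples, int freq) {
  assert(samples > 0 && freq > 0);
  return (int)((long)samples * 1000L / freq);
}
```

<p>
With 1024 samples of stereo audio at 44,100 Hz, that is 4096 bytes and about 23 milliseconds per callback. A larger buffer means fewer callbacks but more delay before a sound is heard.
</p>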
<h3>Queues</h3>
<p>
There! Now we're ready to start pulling audio information from the stream. But what do we do with that information? We are going to be continuously getting packets from the movie file, but at the same time SDL is going to call the callback function! The solution is going to be to create some kind of global structure that we can stuff audio packets in so our <tt>audio_callback</tt> has something to get audio data from! So what we're going to do is to create a <b>queue</b> of packets. ffmpeg even comes with a structure to help us with this: <tt><a href="data.html#AVPacketList">AVPacketList</a></tt>, which is just a linked list for packets. Here's our queue structure:
</p>
<pre>
typedef struct PacketQueue {
  <a href="data.html#AVPacketList">AVPacketList</a> *first_pkt, *last_pkt;
  int nb_packets;
  int size;
  <a href="data.html#SDL_mutex">SDL_mutex</a> *mutex;
  <a href="data.html#SDL_cond">SDL_cond</a> *cond;
} PacketQueue;
</pre>
First, we should point out that <tt>nb_packets</tt> is not the same as <tt>size</tt>: <tt>nb_packets</tt> counts the packets in the queue, while <tt>size</tt> is the running total of their byte sizes, which we get from <tt>packet->size</tt>. You'll notice that we have a <a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a> and a <a href="http://en.wikipedia.org/wiki/Condition_variable">condition variable</a> in there. This is because SDL is running the audio process as a separate thread. If we don't lock the queue properly, we could really mess up our data. We'll see how in the implementation of the queue. Every programmer should know how to make a queue, but we're including this so you can learn the SDL functions.
</p>
<p>
First we make a function to initialize the queue:
<pre>
void packet_queue_init(PacketQueue *q) {
  memset(q, 0, sizeof(PacketQueue));
  q->mutex = <a href="functions.html#SDL_CreateMutex">SDL_CreateMutex</a>();
  q->cond = <a href="functions.html#SDL_CreateCond">SDL_CreateCond</a>();
}
</pre>
Then we will make a function to put stuff in our queue:
<pre>
int packet_queue_put(PacketQueue *q, <a href="data.html#AVPacket">AVPacket</a> *pkt) {
  <a href="data.html#AVPacketList">AVPacketList</a> *pkt1;
  if(<a href="functions.html#av_dup_packet">av_dup_packet</a>(pkt) < 0) {
    return -1;
  }
  pkt1 = <a href="functions.html#av_malloc">av_malloc</a>(sizeof(<a href="data.html#AVPacketList">AVPacketList</a>));
  if (!pkt1)
    return -1;
  pkt1->pkt = *pkt;
  pkt1->next = NULL;
  <a href="functions.html#SDL_LockMutex">SDL_LockMutex</a>(q->mutex);
  if (!q->last_pkt)
    q->first_pkt = pkt1;
  else
    q->last_pkt->next = pkt1;
  q->last_pkt = pkt1;
  q->nb_packets++;
  q->size += pkt1->pkt.size;
  <a href="functions.html#SDL_CondSignal">SDL_CondSignal</a>(q->cond);
  <a href="functions.html#SDL_UnlockMutex">SDL_UnlockMutex</a>(q->mutex);
  return 0;
}
</pre>
<tt><a href="functions.html#SDL_LockMutex">SDL_LockMutex</a>()</tt> locks the mutex in the queue so we can add something to it, and then <tt><a href="functions.html#SDL_CondSignal">SDL_CondSignal</a>()</tt> sends a signal to our get function (if it is waiting) through our condition variable to tell it that there is data and it can proceed, then unlocks the mutex to let it go.
</p>
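<p>
If the linked-list bookkeeping is unfamiliar, it may help to see it with the locking and the ffmpeg types stripped away. The sketch below is a simplified, single-threaded queue of our own; <tt>Node</tt> and <tt>SimpleQueue</tt> are hypothetical stand-ins for <tt>AVPacketList</tt> and <tt>PacketQueue</tt>, and each item carries only a size.
</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for AVPacketList and PacketQueue, without locking. */
typedef struct Node {
  int payload_size;        /* plays the role of pkt.size */
  struct Node *next;
} Node;

typedef struct SimpleQueue {
  Node *first, *last;
  int nb_items;            /* number of queued items */
  int size;                /* total payload bytes queued */
} SimpleQueue;

/* Append to the tail; returns 0 on success, -1 if out of memory. */
int simple_queue_put(SimpleQueue *q, int payload_size) {
  Node *n = malloc(sizeof(*n));
  if (!n)
    return -1;
  n->payload_size = payload_size;
  n->next = NULL;
  if (!q->last)
    q->first = n;          /* queue was empty */
  else
    q->last->next = n;
  q->last = n;
  q->nb_items++;
  q->size += payload_size;
  return 0;
}

/* Remove from the head; returns 1 and stores the size, or 0 if empty. */
int simple_queue_get(SimpleQueue *q, int *payload_size) {
  Node *n = q->first;
  if (!n)
    return 0;
  q->first = n->next;
  if (!q->first)
    q->last = NULL;        /* queue became empty */
  q->nb_items--;
  q->size -= n->payload_size;
  *payload_size = n->payload_size;
  free(n);
  return 1;
}
```

<p>
After putting items of 100 and 250 bytes, <tt>nb_items</tt> is 2 while <tt>size</tt> is 350, which is exactly the distinction between <tt>nb_packets</tt> and <tt>size</tt> in the real queue.
</p>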
<p>
Here's the corresponding get function. Notice how <tt><a href="functions.html#SDL_CondWait">SDL_CondWait</a>()</tt> makes the function <b>block</b> (i.e. pause until we get data) if we tell it to.
<pre>
int quit = 0;
static int packet_queue_get(PacketQueue *q, <a href="data.html#AVPacket">AVPacket</a> *pkt, int block) {
  <a href="data.html#AVPacketList">AVPacketList</a> *pkt1;
  int ret;
  <a href="functions.html#SDL_LockMutex">SDL_LockMutex</a>(q->mutex);
  for(;;) {
    if(quit) {
      ret = -1;
      break;
    }
    pkt1 = q->first_pkt;
    if (pkt1) {
      q->first_pkt = pkt1->next;
      if (!q->first_pkt)
        q->last_pkt = NULL;
      q->nb_packets--;
      q->size -= pkt1->pkt.size;
      *pkt = pkt1->pkt;
      <a href="functions.html#av_free">av_free</a>(pkt1);
      ret = 1;
      break;
    } else if (!block) {
      ret = 0;
      break;
    } else {
      <a href="functions.html#SDL_CondWait">SDL_CondWait</a>(q->cond, q->mutex);
    }
  }
  <a href="functions.html#SDL_UnlockMutex">SDL_UnlockMutex</a>(q->mutex);
  return ret;
}
</pre>
As you can see, we've wrapped the function in a forever loop so we will be sure to get some data if we want to block. We avoid spinning in that loop by making use of SDL's <tt><a href="functions.html#SDL_CondWait">SDL_CondWait</a>()</tt> function. Basically, all CondWait does is wait for a signal from <tt><a href="functions.html#SDL_CondSignal">SDL_CondSignal</a>()</tt> (or <tt>SDL_CondBroadcast()</tt>) and then continue. It looks as though we've trapped it within our mutex: if we hold the lock, our put function can't put anything in the queue! However, what <tt>SDL_CondWait()</tt> also does for us is to unlock the mutex we give it while it waits and then lock it again once we get the signal.
</p>
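<p>
The unlock-while-waiting behaviour is easier to believe once you see it run. The sketch below uses POSIX threads instead of SDL so that it has no dependencies beyond the C library; that substitution is ours, not the tutorial's. <tt>pthread_cond_wait()</tt> follows the same release-and-reacquire pattern described above. All names in it are hypothetical.
</p>

```c
#include <assert.h>
#include <pthread.h>

/* Shared state protected by the mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;
static int value = 0;

/* Producer: publish a value under the lock, then signal the waiter. */
static void *producer(void *arg) {
  (void)arg;
  pthread_mutex_lock(&lock);
  value = 42;
  ready = 1;
  pthread_cond_signal(&cond);
  pthread_mutex_unlock(&lock);
  return NULL;
}

/* Consumer: start the producer while holding the lock, then block
 * until the value is ready. Returns the value, or -1 on failure. */
int wait_for_value(void) {
  pthread_t tid;
  int result;
  pthread_mutex_lock(&lock);
  if (pthread_create(&tid, NULL, producer, NULL) != 0) {
    pthread_mutex_unlock(&lock);
    return -1;
  }
  /* pthread_cond_wait releases the lock while sleeping, so the producer
   * can get in, and re-acquires it before returning. The loop re-checks
   * the condition because a wakeup alone does not prove there is data. */
  while (!ready)
    pthread_cond_wait(&cond, &lock);
  result = value;
  pthread_mutex_unlock(&lock);
  pthread_join(tid, NULL);
  return result;
}
```

<p>
The consumer deliberately takes the lock before the producer even exists. If waiting did not release the lock, the producer could never set <tt>ready</tt> and the program would hang. On some systems you may need to compile with <tt>-pthread</tt>.
</p>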
<h3>In Case of Fire</h3>
<p>
You'll also notice that we have a global <tt>quit</tt> variable that we check to make sure that we haven't sent the program a quit signal (SDL automatically handles TERM signals and the like). Otherwise, the thread will continue forever and we'll have to <tt>kill -9</tt> the program.
<pre>
  SDL_PollEvent(&event);
  switch(event.type) {
  case SDL_QUIT:
    quit = 1;
</pre>
We make sure to set the <tt>quit</tt> flag to 1.
</p>
<h3>Feeding Packets</h3>
<p>
The only thing left is to set up our queue:
<pre>
PacketQueue audioq;
main() {
...
  <a href="functions.html#avcodec_open">avcodec_open2</a>(aCodecCtx, aCodec, NULL);
  packet_queue_init(&audioq);
  <a href="functions.html#SDL_PauseAudio">SDL_PauseAudio</a>(0);
</pre>
<tt><a href="functions.html#SDL_PauseAudio">SDL_PauseAudio</a>()</tt> finally starts the audio device. It plays silence if it doesn't get data, which it won't right away.
</p>
<p>
So, we've got our queue set up, now we're ready to start feeding it packets. We go to our packet-reading loop:
<pre>
while(<a href="functions.html#av_read_frame">av_read_frame</a>(pFormatCtx, &packet)>=0) {
  // Is this a packet from the video stream?
  if(packet.stream_index==videoStream) {
    // Decode video frame
    ....
  } else if(packet.stream_index==audioStream) {
    packet_queue_put(&audioq, &packet);
  } else {
    <a href="functions.html#av_free_packet">av_free_packet</a>(&packet);
  }
}
</pre>
Note that we don't free the packet after we put it in the queue. We'll free it later when we decode it.
</p>
<h3>Fetching Packets</h3>
<p>
Now let's finally make our <tt>audio_callback</tt> function to fetch the packets on the queue. The callback has to be of the form <tt>void callback(void *userdata, Uint8 *stream, int len)</tt>, where <tt>userdata</tt> of course is the pointer we gave to SDL, <tt>stream</tt> is the buffer we will be writing audio data to, and <tt>len</tt> is the size of that buffer. Here's the code:
<pre>
void audio_callback(void *userdata, Uint8 *stream, int len) {
  <a href="data.html#AVCodecContext">AVCodecContext</a> *aCodecCtx = (AVCodecContext *)userdata;
  int len1, audio_size;
  static uint8_t audio_buf[(AVCODEC_MAX_AUDIO_FRAME_SIZE * 3) / 2];
  static unsigned int audio_buf_size = 0;
  static unsigned int audio_buf_index = 0;
  while(len > 0) {
    if(audio_buf_index >= audio_buf_size) {
      /* We have already sent all our data; get more */
      audio_size = audio_decode_frame(aCodecCtx, audio_buf,
                                      sizeof(audio_buf));
      if(audio_size < 0) {
        /* If error, output silence */
        audio_buf_size = 1024;
        memset(audio_buf, 0, audio_buf_size);
      } else {
        audio_buf_size = audio_size;
      }
      audio_buf_index = 0;
    }
    len1 = audio_buf_size - audio_buf_index;
    if(len1 > len)
      len1 = len;
    memcpy(stream, (uint8_t *)audio_buf + audio_buf_index, len1);
    len -= len1;
    stream += len1;
    audio_buf_index += len1;
  }
}
</pre>
This is basically a simple loop that will pull in data from another function we will write, <tt>audio_decode_frame()</tt>, store the result in an intermediary buffer, attempt to write <tt>len</tt> bytes to <tt>stream</tt>, and get more data if we don't have enough yet, or save it for later if we have some left over. The size of <tt>audio_buf</tt> is 1.5 times the size of the largest audio frame that ffmpeg will give us, which gives us a nice cushion.
</p>
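<p>
The refill-and-copy pattern is worth studying in isolation. In this sketch we replace the real decoder with a hypothetical <tt>fake_decode()</tt> that hands out 4 bytes at a time, so you can follow how leftover bytes survive from one call to the next. None of these names come from SDL or ffmpeg.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 4  /* hypothetical decoder output size, in bytes */

/* Hypothetical decoder: each call produces CHUNK bytes counting upward. */
static int fake_decode(uint8_t *buf, int buf_size) {
  static uint8_t next = 0;
  int i;
  assert(buf_size >= CHUNK);
  for (i = 0; i < CHUNK; i++)
    buf[i] = next++;
  return CHUNK;
}

/* Fill `stream` with exactly `len` bytes, decoding more only when the
 * intermediate buffer is used up. Leftovers persist in the statics. */
void fill_stream(uint8_t *stream, int len) {
  static uint8_t buf[CHUNK];
  static int buf_size = 0;   /* valid bytes in buf */
  static int buf_index = 0;  /* bytes of buf already handed out */
  int n;
  while (len > 0) {
    if (buf_index >= buf_size) {       /* buffer drained: refill */
      buf_size = fake_decode(buf, (int)sizeof(buf));
      buf_index = 0;
    }
    n = buf_size - buf_index;          /* bytes still available */
    if (n > len)
      n = len;                         /* don't overrun the request */
    memcpy(stream, buf + buf_index, (size_t)n);
    len -= n;
    stream += n;
    buf_index += n;
  }
}
```

<p>
Asking for 6 bytes forces two decodes and leaves 2 bytes unused in the intermediate buffer. The next request starts with those 2 bytes before decoding again, so nothing is dropped or repeated.
</p>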
<h3>Finally Decoding the Audio</h3>
<p>
Let's get to the real meat of the decoder, <tt>audio_decode_frame</tt>:
<pre>
int audio_decode_frame(<a href="data.html#AVCodecContext">AVCodecContext</a> *aCodecCtx, uint8_t *audio_buf,
                       int buf_size) {
  static <a href="data.html#AVPacket">AVPacket</a> pkt;
  static uint8_t *audio_pkt_data = NULL;
  static int audio_pkt_size = 0;
  static <a href="data.html#AVFrame">AVFrame</a> frame;
  int len1, data_size = 0;
  for(;;) {
    while(audio_pkt_size > 0) {
      int got_frame = 0;
      len1 = <a href="functions.html#avcoded_decode_audio4">avcodec_decode_audio4</a>(aCodecCtx, &frame, &got_frame, &pkt);
      if(len1 < 0) {
        /* if error, skip frame */
        audio_pkt_size = 0;
        break;
      }
      audio_pkt_data += len1;
      audio_pkt_size -= len1;
      data_size = 0;
      if(got_frame) {
        data_size = av_samples_get_buffer_size(NULL,
                                               aCodecCtx->channels,
                                               frame.nb_samples,
                                               aCodecCtx->sample_fmt,
                                               1);
        assert(data_size <= buf_size);
        memcpy(audio_buf, frame.data[0], data_size);
      }
      if(data_size <= 0) {
        /* No data yet, get more frames */
        continue;
      }
      /* We have data, return it and come back for more later */
      return data_size;
    }
    if(pkt.data)
      av_free_packet(&pkt);
    if(quit) {
      return -1;
    }
    if(packet_queue_get(&audioq, &pkt, 1) < 0) {
      return -1;
    }
    audio_pkt_data = pkt.data;
    audio_pkt_size = pkt.size;
  }
}
</pre>
This whole process actually starts towards the end of the function, where we call <tt>packet_queue_get()</tt>. We pick the packet up off the queue, and save its information. Then, once we have a packet to work with, we call <tt><a href="functions.html#avcodec_decode_audio4">avcodec_decode_audio4</a>()</tt>, which acts a lot like its sister function, <tt><a href="functions.html#avcodec_decode_video">avcodec_decode_video</a>()</tt>, except in this case, a packet might have more than one frame, so you may have to call it several times to get all the data out of the packet.
Once we have the frame, we simply copy it to our audio buffer, making sure that <tt>data_size</tt> is no larger than our audio buffer.
Also, remember the cast to audio_buf, because SDL gives an 8 bit int buffer, and ffmpeg gives us data in a 16 bit int buffer. You should also notice the difference between <tt>len1</tt> and <tt>data_size</tt>. <tt>len1</tt> is how much of the packet we've used, and <tt>data_size</tt> is the amount of raw data returned.
</p>
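<p>
The difference between "bytes consumed" and "bytes produced" is easy to mix up, so here is a toy illustration. It uses a made-up format of our own (one length byte followed by that many payload bytes) in place of a real codec. <tt>toy_decode()</tt> and <tt>count_frames()</tt> are hypothetical and exist only to show why one packet can need several decode calls.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy format: each frame is one length byte followed by that
 * many payload bytes. Returns the bytes consumed from `data` (like len1),
 * or -1 on error, and stores the payload length in *out_size (like data_size). */
int toy_decode(const uint8_t *data, int size, uint8_t *out, int out_cap, int *out_size) {
  int n;
  *out_size = 0;
  if (size < 1)
    return -1;
  n = data[0];
  if (size < 1 + n || n > out_cap)
    return -1;
  memcpy(out, data + 1, (size_t)n);
  *out_size = n;
  return 1 + n;
}

/* Decode every frame in one packet; returns the number of frames found. */
int count_frames(const uint8_t *pkt_data, int pkt_size) {
  uint8_t out[255];
  int frames = 0, used, out_size;
  while (pkt_size > 0) {
    used = toy_decode(pkt_data, pkt_size, out, (int)sizeof(out), &out_size);
    if (used < 0)
      break;                 /* on error, skip the rest of the packet */
    pkt_data += used;        /* advance past what the decoder consumed */
    pkt_size -= used;
    frames++;
  }
  return frames;
}
```

<p>
For the 5-byte packet <tt>{2, 'a', 'b', 1, 'c'}</tt>, the first call consumes 3 bytes and produces 2, and the second consumes 2 and produces 1. The consumed count moves us through the packet, and the produced count tells us how much to copy out.
</p>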
<p>
When we've got some data, we immediately return to see if we still need to get more data from the queue, or if we are done. If we still had more of the packet to process, we save it for later. If we finish up a packet, we finally get to free that packet.
</p>
<p>
So that's it! We've got audio being carried from the main read loop to the queue, which is then read by the <tt>audio_callback</tt> function, which hands that data to SDL, which SDL beams to your sound card. Go ahead and compile:
<pre>
gcc -o tutorial03 tutorial03.c -lavutil -lavformat -lavcodec -lswscale -lz -lm \
`sdl-config --cflags --libs`
</pre>
Hooray! The video is still going as fast as possible, but the audio is playing in time. Why is this? That's because the audio information has a sample rate — we're pumping out audio information as fast as we can, but the audio simply plays from that stream at its leisure according to the sample rate.
</p>
<p>
We're almost ready to start syncing video and audio ourselves, but first we need to do a little program reorganization. The method of queueing up audio and playing it using a separate thread worked very well: it made the code more manageable and more modular. Before we start syncing the video to the audio, we need to make our code easier to deal with. Next time: Spawning Threads!
</p>
<p>
<em><b>>></b> <a href="tutorial04.html">Spawning Threads</a></em>
</p>
<hr>
<div class="links">
<a href="functions.html">Function Reference</a><br>
<a href="data.html">Data Reference</a>
</div>
<div class="footer">
<table>
<tr><th>email:</th> <td>dranger at gmail dot com</td></tr>
</table>
<a href="http://www.dranger.com/">Back to dranger.com</a>
</div>
<span class="fineprint">This work is licensed under the Creative Commons Attribution-Share Alike 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
<br />
<br />
Code examples are based off of FFplay, Copyright (c) 2003 Fabrice Bellard, and a tutorial by Martin Bohme.
</span>
</body>
<!-- Mirrored from dranger.com/ffmpeg/tutorial03.html by HTTrack Website Copier/3.x [XR&CO'2014], Fri, 05 Jun 2020 09:18:28 GMT -->
</html>