First, I will mention the major resources from which one can get all the basic information about the protocol:
Protocol specification - http://www.adobe.com/devnet/f4v.html
Description of the manifest file (f4m format)
And, of course, the essential resource, which explains very clearly, without jargon, how to prepare files in practice for broadcasting with this technology - http://www.thekuroko.com/http-dynamic-streaming-getting-started/
I see no point in repeating the information from the blog post above, so I will just summarize it and point out the nuances that I encountered and could not find documented anywhere online.
How does video streaming playback work? First, the player requests a manifest file (f4m), from which it gets general information about the video stream. Then, if the manifest says so, other files containing additional control information may be requested. After that, the player makes consecutive requests for fragments. A fragment is essentially the same kind of mp4 file (or, more precisely, a file in Adobe's format, whose structure is very close to mp4), but much smaller: it contains audio and video data, usually enough for only a few seconds of playback. That is, the player requests a fragment, plays it, requests the next fragment, plays it, and so on.
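The request loop above can be sketched as follows. This is only an illustration: the base URL, stream name, and numbering are hypothetical, and a real player derives the segment and fragment numbers from the bootstrap information referenced by the f4m manifest rather than counting them itself. The `Seg<n>-Frag<m>` URL scheme is the one commonly seen in HDS deployments.

```python
# Sketch of the player's fragment-request loop (assumed URL scheme:
# <base>/<stream>Seg<segment>-Frag<fragment>; all concrete names are made up).

def fragment_url(base_url: str, stream_name: str, segment: int, fragment: int) -> str:
    """Build the URL of one media fragment, e.g. .../myStreamSeg1-Frag42."""
    return f"{base_url}/{stream_name}Seg{segment}-Frag{fragment}"

# The player first fetches "<base>/manifest.f4m", parses it, then loops,
# requesting one fragment after another:
urls = [fragment_url("http://example.com/hds", "myStream", 1, n) for n in range(1, 4)]
for u in urls:
    print(u)
```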
The first difficulty I encountered was that Adobe's tool, f4fpackager, which is used to prepare files on the server, works precisely with whole files. We need fragments, because we have live streaming video: at the moment we send video data to the player, we still have no idea what data we will have a minute later. What a fragment is and what its structure looks like is simply not documented. Only the format specification briefly mentions the atoms (i.e., blocks of data in the file carrying strictly defined information) that may be present in fragments but are absent when preparing regular files. I did not find any detailed description or examples anywhere. In other words, the information I was able to find was absolutely insufficient for development, so I had to experiment.
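To experiment with fragments at all, one first has to walk the atoms (boxes) in a file. A minimal sketch of that, following the standard mp4/F4V box layout (4-byte big-endian size, 4-byte type, with the usual special cases for size 0 and 1):

```python
import struct

def parse_boxes(data: bytes):
    """Iterate over top-level atoms: 4-byte big-endian size + 4-byte type.
    Returns a list of (type, offset, size) tuples."""
    offset = 0
    boxes = []
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size == 1:  # size of 1 means a 64-bit extended size follows
            size = struct.unpack_from(">Q", data, offset + 8)[0]
        elif size == 0:  # size of 0 means the box extends to the end of data
            size = len(data) - offset
        boxes.append((box_type.decode("ascii"), offset, size))
        offset += size
    return boxes
```

Running this over a fragment file quickly shows which atoms Adobe actually emits, which is how the experimentation described above proceeds in practice.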
And here, while studying the structure of the atoms, the first surprise was waiting for me. While Microsoft's fragment-creation technology uses an atom structure very close to that of mp4, Adobe does everything differently. For example, in mp4 files the mdat atom contains only audio/video data, without any control information. In Adobe's format, most of the control information sits right inside the mdat atom, together with the audio/video data. And it is there for a reason: it encapsulates the RTMP protocol. So my assumption that I could study Adobe's adaptive streaming format without knowing the RTMP protocol turned out to be mistaken. I had to study RTMP; not in full, but I still had to. Fortunately, many experiments and analysis of the general information did their part, and I successfully worked out the structure of the mdat atom.
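Concretely, the mdat payload in these fragments is packed as FLV-style tags, the same tag format that RTMP messages carry. A sketch of walking them, assuming the standard 11-byte FLV tag header (type, 3-byte data size, 3-byte timestamp plus 1-byte timestamp extension, 3-byte stream id) and assuming each tag is followed by a 4-byte back-pointer as in regular FLV files:

```python
def parse_flv_tags(mdat_payload: bytes):
    """Walk the FLV tags packed inside an mdat atom of a fragment.
    Tag types: 8 = audio, 9 = video, 18 = script data.
    Returns a list of (tag_type, data_size, timestamp_ms) tuples."""
    offset = 0
    tags = []
    while offset + 11 <= len(mdat_payload):
        tag_type = mdat_payload[offset]
        data_size = int.from_bytes(mdat_payload[offset + 1:offset + 4], "big")
        ts_low = int.from_bytes(mdat_payload[offset + 4:offset + 7], "big")
        ts_ext = mdat_payload[offset + 7]
        timestamp = (ts_ext << 24) | ts_low  # extension byte is the high byte
        tags.append((tag_type, data_size, timestamp))
        # 11-byte header + payload + assumed 4-byte back-pointer
        offset += 11 + data_size + 4
    return tags
```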
Then I needed to understand how to form the fragments themselves, because, as I have already mentioned, f4fpackager works with whole files. Fortunately, through trial and error I coped with this task as well, and as a result the broadcast started to work.
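A very rough sketch of forming a live fragment, under the assumption (from the discussion above) that a fragment is chiefly an mdat atom wrapping the FLV tags for a few seconds of audio/video; real fragments produced by Adobe's tools also carry additional atoms (such as afra, for random access), which are omitted here:

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one atom: 4-byte big-endian total size, 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def make_fragment(flv_tags: bytes) -> bytes:
    """Sketch of a live fragment: a single mdat atom wrapping already-packed
    FLV tags. Extra atoms a real packager would add are omitted."""
    return make_box(b"mdat", flv_tags)
```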
The next scourge awaiting me here, one which, as I have noticed, many broadcasting systems run into, was audio and video synchronization. I have devoted a separate note to this issue - audio and video synchronization.
One last step remained.
I analyzed the data that was now being sent to the player. One could see with the naked eye that completely unimportant, sometimes simply unnecessary, management information took up a considerable share of the data. This information was transferred with every fragment, adding noticeably to the bitrate. I do not know why it was done this way; perhaps there were objective reasons, but I simply wanted to get rid of the unnecessary information. And indeed, after a series of experiments I reduced the management information by 70 percent, and this did not affect playback at all. In particular, this can be considered an advantage over Adobe's professional live-streaming systems.
Igor, October 2012.