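RTP was proposed by the IETF Audio/Video Transport working group in 1996. The specification at the time was RFC 1889, which was later revised and replaced by RFC 3550.
RTP is intended to provide a protocol for transmitting audio and video over a network.
RTP is usually used together with RTCP to deliver audio and video over a network. RTP carries the media data itself, while RTCP monitors how the RTP stream is being delivered across the network and helps synchronize audio and video.
Both RTP and RTCP are specified in RFC 3550.
For the full text, see
http://tools.ietf.org/pdf/rfc3550.pdf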
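If you only want a quick understanding of the structure of an RTP packet, the description of the header below should be enough.
It is copied directly from an English source rather than translated.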
The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application.[18]
The fields in the header are as follows:
- Version: (2 bits) Indicates the version of the protocol. Current version is 2.[19]
- P (Padding): (1 bit) Used to indicate if there are extra padding bytes at the end of the RTP packet. A padding might be used to fill up a block of certain size, for example as required by an encryption algorithm.[19]
- X (Extension): (1 bit) Indicates presence of an Extension header between standard header and payload data. This is application or profile specific.[19]
- CC (CSRC Count): (4 bits) Contains the number of CSRC identifiers (defined below) that follow the fixed header.[20]
- M (Marker): (1 bit) Used at the application level and defined by a profile. If it is set, it means that the current data has some special relevance for the application.[20]
- PT (Payload Type): (7 bits) Indicates the format of the payload and determines its interpretation by the application. This is specified by an RTP profile. For example, see RTP Profile for audio and video conferences with minimal control (RFC 3551).[21]
- Sequence Number: (16 bits) The sequence number is incremented by one for each RTP data packet sent and is to be used by the receiver to detect packet loss and to restore packet sequence. The RTP does not specify any action on packet loss; it is left to the application to take appropriate action. For example, video applications may play the last known frame in place of the missing frame.[22] According to RFC 3550, the initial value of the sequence number should be random to make known-plaintext attacks on encryption more difficult.[20] RTP provides no guarantee of delivery, but the presence of sequence numbers makes it possible to detect missing packets.[3]
- Timestamp: (32 bits) Used to enable the receiver to play back the received samples at appropriate intervals. When several media streams are present, the timestamps are independent in each stream, and may not be relied upon for media synchronization. The granularity of the timing is application specific. For example, an audio application that samples data once every 125 µs (8 kHz, a common sample rate in digital telephony) could use that value as its clock resolution. The clock granularity is one of the details that is specified in the RTP profile for an application.[22]
- SSRC: (32 bits) Synchronization source identifier uniquely identifies the source of a stream. The synchronization sources within the same RTP session will be unique.[20]
- CSRC: Contributing source IDs enumerate contributing sources to a stream which has been generated from multiple sources.[20]
- Extension header: (optional) The first 32-bit word contains a profile-specific identifier (16 bits) and a length specifier (16 bits) that indicates the length of the extension (EHL=extension header length) in 32-bit units, excluding the 32 bits of the extension header.[20]
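
To make the field layout above more concrete, here is a minimal Python sketch that splits a raw RTP packet into those fields. The names `RtpPacket`, `parse_rtp` and `seq_delta` are made up for this illustration and do not come from any library. The sketch assumes the whole packet is already in memory as `bytes` and does no profile-specific interpretation of the payload.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtpPacket:
    # Hypothetical container type, named only for this example.
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int]
    extension_profile: Optional[int]
    extension_data: bytes
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Split one RTP packet into the header fields listed above."""
    if len(data) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")

    # Fixed header in network byte order: two flag bytes,
    # 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    version = b0 >> 6               # V: top 2 bits
    padding = bool(b0 & 0x20)       # P: 1 bit
    extension = bool(b0 & 0x10)     # X: 1 bit
    csrc_count = b0 & 0x0F          # CC: low 4 bits
    marker = bool(b1 & 0x80)        # M: 1 bit
    payload_type = b1 & 0x7F        # PT: low 7 bits
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")

    # CSRC list: CC identifiers of 32 bits each.
    offset = 12
    end = offset + 4 * csrc_count
    if len(data) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", data[offset:end]))
    offset = end

    # Optional extension: 16-bit profile id, then a 16-bit length counted
    # in 32-bit words that excludes this first 32-bit word.
    ext_profile = None
    ext_data = b""
    if extension:
        if len(data) < offset + 4:
            raise ValueError("truncated extension header")
        ext_profile, ext_words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4
        end = offset + 4 * ext_words
        if len(data) < end:
            raise ValueError("truncated extension data")
        ext_data = data[offset:end]
        offset = end

    payload = data[offset:]
    if padding:
        # The last byte gives the number of padding bytes, itself included.
        if not payload or not 0 < payload[-1] <= len(payload):
            raise ValueError("invalid padding length")
        payload = payload[:-payload[-1]]

    return RtpPacket(version, padding, extension, marker, payload_type,
                     seq, ts, ssrc, csrcs, ext_profile, ext_data, payload)


def seq_delta(prev: int, cur: int) -> int:
    """Signed distance from prev to cur modulo 2**16 (1 means no gap)."""
    return ((cur - prev + 0x8000) & 0xFFFF) - 0x8000
```

A receiver could call `parse_rtp` on each UDP datagram and compare consecutive `sequence_number` values with `seq_delta`. A result greater than 1 suggests lost packets, and a negative result suggests reordering. The helper is needed because the 16-bit sequence number wraps around from 65535 to 0. What to do about a gap is left to the application, as the field description says.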
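All rights reserved; reposting is prohibited. If you wish to repost this article, please obtain the blogger's consent first and credit the source; otherwise it will be treated as infringement.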