SIP Codecs and Voice Media Introduction

This article introduces the basics of how voice is captured and transmitted in a VoIP environment.

How is an analog voice signal converted to a digital voice signal?

The steps are: Sampling -> Quantization -> Compression 


The headset or microphone captures the voice, and the program samples the signal and quantizes the samples. Before the data is sent to the other side, it is compressed by a compression algorithm called a codec. The encoded audio is encapsulated in RTP packets, and the RTP packets are sent over UDP to the other side. This flow of packets is called the media stream.
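The first two steps can be sketched in a few lines of Python. This is only an illustration: the 8 kHz sampling rate, the 16-bit sample size, and the sine tone standing in for a microphone signal are assumptions chosen for the example, not values taken from any particular device.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumed narrowband telephony rate
BITS_PER_SAMPLE = 16    # assumed linear PCM sample size

def sample_and_quantize(freq_hz, duration_s, amplitude=0.5):
    """Sample a sine tone and quantize each sample to a signed 16-bit integer."""
    max_level = 2 ** (BITS_PER_SAMPLE - 1) - 1           # 32767
    count = round(SAMPLE_RATE_HZ * duration_s)
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE_HZ                            # sampling instant
        analog = amplitude * math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(analog * max_level))         # quantization
    return samples

pcm = sample_and_quantize(440, 0.02)
print(len(pcm), "samples in one 20 ms frame")
```

The resulting list of integers is what a codec would then compress.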

What is Codec?

A codec (COder-DECoder) compresses and decompresses the voice signal.

  • It compresses the quantized signal to reduce network bandwidth consumption, so that the signal can be transmitted over the Internet reliably.
  • It decompresses the received digital signal so that it can be converted back to an analog signal.

Codecs are a very important part of VoIP technology: they optimize the media stream based on application requirements and network bandwidth. Some implementations rely on narrowband, compressed speech, while others support high-fidelity stereo codecs.

VoIP endpoints usually support several codecs and use SDP (Session Description Protocol) to negotiate which one to use when establishing a call.

Some popular codecs are introduced below.
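The idea behind the negotiation can be sketched as picking the codecs both sides have in common. The function name and the codec lists below are hypothetical, and a real SDP offer/answer exchange also carries payload type numbers, clock rates, and other attributes that this sketch leaves out.

```python
def negotiate_codecs(offered, supported):
    """Return the codecs common to both sides, in the offerer's preference order."""
    supported_names = {name.upper() for name in supported}
    return [name for name in offered if name.upper() in supported_names]

# Hypothetical codec lists; the names follow the usual SDP encoding names.
offer = ["G729", "PCMU", "PCMA", "iLBC"]
local = ["PCMA", "PCMU", "ILBC"]

common = negotiate_codecs(offer, local)
print(common)
```

If the returned list is empty, the two endpoints share no codec and the call cannot carry audio without transcoding.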


G.711 has two companding algorithms, A-law and μ-law (also written u-law). μ-law is used in North America and Japan; A-law is used in Europe and most of the rest of the world.

The bitrate is 64 kbit/s in each direction, so a two-way call consumes 128 kbit/s of codec payload, before packet header overhead.

It works best on local area networks, where plenty of bandwidth is available.
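To give a feel for what G.711 companding does, here is a minimal sketch of the continuous μ-law curve with μ = 255. It is not the bit-exact G.711 encoder, which uses a segmented approximation of this curve to produce 8-bit code words; the function names are made up for the example.

```python
import math

MU = 255  # mu-law parameter

def mu_law_compress(x):
    """Map a sample in [-1.0, 1.0] onto the mu-law curve (result also in [-1.0, 1.0])."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

# Quiet samples are boosted before being stored in 8 bits,
# so they keep more resolution than with linear quantization.
print(mu_law_compress(0.01))

# 8000 samples per second at 8 bits per sample gives the G.711 bitrate.
print(8000 * 8, "bit/s")
```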


G.729 is the original codec and uses a high-complexity algorithm.

G.729A is a medium-complexity version that is compatible with G.729 and provides slightly lower voice quality.

Because of its low bandwidth requirements, G.729 is widely used in VoIP environments.

The bitrate is 8 kbit/s in each direction, so a two-way call consumes 16 kbit/s.
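The codec bitrates quoted above count only the audio payload. On the network, each packet also carries IP, UDP, and RTP headers. The sketch below estimates the one-way IP bandwidth, assuming IPv4, no RTP header extensions, and 20 ms of audio per packet; the packet interval is an assumption, and link-layer overhead such as Ethernet framing is not counted.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no options or extensions

def ip_bandwidth_kbps(codec_kbps, packet_ms=20):
    """One-way IP bandwidth in kbit/s for a codec bitrate and packet interval."""
    payload_bytes = codec_kbps * 1000 * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    total_bits_per_second = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second
    return total_bits_per_second / 1000

print("G.711:", ip_bandwidth_kbps(64), "kbit/s")
print("G.729:", ip_bandwidth_kbps(8), "kbit/s")
```

Under these assumptions the headers add 16 kbit/s per direction regardless of codec, which is why the relative overhead is much larger for a low-bitrate codec.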


iLBC (Internet Low Bitrate Codec) is an open source narrowband audio codec.

If your network suffers from heavy packet loss and poor voice quality, iLBC is a good choice. It maintains good quality when frames are lost or delayed.

The bitrate is 15.2 kbit/s with 20 ms frames and 13.33 kbit/s with 30 ms frames.

On a Yeastar PBX, the supported codecs are shown in the following figure, where you can select the ones you need.
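The two iLBC bitrates follow from its fixed frame sizes. As a quick check, the helper below (a hypothetical name) converts a bitrate and frame duration into bytes per frame.

```python
def frame_bytes(bitrate_bps, frame_ms):
    """Size in bytes of one encoded frame, rounded to a whole byte."""
    return round(bitrate_bps * frame_ms / 1000 / 8)

print(frame_bytes(15200, 20), "bytes per 20 ms frame")
print(frame_bytes(13330, 30), "bytes per 30 ms frame")
```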


Voice Media

Media is the digitized voice data transmitted between two endpoints once the call is established.

The voice media protocols used in a VoIP environment are listed below:


  • RTP (Real-time Transport Protocol)

RTP can be sent over UDP or TCP, but UDP is the usual transport. UDP has no connection setup or retransmission, so it adds less delay than TCP, and for voice low delay matters more than guaranteed delivery. The codec and the receiver's jitter buffer handle packet loss and delay. RTP carries the digitized voice samples.
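As an illustration of what an RTP packet looks like, the sketch below packs the 12-byte fixed RTP header in front of one 20 ms G.711 payload. The function name and the field values (sequence number, timestamp, SSRC) are arbitrary examples; a real sender picks random starting values and increments them per packet.

```python
import struct

def build_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Pack the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC)."""
    byte0 = 2 << 6                                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + payload type
    return struct.pack("!BBHII", byte0, byte1,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)

# Payload type 0 is the static RTP payload type for G.711 mu-law (PCMU).
header = build_rtp_header(payload_type=0, sequence=1, timestamp=160, ssrc=0x12345678)
payload = bytes(160)              # placeholder for 20 ms of encoded audio
packet = header + payload
print(len(header), len(packet))
```

The packet would then be handed to a UDP socket for sending.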

  • RTCP (RTP Control Protocol)

RTCP is a companion protocol to RTP. It carries information about the RTP stream, such as latency and jitter, and is sent periodically during the session.
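The jitter value reported in RTCP is a running estimate defined in the RTP specification (RFC 3550): for each packet, the change in transit time is smoothed with a gain of 1/16. The sketch below follows that formula; the function names and the sample timestamps are made up for the example, and all times are assumed to be in RTP timestamp units.

```python
def update_jitter(jitter, transit_prev, transit_now):
    """One step of the interarrival jitter estimate: J += (|D| - J) / 16."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16

def interarrival_jitter(send_times, recv_times):
    """Run the estimator over matching lists of send and receive timestamps."""
    transits = [r - s for s, r in zip(send_times, recv_times)]
    jitter = 0.0
    for prev, now in zip(transits, transits[1:]):
        jitter = update_jitter(jitter, prev, now)
    return jitter

# Three packets sent 160 timestamp units apart; the second one arrives 16 units late.
print(interarrival_jitter([0, 160, 320], [100, 276, 420]))
```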

  • SRTP (Secure Real-time Transport Protocol)

SRTP uses AES as the default cipher and encrypts the RTP payload. Even if someone captures the data stream, they cannot play it back without the key. The key is carried in the SDP body of the SIP messages, so if you use SRTP you must also use TLS for SIP; otherwise the key is exposed in clear text.
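When the key travels in SDP, it commonly appears as an "a=crypto" attribute (the SDES scheme from RFC 4568). The sketch below parses such a line, assuming that scheme and the AES_CM_128_HMAC_SHA1_80 suite; the key here is dummy data generated for the example, and the function name is hypothetical.

```python
import base64

def parse_crypto_line(line):
    """Split an SDP 'a=crypto' attribute into tag, suite, and raw key material."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not a crypto attribute")
    tag, suite, key_params = line[len(prefix):].split(None, 2)
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method")
    # Drop optional lifetime/MKI fields after '|' and any trailing session parameters.
    key_b64 = key_info.split("|")[0].split()[0]
    return int(tag), suite, base64.b64decode(key_b64)

# Dummy 30-byte value: 16-byte master key followed by 14-byte master salt.
example_key = base64.b64encode(bytes(range(30))).decode()
line = f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{example_key}"

tag, suite, material = parse_crypto_line(line)
master_key, master_salt = material[:16], material[16:]
print(tag, suite, len(master_key), len(master_salt))
```

Anyone who can read this line can derive the session keys, which is why the SIP signaling itself needs TLS.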


SRTCP provides the same security features for RTCP.

Below are screen captures of the SRTP settings in the Yeastar PBX. You can enable SRTP for trunks and extensions.





