Standards And Protocols

Introduction

Two standards compete for IP telephony signaling. The older and more widely accepted standard is the ITU (International Telecommunication Union) recommendation H.323, which defines a multimedia communications system over packet-switched networks, including IP networks. The other standard, Session Initiation Protocol (SIP), comes from an IETF (Internet Engineering Task Force) working group. Another well-established protocol, MGCP (which also comes from the IETF), is described in detail along with the other two.

H.323

H.323 is a standard approved by the International Telecommunication Union (ITU) in 1996 to promote compatibility in videoconference transmissions over IP networks. H.323 was originally promoted as a way to provide consistency in audio, video and data packet transmissions in the event that a local area network (LAN) did not provide guaranteed quality of service (QoS). Although it was doubtful at first whether manufacturers would adopt H.323, it is now considered to be the standard for interoperability in audio, video and data transmissions as well as Internet phone and voice-over-IP. It addresses call control and management for both point-to-point and multipoint conferences, as well as gateway administration of media traffic, bandwidth and user participation.

H.323, which describes how multimedia communications occur between terminals, network equipment and services, is part of a larger group of ITU recommendations for multimedia interoperability called H.32x. The latest of these recommendations, H.248, provides a single standard for the control of gateway devices in multimedia packet transmissions, allowing calls to connect from a LAN to a Public Switched Telephone Network (PSTN) as well as to other standards-based terminals. This recommendation was announced in August 2000 by ITU-T Study Group 16 and the Megaco Working Group of the Internet Engineering Task Force (IETF).

The H.323 standard specifies four kinds of components which, when networked together, provide the point-to-point and point-to-multipoint multimedia communication services: terminals, gateways, gatekeepers, and multipoint control units (MCUs).

Call Control

The call control functions are the heart of the H.323 terminal. These functions include signaling for call setup, capability exchange, signaling of commands and indications, and messages to open and describe the content of logical channels. All audio, video, and control signals pass through a control layer that formats the data streams into messages for output to the network interface. The reverse process takes place for incoming streams. This layer also performs logical framing, sequence numbering, error detection, and error correction as appropriate to each media type. Overall system control is provided by three separate signaling functions: the H.245 Control Channel, the Q.931 Call Signalling Channel, and the RAS Channel.

H.225 Call Signaling

The H.225 standard defines a layer that formats the transmitted video, audio, data, and control streams for output to the network, and retrieves the corresponding streams from the network. For audio and video transmissions, H.225 uses the packet formats defined by the RTP and RTCP specifications. After a call is initiated, one or more RTP or RTCP connections are established:
RTP (Real-time Transport Protocol) provides end-to-end delivery services for real-time audio and video. RTP is typically carried over UDP. RTP, together with UDP, provides transport-protocol functionality: payload-type identification, sequence numbering, time stamping, and delivery monitoring.
RTCP (RTP Control Protocol) provides control services, mainly feedback on the quality of the data distribution.
The Call Signalling Channel uses Q.931 to establish the connection between two terminals.

Q.931

This protocol defines how each H.323 layer interacts with peer layers, so that participants can interoperate with agreed-upon formats. The Q.931 protocol resides within H.225. As part of H.323 call control, Q.931, which is borrowed from ISDN, is a signaling protocol for establishing connections and framing data. Q.931 provides a method for defining logical channels inside of a larger channel. Each Q.931 message starts with a protocol discriminator, a call reference value that identifies the call, and a message type. The H.225.0 layer then specifies how these Q.931 messages are received and processed.
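The common header described above can be sketched in a few lines. This is a minimal illustration, not a full Q.931 or H.225.0 codec: the function names are hypothetical, only a handful of message types are listed, and the information elements that follow the header in a real message are ignored.

```python
import struct

Q931_DISCRIMINATOR = 0x08  # protocol discriminator value for Q.931 call control

# A few Q.931 message type codes used by H.225.0 call signaling.
MESSAGE_TYPES = {
    0x01: "ALERTING",
    0x02: "CALL PROCEEDING",
    0x05: "SETUP",
    0x07: "CONNECT",
    0x5A: "RELEASE COMPLETE",
}


def build_header(call_reference: int, message_type: int) -> bytes:
    """Pack discriminator, a 2-byte call reference, and the message type."""
    # Assumes call_reference < 0x8000 so the flag bit stays clear.
    return struct.pack("!BBHB", Q931_DISCRIMINATOR, 2, call_reference, message_type)


def parse_header(data: bytes) -> dict:
    """Read the common header that starts every Q.931 message."""
    ref_length = data[1] & 0x0F                      # low 4 bits hold the length
    raw = int.from_bytes(data[2:2 + ref_length], "big")
    flag_bit = 1 << (8 * ref_length - 1) if ref_length else 0
    message_type = data[2 + ref_length]
    return {
        "discriminator": data[0],
        "flag": bool(raw & flag_bit),                # top bit of the call reference
        "call_reference": raw & ~flag_bit,
        "message_type": MESSAGE_TYPES.get(message_type, hex(message_type)),
    }
```

For example, `parse_header(build_header(0x1234, 0x05))` reports a SETUP message for call reference 0x1234.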
H.225 Registration, Admission, and Status (RAS)
The H.225.0 standard also includes registration, admission, and status (RAS) control. RAS is the protocol between endpoints (terminals and gateways) and gatekeepers which makes the connections between them available. The RAS is used to perform registration, admission control, bandwidth changes, status, and disengage procedures between endpoints and gatekeepers.
RAS is not used if a Gatekeeper is not present.
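The registration, admission, and disengage procedures can be pictured with a toy gatekeeper. The class below is a sketch under stated assumptions: the class and method names are hypothetical, the admission policy (a single shared bandwidth budget) is invented for illustration, and real RAS messages are ASN.1-encoded rather than Python calls. Only the reply names (RCF, ACF, ARJ, DCF) follow the RAS message vocabulary.

```python
class Gatekeeper:
    """Toy gatekeeper: tracks registered endpoints and a bandwidth budget."""

    def __init__(self, total_bandwidth_kbps):
        self.available = total_bandwidth_kbps
        self.registered = {}   # alias -> transport address
        self.calls = {}        # call id -> admitted bandwidth

    def register(self, alias, address):
        """Registration request (RRQ): record the endpoint, confirm with RCF."""
        self.registered[alias] = address
        return "RCF"

    def admit(self, call_id, caller, bandwidth_kbps):
        """Admission request (ARQ): confirm (ACF) or reject (ARJ)."""
        if caller not in self.registered or bandwidth_kbps > self.available:
            return "ARJ"
        self.available -= bandwidth_kbps
        self.calls[call_id] = bandwidth_kbps
        return "ACF"

    def disengage(self, call_id):
        """Disengage request (DRQ): release the bandwidth, confirm with DCF."""
        self.available += self.calls.pop(call_id, 0)
        return "DCF"
```

With a 128 Kbps budget, a registered endpoint is admitted for a 64 Kbps call, a second 100 Kbps call is rejected, and the same request succeeds once the first call has been disengaged.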

H.245 Control Signaling

This standard provides the call control mechanism that allows H.323-compatible terminals to connect to each other.
The H.245 Control Channel is a reliable channel that carries control messages governing operation of the H.323 endpoint. These control messages carry information related to capability exchange, the opening and closing of logical channels, and commands and indications.
There is only one H.245 Control Channel per call.
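The idea behind capability exchange is that each terminal learns what the other can receive before any media channel is opened. A minimal sketch of the resulting choice follows; it assumes capabilities are plain codec names and that the local preference order decides, whereas real H.245 capability sets are structured ASN.1 messages. The function names are hypothetical.

```python
def common_capabilities(local_preferences, remote_capabilities):
    """Return the codecs both sides support, in the local preference order."""
    remote = set(remote_capabilities)
    return [codec for codec in local_preferences if codec in remote]


def choose_codec(local_preferences, remote_capabilities):
    """Pick the most preferred shared codec, or None if there is none."""
    shared = common_capabilities(local_preferences, remote_capabilities)
    return shared[0] if shared else None
```

A terminal preferring G.729, then G.723, then G.711 and talking to a peer that only announces G.711 and G.723 would settle on G.723 under this policy.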

RTP / RTCP

RTP is the Internet-standard protocol for the transport of real-time data, including audio and video. It can be used for media-on-demand as well as interactive services such as Internet telephony. RTP consists of a data and a control part. The latter is called RTCP.
The Real-time Transport Protocol (RTP) and the RTP Control Protocol (RTCP) work in concert, with RTCP monitoring RTP. These protocols (with H.245) also work with IP Multicast. Over UDP they carry the timing information a receiver needs for playback, but they do not guarantee data integrity.
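To make the data part concrete, the sketch below packs and unpacks the fixed 12-byte RTP header (version, marker, payload type, sequence number, timestamp, and synchronization source). The function names are hypothetical, and header padding and extensions are deliberately not handled.

```python
import struct

RTP_VERSION = 2


def pack_rtp(payload_type, sequence, timestamp, ssrc, payload, marker=False):
    """Build an RTP packet with no padding, extension, or CSRC entries."""
    first = RTP_VERSION << 6
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first, second,
                         sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def unpack_rtp(packet):
    """Split a packet into its header fields and payload."""
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F           # each CSRC entry adds 4 header bytes
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12 + 4 * csrc_count:],
    }
```

Payload type 0 is the static assignment for G.711 mu-law audio, so `pack_rtp(0, 7, 160, 0xDEADBEEF, samples)` would describe the eighth packet of such a stream.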
RTP handles timing issues by time stamping and sequencing every UDP packet transmitted and including information on the synchronization of audio and video streams, expected data rate, expected packet rate, and distance in time to sender. The receiver, with appropriate buffering, can eliminate duplicate packets, reorder out-of-sequence packets, and synchronize sound, video, and data. Thus, when delays occur, the receiver can play back information that is consistently spaced in time and recover from jitter or other timing skews introduced by the network. Manufacturers use the RTCP sender and receiver QoS reports (listing statistics about lost packets, sequencing, and jitter) to detect network congestion and take corrective action, such as reducing media stream data rates.
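The receiver-side work described above can be sketched as two small functions: one that drops duplicates and restores sequence order, and one that keeps a running jitter estimate of the kind reported in RTCP receiver reports (each new sample moves the estimate by one sixteenth of the difference). Both are simplified illustrations with hypothetical names; sequence-number wraparound and playout scheduling are ignored.

```python
def reorder(packets):
    """Drop duplicates and return packets sorted by sequence number.

    `packets` is a list of (sequence, timestamp, payload) tuples.
    """
    seen = set()
    unique = []
    for packet in packets:
        if packet[0] not in seen:
            seen.add(packet[0])
            unique.append(packet)
    return sorted(unique, key=lambda p: p[0])


def interarrival_jitter(arrivals):
    """Running jitter estimate over (rtp_timestamp, arrival_time) pairs.

    Both values must use the same clock units.
    """
    jitter = 0.0
    for (prev_ts, prev_arr), (ts, arr) in zip(arrivals, arrivals[1:]):
        d = (arr - prev_arr) - (ts - prev_ts)    # change in transit time
        jitter += (abs(d) - jitter) / 16
    return jitter
```

A stream whose packets arrive exactly as spaced by the sender yields a jitter of zero; any deviation in spacing raises the estimate gradually rather than all at once.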

Protocol    Description
H.323       Specification of the system
H.225.0     Call control (RAS), call setup (Q.931-like protocol), and packetization and synchronization of media stream
H.235       Security protocol for authentication, integrity, privacy, etc.
H.245       Capability exchange communication and mode switching
H.450       Supplementary services including call holding, transfer, forwarding, etc.
H.246       Interoperability with circuit-switched services
H.332       For large size conferencing
H.26x       Video codecs including H.261 and H.263
G.7xx       Audio codecs including G.711, G.723, G.729, G.728, etc.
Table 1: ITU-T recommendations that are part of the H.323 specification.

For more details regarding the H.323 suite of protocols, please refer to the H.323 project.

SIP

The IETF has also specified a multimedia communications protocol suite. In the IETF architecture, the media flows are carried using RTP, just like in H.323. Therefore, the main difference between the H.323 and IETF specifications is how call signaling and control are achieved.
The primary protocol that handles call signaling and control in the IETF specification is SIP. SIP is an application layer control protocol that can establish, modify and terminate multimedia sessions or calls. There are two major architectural elements to SIP: the user agent (UA) and the network server. The UA resides at the SIP end stations and contains two components: a user agent client (UAC), which is responsible for issuing SIP requests, and a user agent server (UAS), which responds to such requests. There are three different network server types: a redirect server, a proxy server, and a registrar. A basic SIP call does not need servers, but some of the more powerful features depend upon them. To a first approximation, the SIP User Agent is equivalent to an H.323 terminal (or the packet-network side of a gateway), and the SIP network servers are equivalent to an H.323 gatekeeper.
The most generic SIP operation involves a SIP UAC issuing a request, a SIP proxy server acting as end-user location discovery agent, and a SIP UAS accepting the call. A successful SIP invitation consists of two requests: INVITE followed by ACK. The INVITE message contains a session description that informs the called party what type of media the caller can accept and where it wishes the media data to be sent. SIP addresses are referred to as SIP Uniform Resource Locators (SIP-URLs), which are of the form sip:user@host.domain. The SIP message format is based on the Hypertext Transfer Protocol (HTTP) message format, which uses a human-readable, text-based encoding.
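Because SIP messages are plain text, an INVITE can be assembled with ordinary string handling. The sketch below is illustrative only: the function name, addresses, tag, and branch value are made up, and the header set reflects common SIP practice rather than anything prescribed in this article.

```python
def build_invite(caller, callee, call_id, local_host, sdp_body):
    """Return a text INVITE request carrying a session description."""
    body = sdp_body.encode("utf-8")
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                      # request line
        f"Via: SIP/2.0/UDP {local_host}:5060;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1234",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",
        "",                                                  # blank line ends the headers
        sdp_body,
    ]
    return "\r\n".join(lines)
```

The callee's user agent server would answer such a request with a response, and the caller would complete the invitation with an ACK request built the same way.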
Redirect servers process an INVITE message by sending back the SIP-URL where the callee is reachable. Proxy servers perform application layer routing of the SIP requests and responses. A proxy server can either be stateful or stateless. A stateful proxy holds information about the call during the entire time the call is up, while a stateless proxy processes a message and then forgets everything about the call until the next message arrives. Furthermore, proxies can either be forking or non-forking. A forking proxy can, for example, ring several phones at once until somebody takes the call. Registrar servers are used to record the SIP address (called a SIP URL) and the associated IP address. The most common use of a registrar server is to register after start-up, so that when an INVITE request arrives for the SIP URL used in the REGISTER message, the proxy or redirect server forwards the request correctly. Note that usually a SIP network server implements a combination of different types of servers.
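The division of labor between the three server types can be modeled in a few lines. This is a toy sketch with hypothetical names: bindings never expire, the `answers` callback stands in for actually sending an INVITE, and the forking proxy tries contacts one after another for simplicity, whereas a real forking proxy may ring them in parallel.

```python
class Registrar:
    """Toy location service filled in by REGISTER requests."""

    def __init__(self):
        self.bindings = {}   # SIP URL -> list of contact addresses

    def register(self, sip_url, contact):
        contacts = self.bindings.setdefault(sip_url, [])
        if contact not in contacts:
            contacts.append(contact)

    def lookup(self, sip_url):
        return list(self.bindings.get(sip_url, []))


def redirect(registrar, sip_url):
    """Redirect server: tell the caller where to retry (302) or fail (404)."""
    contacts = registrar.lookup(sip_url)
    return (302, contacts) if contacts else (404, [])


def forking_proxy(registrar, sip_url, answers):
    """Forking proxy: try every contact, return the first one that answers."""
    for contact in registrar.lookup(sip_url):
        if answers(contact):
            return contact
    return None
```

The same registrar data serves both servers: the redirect server hands the contact list back to the caller, while the proxy uses it to forward the request itself.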
SIP is used to establish, modify, and terminate multimedia sessions. However, it only handles the communication between the caller and the callee, the endpoint addressing, and user location. A SIP request or response message also needs a description of the multimedia session, as well as a way to announce a session. The IETF Session Description Protocol (SDP) is used together with SIP to accomplish all the call signaling functions in IP telephony. Roughly speaking, SIP is the equivalent of RAS and the Q.931-like protocol in H.323. SDP is the equivalent of H.245.
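An SDP body is a list of `type=value` lines, so the information a callee needs (where to send media and in which formats) can be pulled out with a short parser. The sample session and the function name below are made up for illustration, and only the connection (`c=`) and media (`m=`) lines are interpreted.

```python
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=Call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 18",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:18 G729/8000",
]) + "\r\n"


def parse_sdp(text):
    """Extract the connection address and the media lines from an SDP body."""
    session = {"address": None, "media": []}
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue                       # skip anything that is not type=value
        kind, value = line[0], line[2:]
        if kind == "c":
            session["address"] = value.split()[-1]
        elif kind == "m":
            media, port, transport, *formats = value.split()
            session["media"].append({
                "type": media,
                "port": int(port),
                "transport": transport,
                "payload_types": [int(f) for f in formats],
            })
    return session
```

For the sample above, the parser reports audio to be sent over RTP to 192.0.2.10 port 49170, with payload types 0 (G.711 mu-law) and 18 (G.729) on offer.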

MGCP and its Variations

Gateways reside at the boundaries between and within the telephony, Internet, broadband, and wireless network infrastructures. Trunking media gateways operate at the boundary between the Internet and the PSTN, and perform the conditioning of circuit- and packet-based media streams. Other types of media gateways perform similar functions at the boundary between businesses, residences, access points, and the network. Gateways may be decomposed into the Media Gateway Controller (MGC), Signaling Gateway (SG) and Media Gateway (MG). The SG acts as the relay for signaling information between the Internet and PSTN. The "intelligent" Media Gateway Controller maintains the overall call state and controls the "dumb" Media Gateway by sending commands via the Media Gateway Control Protocol (MGCP), a fairly simple protocol consisting of a small set of messages and procedures. The MG executes the commands given by the MGC to control and condition the circuit and packet media streams. Currently, the intelligence controlling the Internet (in the form of name servers, gatekeepers, and media gateway controllers) and the PSTN (in the form of service control platforms and location registers) is distributed throughout the network. In the future, the access and core network intelligence will reside in softswitches and backend application servers. These softswitches will incorporate MGC functionality and control the network elements using a variety of protocols including MGCP.
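MGCP commands are short text messages: a command line naming the verb, a transaction identifier, the endpoint, and the protocol version, followed by parameter lines. The sketch below shows how an MGC might format a create-connection (CRCX) command and read the gateway's reply. The function names, endpoint name, and parameter values are hypothetical, and only the first line of the response is interpreted.

```python
def build_command(verb, transaction_id, endpoint, parameters):
    """Format an MGCP command: a command line followed by parameter lines."""
    lines = [f"{verb} {transaction_id} {endpoint} MGCP 1.0"]
    lines += [f"{name}: {value}" for name, value in parameters]
    return "\n".join(lines) + "\n"


def parse_response(text):
    """Split the first response line into (code, transaction id, comment)."""
    code, transaction_id, *comment = text.splitlines()[0].split(" ", 2)
    return (int(code), int(transaction_id), comment[0] if comment else "")


# Example: ask a gateway to create a receive-only connection on one endpoint.
CREATE = build_command("CRCX", 1204, "aaln/1@gw.example.net", [
    ("C", "A3C47F21456789F0"),    # call identifier chosen by the MGC
    ("L", "p:10, a:PCMU"),        # local options: packetization period, codec
    ("M", "recvonly"),            # connection mode
])
```

The transaction identifier in the reply (for example `200 1204 OK`) lets the MGC match the response to the command it sent.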
MGCP and its related protocols have undergone a swift market-driven evolution. Unfortunately, as a result of differing market needs, the IETF, CableLabs, and the ITU have fragmented the MGCP standard into four variations: MGCP, NCS/MGCP, MEGACO, and H.248. There are a number of differences between these variations, and there can even be different implementations within the individual variations. This has created confusion in the market and raised a number of interoperability and interworking issues for communications equipment manufacturers and their customers, the network operators and service providers. MGCP is driven by the IETF and supports voice connections, while NCS/MGCP is driven by the CableLabs PacketCable initiative and is designed to support multimedia connections and operate efficiently in the cable infrastructure's multipoint environment.
MEGACO and H.248, on the other hand, are essentially two names for the same thing. Though these started as separate initiatives, the IETF MEGACO Working Group and ITU Study Group 16 have agreed to align their activities and publish a single document for MEGACO/H.248, which promises a much richer set of capabilities, with a corresponding increase in protocol complexity. The transaction model has changed from endpoints and connections to contexts and terminations, which in turn will provide a greater deal of transaction flexibility.
MEGACO/H.248 uses text or binary messages and can support multiple actions and commands per transaction. Importantly, MEGACO/H.248 also supports improved failover procedures to provide high availability in case of equipment or network failure. In addition, MEGACO/H.248 provides easier-to-use and extensible packages to support the events, signals, and statistics necessary to control and manage the MGs. Unfortunately, these capabilities come at the expense of a command and response structure that is different from that of MGCP and NCS/MGCP.
Due to these differences, MGCP is not completely compatible with NCS/MGCP and neither of these is compatible with MEGACO/H.248. Although MGCP and NCS/MGCP have dominant market share at this point in time, new equipment and devices are being developed to support MEGACO/H.248. The incompatibilities among MGCP's different variations will require the development of gateways to translate between the installed bases of the different variations. In addition to the differences in the variations, there are differences in the individual implementations, which means each softswitch will need to support the specific command and response sets of the MGs under its control.
In the short term, interoperability problems can be solved if a single vendor supplies the MGC and MGs. This, however, is not a long-term solution. From a risk management and competitive perspective, network operators and service providers will find it essential to move towards multi-vendor solutions that will provide the flexibility to choose the best feature, performance, and cost solutions to meet their particular needs. However, true multi-vendor solutions will not be possible until more extensive and industry-wide interoperability testing is performed. The multiple standards and the limited number of interoperability tests performed to date highlight how far the industry still must go to ensure the long-term success of MGCP and the decomposed network architecture.

Audio CODECs

Voice channels occupy 64 Kbps using PCM (pulse code modulation) coding when carried over T1/E1 links. Applications such as VoIP are not always used over links with bandwidth of 64 Kbps or higher. Thus, to allow for smooth operation of VoIP applications over links with low bandwidth, compression techniques were developed that reduce the required bandwidth while preserving voice quality. Such techniques are implemented as CODECs. Different compression techniques can be compared using the following parameters: compressed rate, required CPU resources, resultant voice quality, and added delay. The following table compares popular CODECs according to these parameters:
Compression scheme   Compressed rate (Kbps)   Required CPU resources   Resultant voice quality   Added delay
G.711 PCM            64 (no compression)      Not required             Excellent                 N/A
G.723 MP-MLQ         6.4/5.3                  Moderate                 Good (6.4), Fair (5.3)    High
G.726 ADPCM          40/32/24                 Low                      Good (40), Fair (24)      Very low
G.728 LD-CELP        16                       Very high                Good                      Low
G.729 CS-ACELP       8                        High                     Good                      Low

There is no "right CODEC". The choice of what compression scheme to use depends on what parameters are more important for a specific installation. In practice, G.723 and G.729 are more popular than G.726 and G.728.
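When comparing compressed rates, it helps to remember that each voice packet also carries IP, UDP, and RTP headers. The calculation below is a rough sketch: it assumes IPv4 without options (20 bytes), UDP (8 bytes), and the fixed RTP header (12 bytes), a default of one packet every 20 ms, and no link-layer framing or header compression. The function name is hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no link-layer framing


def ip_bandwidth_kbps(codec_rate_kbps, packet_interval_ms=20):
    """Bandwidth of one voice direction including IP/UDP/RTP headers."""
    # Kbps times milliseconds gives bits; divide by 8 for bytes per packet.
    payload_bytes = codec_rate_kbps * packet_interval_ms / 8
    packets_per_second = 1000 / packet_interval_ms
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return total_bytes * 8 * packets_per_second / 1000


g711 = ip_bandwidth_kbps(64)   # 160 payload bytes per packet
g729 = ip_bandwidth_kbps(8)    # 20 payload bytes per packet
```

Under these assumptions G.711 needs 80 Kbps on the wire and G.729 needs 24 Kbps, so the headers weigh far more heavily on the low-rate CODEC than the compressed rate alone suggests.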

Silence Suppression

Silence suppression takes advantage of prolonged periods of silence in conversations to reduce the number of packets even more. In a normal interactive conversation, each speaker typically listens for about half the time, so it is not necessary to transmit packets carrying the speaker's silence. Many vendors take advantage of this to reduce the bandwidth and number of packets on a link. Instead of transmitting packets carrying silence, a single packet can be sent to the remote destination specifying the duration of the silence. On arrival, the decoder turns that packet back into the corresponding period of silence.
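The mechanism can be sketched as a pair of functions: the sender collapses each run of silent frames into one marker carrying the run length, and the receiver expands the marker again. This is a simplified illustration with hypothetical names; the peak-amplitude test and its threshold are assumptions, and real implementations use more careful voice activity detection.

```python
def suppress_silence(frames, threshold=500):
    """Replace each run of silent frames with one ("silence", count) marker.

    `frames` is a list of lists of signed PCM samples; a frame counts as
    silent when its peak amplitude is below `threshold`.
    """
    out = []
    for frame in frames:
        silent = max((abs(s) for s in frame), default=0) < threshold
        if silent and out and out[-1][0] == "silence":
            out[-1] = ("silence", out[-1][1] + 1)    # extend the current run
        elif silent:
            out.append(("silence", 1))
        else:
            out.append(("voice", frame))
    return out


def restore(packets, frame_length):
    """Expand silence markers back into frames of zero samples."""
    frames = []
    for kind, value in packets:
        if kind == "silence":
            frames.extend([[0] * frame_length for _ in range(value)])
        else:
            frames.append(value)
    return frames
```

Five input frames with three quiet ones in the middle become three packets on the link, and the receiver still plays back five frames' worth of time.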

