Intel® IPP: Implementing Wideband Codec VoIP Solutions

Author: Intel® Software Network
Published On: Wednesday, October 04, 2006 | Last Modified On: Wednesday, May 07, 2008
Introduction
By Karthik Krishnan

Discover how Intel® Integrated Performance Primitives (Intel® IPP) can provide the building blocks to develop a VoIP application with advanced features. Get the building blocks to build a complete softphone application.

Voice over Internet Protocol (VoIP) is revolutionizing the telecommunications industry by merging voice and data onto one IP network. Intel offers a range of products, services, and building blocks to enable VoIP solutions across various domains. Intel® Integrated Performance Primitives (Intel® IPP) is a software library of highly optimized functions, including multimedia and speech codecs. This article provides reference points on using Intel IPP for speech codecs, along with a complete implementation of a VoIP softphone. The sample application is built with Windows Sockets* for network communication, DirectSound* for audio capture and playback, and the wideband GSM AMR-WB (adaptive multi-rate wideband) codec implemented with Intel IPP.


Intel® Integrated Performance Primitives
Intel IPP is a highly optimized cross-platform library that includes functionality for multimedia and communication software. G.168, G.167, G.711, G.722, G.722.1, G.722.2, AMRWB, G.723.1, G.726, G.728, G.729, GSM-AMR, and GSM-FR are international standards promoted by the International Telecommunication Union (ITU)*, the European Telecommunications Standards Institute (ETSI)*, 3GPP*, and other organizations. Below is a list of speech coding samples, built with Intel IPP as the building blocks, that are bit-exact with their respective standards.

Speech Coding Samples Windows* Linux*
G.722.1
GSM-AMR WB / G.722.2
G.723.1
G.726  
G.728
G.729
GSM-AMR
GSM-FR  
Note that implementations of these standards, or the standard-enabled platforms, may require licenses from various entities, including Intel Corporation. This paper uses AMR-WB (adaptive multi-rate wideband, standardized by 3GPP and by the ITU as G.722.2) as the reference codec for a VoIP call.

Linkage Models
Intel IPP provides several mechanisms for linking application code with the library: static linkage, dynamic linkage, and automatic dispatching. For detailed information, please refer to Linkage Models (PDF 231 KB). The included softphone application (see the link in the Additional Resources section) uses dynamic linkage with automatic dispatching.

GSM AMR-WB
AMR-WB uses 16-bit samples at a 16 kHz sampling rate and supports several output bit rates (6.6 kbps, 8.85 kbps, etc.). The table below lists the bit rates available from Intel IPP and the corresponding encoded size per frame. Each frame covers 20 ms of audio, i.e. 320 samples or 640 bytes of PCM input.

Frame Type | GSM AMR-WB Bit Rate (kbps) | Output Bits per Frame
0 | 6.6 | 132
1 | 8.85 | 177
2 | 12.65 | 253
3 | 14.25 | 285
4 | 15.85 | 317
5 | 18.25 | 365
6 | 19.85 | 397
7 | 23.05 | 461
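The numbers in the table follow from the 20 ms frame length: encoded bits per frame = bit rate × 0.020 s, and one frame of 16 kHz, 16-bit mono PCM is 16000 × 0.020 × 2 = 640 bytes. The small C sketch below just reproduces that arithmetic; the function and constant names are illustrative and are not part of Intel IPP.

```c
#include <assert.h>

/* Frame parameters taken from the text: 20 ms frames of
   16 kHz, 16-bit mono linear PCM. */
enum {
    SAMPLE_RATE_HZ   = 16000,
    BYTES_PER_SAMPLE = 2,
    FRAME_MS         = 20
};

/* PCM bytes the encoder consumes per frame:
   16000 * 0.020 = 320 samples, 2 bytes each = 640 bytes. */
int pcm_bytes_per_frame(void)
{
    return SAMPLE_RATE_HZ / 1000 * FRAME_MS * BYTES_PER_SAMPLE;
}

/* Encoded bits per frame = bit rate (bit/s) * 0.020 s,
   e.g. 6600 bit/s -> 132 bits, 23050 bit/s -> 461 bits. */
int bits_per_frame(int bitrate_bps)
{
    assert(bitrate_bps > 0);
    return bitrate_bps * FRAME_MS / 1000;
}
```

Every bit rate in the table is a multiple of 50 bit/s, so the integer division is exact for all eight rows.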

Unified Speech Codec APIs
The speech codec samples use Intel IPP as building blocks and include complete, bit-exact implementations of all the supported codecs. The sample code in Intel IPP 5.0 also includes a unified approach that simplifies integrating any of the codecs. The following section provides pointers on using the Unified Speech Codec (USC) approach to integrate the encoding and decoding functionality of the AMR-WB codec.
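The core idea of USC is that every codec exposes the same table of function pointers (USC_Fxns, whose std member holds GetInfo, NumAlloc, MemAlloc, Init, Encode, Decode and so on), and the application selects a codec once, as the initialization code below does with USC_Codec_Fxn = &USC_AMRWB_Fxns. The toy sketch below illustrates only that dispatch pattern; ToyCodecFxns and both "codecs" are hypothetical and are not the USC definitions from usc.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a codec function table.
   The real USC_Fxns has more members and different signatures. */
typedef struct {
    const char *name;
    int (*encode)(const short *pcm, int nsamples, unsigned char *out);
} ToyCodecFxns;

/* Toy "codec" A: keeps the high byte of each 16-bit sample. */
static int encode_high_byte(const short *pcm, int nsamples, unsigned char *out)
{
    for (int i = 0; i < nsamples; i++)
        out[i] = (unsigned char)((unsigned short)pcm[i] >> 8);
    return nsamples;
}

/* Toy "codec" B: emits nothing, like a suppressed silence frame. */
static int encode_silence(const short *pcm, int nsamples, unsigned char *out)
{
    (void)pcm; (void)nsamples; (void)out;
    return 0;
}

/* One table per codec, analogous to USC_xxx_Fxns. */
const ToyCodecFxns TOY_HIGHBYTE_Fxns = {"highbyte", encode_high_byte};
const ToyCodecFxns TOY_SILENCE_Fxns  = {"silence",  encode_silence};

/* Application code is written once against the table; switching codecs
   only means passing a different table pointer. */
int encode_with(const ToyCodecFxns *codec, const short *pcm, int nsamples,
                unsigned char *out)
{
    assert(codec != NULL && codec->encode != NULL);
    return codec->encode(pcm, nsamples, out);
}
```

Because the application only ever calls through the table, supporting another USC codec amounts to declaring its USC_xxx_Fxns symbol and pointing USC_Codec_Fxn at it.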

USC Initialization APIs
#include <stdlib.h>     /* malloc, calloc, free */
#include <ipp.h>        /* ippStaticInitBest() */
#include "usc.h"        /* USC types from the Intel IPP speech codec samples */

#ifdef __cplusplus
extern "C" {
#endif
extern USC_Fxns USC_AMRWB_Fxns;
#ifdef __cplusplus
}
#endif
//Each codec exports a USC_Fxns table named USC_xxx_Fxns;
//selecting a codec means pointing at its table.
static USC_Fxns *USC_Codec_Fxn = &USC_AMRWB_Fxns; 
static int nbanksEnc = 0,nbanksDec = 0;
static USC_MemBank* pBanksEnc = NULL;
static USC_MemBank* pBanksDec = NULL;
static USC_Handle hUSCEncoder;
static USC_Handle hUSCDecoder;
static USC_CodecInfo pInfo;

//Releases the memory banks of both codec instances. Minimal version;
//the sample's own FreeCodecMemory() is not listed in this article.
void FreeCodecMemory(void)
{
	int i;
	for(i=0; pBanksEnc && i<nbanksEnc; i++) free(pBanksEnc[i].pMem);
	for(i=0; pBanksDec && i<nbanksDec; i++) free(pBanksDec[i].pMem);
	free(pBanksEnc);
	free(pBanksDec);
	pBanksEnc = pBanksDec = NULL;
	nbanksEnc = nbanksDec = 0;
}

//Allocates memory for and initializes the AMR-WB encoder and decoder
//handles. Returns 1 on success, -1 on failure.
int InitializeCodec(int bitrate)
{
	int i;
	FreeCodecMemory();

	ippStaticInitBest(); //choose the most optimized code for this CPU

	/* Get the codec info (default parameters, frame size, PCM type) */
	if (USC_NoError != USC_Codec_Fxn->std.GetInfo((USC_Handle)NULL, &pInfo))
		return -1;

	/*
	encoder instance creation
	*/
	pInfo.params.direction = 0;           /* Direction: encode */
	pInfo.params.modes.vad = 0;           /* Disable VAD (silence compression) */
	pInfo.params.law = 0;                 /* Linear PCM input */
	pInfo.params.modes.bitrate = bitrate;

	/* Learn how many memory blocks the encoder needs */
	if(USC_NoError != USC_Codec_Fxn->std.NumAlloc(&pInfo.params, &nbanksEnc))
		return -1;

	/* allocate the memory bank table (zeroed, so pMem starts as NULL) */
	pBanksEnc = (USC_MemBank*)calloc(nbanksEnc, sizeof(USC_MemBank));
	if(pBanksEnc == NULL)
		return -1;

	/* Query how large each block has to be */
	if(USC_NoError != USC_Codec_Fxn->std.MemAlloc(&pInfo.params, pBanksEnc))
		return -1;

	/* allocate memory for each block */
	for(i=0; i<nbanksEnc; i++)
	{
		pBanksEnc[i].pMem = (char*)malloc(pBanksEnc[i].nbytes);
		if(pBanksEnc[i].pMem == NULL)
			return -1;
	}

	/* Create encoder instance */
	if(USC_NoError != USC_Codec_Fxn->std.Init(&pInfo.params, pBanksEnc, &hUSCEncoder))
		return -1;

	/*
	decoder instance creation (same parameters, opposite direction)
	*/
	pInfo.params.direction = 1;           /* Direction: decode */

	/* Learn how many memory blocks the decoder needs */
	if(USC_NoError != USC_Codec_Fxn->std.NumAlloc(&pInfo.params, &nbanksDec))
		return -1;

	/* allocate the memory bank table (zeroed, so pMem starts as NULL) */
	pBanksDec = (USC_MemBank*)calloc(nbanksDec, sizeof(USC_MemBank));
	if(pBanksDec == NULL)
		return -1;

	/* Query how large each block has to be */
	if(USC_NoError != USC_Codec_Fxn->std.MemAlloc(&pInfo.params, pBanksDec))
		return -1;

	/* allocate memory for each block */
	for(i=0; i<nbanksDec; i++)
	{
		pBanksDec[i].pMem = (char*)malloc(pBanksDec[i].nbytes);
		if(pBanksDec[i].pMem == NULL)
			return -1;
	}

	/* Create decoder instance */
	if(USC_NoError != USC_Codec_Fxn->std.Init(&pInfo.params, pBanksDec, &hUSCDecoder))
		return -1;

	return 1;
}
USC Encode API
/* Assumes InitializeCodec() has completed. The sample softphone does not change the bit rate once a VoIP call has started; it is straightforward to modify it to support a variable bit rate per frame. */

int EncodeOneFrame(char *src,char *dst) //rate already decided
{
	USC_PCMStream in;
	USC_Bitstream out;


	in.pBuffer = src;
	out.pBuffer = dst;

	in.bitrate = pInfo.params.modes.bitrate;
	in.nbytes = pInfo.framesize;
	in.pcmType.bitPerSample = pInfo.pcmType.bitPerSample;
	in.pcmType.sample_frequency = pInfo.pcmType.sample_frequency;

	
	/* Encode a frame  */
	if(USC_NoError != USC_Codec_Fxn->std.Encode(hUSCEncoder, &in, &out))
	{
		DebugBreak(); //Windows-only debugger break (<windows.h>); should not happen
		return -1;
	}

	return out.frametype;
}      
USC Decode API
int DecodeOneFrame(char *src,char *dst,int frametype) 
{
	USC_Bitstream in;
	USC_PCMStream out;

	in.pBuffer = src;
	in.frametype = frametype; //frame type of the received frame (e.g. RX_SPEECH_GOOD)
	in.bitrate = pInfo.params.modes.bitrate;
	/* EvaluateEncodedByteSize() (a helper in the softphone sample, not
	   part of USC) returns the encoded frame size in bytes for the
	   current bit rate, rounded up to a 16-bit (short) boundary.
	*/
	in.nbytes = EvaluateEncodedByteSize(in.bitrate);
	
	out.pBuffer = dst;
	out.pcmType.bitPerSample = pInfo.pcmType.bitPerSample;
	out.pcmType.sample_frequency = pInfo.pcmType.sample_frequency;
	out.bitrate = pInfo.params.modes.bitrate;


	if(USC_NoError != USC_Codec_Fxn->std.Decode (
							hUSCDecoder, &in, &out))
		return -1;


	return out.nbytes;

}  
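DecodeOneFrame() calls EvaluateEncodedByteSize(), whose code is not listed in this article. A minimal sketch of what it has to compute, assuming the USC bit rate is expressed in bit/s (e.g. 6600) and that "rounded to the nearest short boundary" means rounding the frame's bit count up to a whole number of 16-bit words; this is a hypothetical reimplementation, not the sample's code.

```c
#include <assert.h>

/* Hypothetical sketch, not the softphone sample's implementation.
   bits  = bit rate (bit/s) * 20 ms frame
   bytes = bits rounded up to a multiple of 16, then divided by 8 */
int EvaluateEncodedByteSize(int bitrate_bps)
{
    assert(bitrate_bps > 0);
    int bits  = bitrate_bps * 20 / 1000; /* e.g. 6600 bit/s -> 132 bits */
    int words = (bits + 15) / 16;        /* whole 16-bit words, rounded up */
    return words * 2;                    /* e.g. 132 bits -> 18 bytes */
}
```

If the codec packs frames to the byte rather than to the short, (bits + 7) / 8 would give the unpadded size instead.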
      

Audio Capture/Playback
The encoder takes 16-bit linear PCM input, the uncompressed binary representation of an analog signal (e.g. voice) after digitization. The decoder takes the compressed data produced by the encoder and outputs raw PCM data. This section explains how to use DirectSound* to capture and play back raw PCM data at the desired sampling frequency (16 kHz for AMR-WB).

Microsoft DirectSound provides various APIs to capture and play audio content. The included softphone application uses the sample code from the Microsoft Platform SDK for audio capture and playback. This section explains the high-level implementation details.

Audio capture works by creating a circular buffer that holds the captured audio data in raw PCM format. The sampling rate and sample size (16 kHz, 16-bit) and the total size of the capture buffer are set and allocated during initialization. DirectSound can also signal event objects each time a certain amount of audio data has been captured into the buffer. Audio capture is typically handled in a separate thread, and the audio extraction thread periodically waits on these event objects to extract the captured audio data. The following shows the control flow of audio capture using DirectSound.
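To make the buffer sizing concrete, the sketch below computes a capture buffer holding a whole number of 20 ms frames (640 bytes each at 16 kHz, 16-bit mono) and one notification offset at the last byte of each frame, so an event fires whenever a full frame has been captured. In DirectSound, offsets like these would go into DSBPOSITIONNOTIFY entries registered with IDirectSoundNotify::SetNotificationPositions; the eight-frame buffer and the function names are assumptions, and no DirectSound calls are made here.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FRAME_BYTES      = 640, /* 20 ms of 16 kHz, 16-bit mono PCM */
    FRAMES_IN_BUFFER = 8    /* assumption: 160 ms of capture headroom */
};

/* Total size of the circular capture buffer in bytes. */
unsigned capture_buffer_bytes(void)
{
    return FRAME_BYTES * FRAMES_IN_BUFFER;
}

/* Fill offsets[] with the position of the last byte of each frame,
   so one event is signaled per captured frame. */
void notification_offsets(unsigned offsets[FRAMES_IN_BUFFER])
{
    assert(offsets != NULL);
    for (unsigned i = 0; i < FRAMES_IN_BUFFER; i++)
        offsets[i] = FRAME_BYTES * i + FRAME_BYTES - 1;
}
```

A larger buffer tolerates longer stalls in the extraction thread before unread data is overwritten, at the cost of memory.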

DirectSound* Audio Capture Thread Control Flow



Audio Data Extraction Thread Control Flow
The captured audio data needs to be extracted periodically (typically once per frame) and passed on to the encoder. The extraction can be run periodically (for example, every 20 ms) using the timeSetEvent() API. The following provides an overview of the extraction functionality.



Note that the raw PCM data is extracted in two phases: because the buffer is circular, the captured data may wrap around the end of the allocated memory. Because capture and extraction run in separate threads, some notification events may not be waited upon and may be "missed." In such cases, the corresponding PCM data is extracted on the next signal.
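The two-phase copy can be sketched as a plain ring-buffer read: the reader keeps its own offset, takes everything written since the last read (which also picks up data whose notification was missed), and splits the copy at the end of the buffer. In DirectSound, IDirectSoundCaptureBuffer::Lock returns the two regions as separate pointer/size pairs; here the buffer, cursors, and ring_extract() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bytes written between *read_pos and write_pos out of a circular
   buffer of buf_size bytes into dst. Returns the number of bytes copied and
   advances *read_pos. The region may wrap past the end of the buffer. */
size_t ring_extract(const unsigned char *buf, size_t buf_size,
                    size_t *read_pos, size_t write_pos,
                    unsigned char *dst)
{
    assert(buf && read_pos && dst);
    assert(*read_pos < buf_size && write_pos < buf_size);

    /* Bytes available, accounting for wrap-around. */
    size_t avail = (write_pos + buf_size - *read_pos) % buf_size;

    /* Phase 1: from the read cursor up to the end of the buffer. */
    size_t first = buf_size - *read_pos;
    if (first > avail)
        first = avail;
    memcpy(dst, buf + *read_pos, first);

    /* Phase 2: the wrapped remainder from the start of the buffer. */
    memcpy(dst + first, buf, avail - first);

    *read_pos = (*read_pos + avail) % buf_size;
    return avail;
}
```

The sketch treats equal cursors as an empty buffer, so the buffer must be large enough that the capture cursor never laps the reader between two extractions.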


Audio Playback Control Flow
Playback works much like audio capture. The playback buffer must be filled with raw PCM audio from the other participants. When encoded packets are received, they are passed to the decoder to recover the source PCM data. The playback buffer is then locked, and the PCM data is copied into the buffer and played. Network jitter can be absorbed by accumulating a threshold amount of PCM data before starting playback.
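The threshold idea can be sketched as a small frame queue that outputs silence until a minimum number of decoded frames has arrived, then releases one frame per 20 ms playback tick, and goes back to prebuffering if it runs dry. The queue depth, the three-frame threshold, and all names are assumptions, not taken from the sample.

```c
#include <assert.h>
#include <string.h>

enum {
    FRAME_BYTES  = 640, /* one 20 ms frame of 16 kHz, 16-bit mono PCM */
    QUEUE_FRAMES = 16,  /* assumed maximum queue depth */
    START_THRESH = 3    /* assumed: wait for 60 ms of audio before playing */
};

typedef struct {
    unsigned char frames[QUEUE_FRAMES][FRAME_BYTES];
    int head, count;
    int playing;        /* 0 while prebuffering, 1 once threshold reached */
} JitterQueue;

void jq_init(JitterQueue *q) { memset(q, 0, sizeof *q); }

/* Queue a decoded frame; drops it if the queue is full. Returns 1 if queued. */
int jq_push(JitterQueue *q, const unsigned char *pcm)
{
    if (q->count == QUEUE_FRAMES)
        return 0;
    int tail = (q->head + q->count) % QUEUE_FRAMES;
    memcpy(q->frames[tail], pcm, FRAME_BYTES);
    q->count++;
    if (q->count >= START_THRESH)
        q->playing = 1;
    return 1;
}

/* Called once per playback tick. Copies the next frame into out, or silence
   while prebuffering. Returns 1 if real audio was produced. */
int jq_pop(JitterQueue *q, unsigned char *out)
{
    assert(q && out);
    if (!q->playing || q->count == 0) {
        q->playing = 0;              /* underrun: rebuffer to the threshold */
        memset(out, 0, FRAME_BYTES); /* 16-bit PCM silence is all zeros */
        return 0;
    }
    memcpy(out, q->frames[q->head], FRAME_BYTES);
    q->head = (q->head + 1) % QUEUE_FRAMES;
    q->count--;
    return 1;
}
```

A higher threshold absorbs more jitter but adds latency: each frame waiting in the queue delays playback by another 20 ms.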





Network Layer
The sample application uses Windows Sockets over TCP/IP as the network transport to transfer voice packets between nodes. The softphone allows multi-user conferencing and supports the AMR-WB codec at various bit rates. The initiator (or host) acts as a server, listening for incoming calls on a specified port. The other VoIP participants connect to the host on that port. The host waits for incoming connections until a time-out expires, then broadcasts the IP addresses of all the connected nodes. After this, a fully connected (mesh) network linking every VoIP participant with every other participant is established.
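Because TCP delivers a byte stream rather than discrete packets, each encoded frame needs enough framing for the receiver to find frame boundaries and to pass the frame type on to DecodeOneFrame(). The article does not describe the sample's wire format; the sketch below assumes a hypothetical 3-byte header (1-byte frame type, 2-byte big-endian payload length) and shows only packing and unpacking, without any socket calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HDR_BYTES = 3 }; /* hypothetical header: type + 16-bit length */

/* Write header + payload into out (capacity out_cap). Returns the number
   of bytes written, or 0 if the frame does not fit. */
size_t frame_pack(unsigned char frametype, const unsigned char *payload,
                  size_t len, unsigned char *out, size_t out_cap)
{
    if (len > 0xFFFF || HDR_BYTES + len > out_cap)
        return 0;
    out[0] = frametype;
    out[1] = (unsigned char)(len >> 8);   /* length, high byte first */
    out[2] = (unsigned char)(len & 0xFF);
    memcpy(out + HDR_BYTES, payload, len);
    return HDR_BYTES + len;
}

/* Try to parse one frame from the start of buf (n bytes received so far).
   Returns the total bytes consumed, or 0 if the frame is not yet complete
   (keep reading from the socket and retry). */
size_t frame_unpack(const unsigned char *buf, size_t n,
                    unsigned char *frametype, const unsigned char **payload,
                    size_t *len)
{
    assert(buf && frametype && payload && len);
    if (n < HDR_BYTES)
        return 0;
    size_t plen = ((size_t)buf[1] << 8) | buf[2];
    if (n < HDR_BYTES + plen)
        return 0;
    *frametype = buf[0];
    *payload = buf + HDR_BYTES;
    *len = plen;
    return HDR_BYTES + plen;
}
```

On the receiving side, bytes returned by recv() would be appended to a buffer and frame_unpack() called repeatedly until it returns 0, discarding the consumed bytes each time.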


Putting it Together
The following flow chart provides the high-level implementation details of the complete application.



The code was developed using C++ on an Intel® Pentium® M processor-based system with Intel® Centrino® mobile technology running the Windows XP Professional* operating system. The UI of the softphone is shown below.


Fig 1. User Interface of the softphone application


Summary
This paper has shown how to use Intel® Integrated Performance Primitives to develop a VoIP application. The intent has been to provide building blocks for a complete softphone application with advanced features. VoIP features such as jitter buffering, frame compaction, and RTP transport are not covered in depth, but the framework provided by the sample code is a reasonable starting point for experimenting with them next.


About the Author
Karthik Krishnan is an applications engineer working for Intel's Software and Solutions Group. He joined Intel in 2001 and has been working with various software vendors to optimize their products on Intel® mobile and desktop platforms. Prior to joining Intel, he worked for Fluent Inc. as a software developer on parallel programming.


Additional Resources
Download Information
  • Free evaluation copies of Intel IPP can be downloaded here.
  • The speech codec samples are available for download here.
  • The demo source code is available for download here [ZIP 18MB].


