A dissertation submitted to the Department of Computer Science, Faculty of Science at the University of Cape Town in partial fulfilment of the ...
Some codecs also exploit the fact that video frames are interrelated in time. Interframe, or temporal, compression looks at a set of frames over time and removes repetitive information that is shared between consecutive frames. Often very little changes from frame to frame, so relatively static video sequences have the best temporal compression efficiency; interframe compression becomes less efficient the more motion is present in the video. The codec works on a group of frames: the first frame, or key frame, is stored in full, while for each subsequent frame only the differences from its predecessors are stored. In the MPEG standards, for example, an initial self-contained picture provides the starting point from which following frames are encoded by looking at pixel differences between successive frames. The MPEG family includes the original MPEG-1 standard, which was superseded by the MPEG-2 standard used on DVDs, as well as the MPEG-4 AVC (H.264) standard used on Blu-ray discs.
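The key frame plus difference idea can be illustrated with a minimal sketch (illustrative only, not how any real codec represents data): the first frame of a group is stored in full, and each subsequent frame stores only the pixels that changed since its predecessor.

```python
# Illustrative sketch of interframe (temporal) compression.
# A key frame is stored in full; each later frame stores only
# (index, new_value) pairs for pixels that differ from the previous frame.

def compress_group(frames):
    """frames: list of equal-length lists of pixel values."""
    key_frame = frames[0]
    encoded = [("key", list(key_frame))]
    previous = key_frame
    for frame in frames[1:]:
        diffs = [(i, p) for i, (p, q) in enumerate(zip(frame, previous)) if p != q]
        encoded.append(("delta", diffs))
        previous = frame
    return encoded

def decompress_group(encoded):
    frames = [list(encoded[0][1])]
    for _kind, diffs in encoded[1:]:
        frame = list(frames[-1])
        for i, p in diffs:
            frame[i] = p
        frames.append(frame)
    return frames

# A mostly static "video": one pixel changes per frame, so each delta
# frame is far smaller than the key frame.
frames = [[10, 10, 10, 10], [10, 10, 10, 11], [10, 10, 12, 11]]
encoded = compress_group(frames)
assert decompress_group(encoded) == frames
```

Note how, for static content, a delta frame shrinks to almost nothing, while a high-motion frame would approach the size of a key frame, which is exactly why temporal compression degrades as motion increases.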
Returning to bit rates for a moment: some codecs store the same amount of data for every frame, regardless of motion and detail. This is constant bit rate (CBR) compression. Other codecs allow the bit rate to vary from shot to shot. These are variable bit rate (VBR) codecs, which allocate higher bit rates to complex or active shots and reduce the bit rate for static, less complex shots. VBR compression, however, requires more processing power during the compression of the video.
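The CBR/VBR distinction can be reduced to a toy bit-allocation sketch (not a real rate controller): CBR gives every frame an equal share of the bit budget, while VBR shares the same total budget in proportion to each frame's complexity.

```python
# Toy illustration of CBR vs VBR bit allocation.
# "Complexity" stands in for motion/detail; a real encoder estimates it
# per frame or per shot.

def cbr_allocate(complexities, total_bits):
    # Every frame gets the same budget, regardless of content.
    per_frame = total_bits / len(complexities)
    return [per_frame] * len(complexities)

def vbr_allocate(complexities, total_bits):
    # Budget is shared in proportion to each frame's complexity.
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]

complexities = [1, 1, 8, 1, 1]   # one complex (high-motion) shot in the middle
cbr = cbr_allocate(complexities, 1200)
vbr = vbr_allocate(complexities, 1200)
assert sum(cbr) == sum(vbr) == 1200  # same total spend
assert vbr[2] > cbr[2]               # VBR concentrates bits on the complex shot
```

Both schemes spend the same total number of bits; VBR simply redistributes them toward the frames that need them, which is why it needs the extra analysis (and processing power) mentioned above.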
A wide variety of proprietary and standardised video compression algorithms have been developed over the years; the most important of these are published by recognised standardisation bodies, such as the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG).
The ITU H.263 video codec was developed by the ITU Telecommunication Standardization Sector (ITU-T) Video Coding Experts Group as a low bit rate compressed format for video conferencing. The standard was further improved by the H.263+ and H.263++ versions, approved in 1998 and 2000 respectively. H.263+ added optional features to improve compression efficiency and to allow quality, bit rate and complexity scalability.
MPEG-4 Part 2 is H.263 compatible and partially based on ITU-T H.263. It is similar to previous standards such as MPEG-1 and MPEG-2. DivX is an example of an implementation of this standard. Most references to MPEG-4 in fact refer to MPEG-4 Part 2 Simple Profile.
MPEG-4 Part 10, also known as MPEG-4 AVC (Advanced Video Coding) or H.264, is widely used in such applications as Blu-ray discs and direct broadcast satellite television services. It was designed as a standard to provide good video quality at substantially lower bit rates than previous standards, such as MPEG-2, H.263, or MPEG-4 Part 2.
2.4.6 Video container formats
A video file, for example an Audio Video Interleave (AVI) or MP4 file, is just a container. The container format only defines how information is stored inside it, not what kinds of data are stored. A video file usually contains multiple tracks: a video track (without audio), one or more audio tracks (without video), possibly multiple subtitle tracks, and so on. The tracks are usually interrelated, enabling the synchronisation of the different media tracks.
3GP is a multimedia container format used on third generation (3G) mobile phones. It stores video streams as MPEG-4 Part 2, H.263 or MPEG-4 Part 10 (AVC/H.264), and audio as Advanced Audio Coding (AAC) or Adaptive Multi-Rate (AMR). Most 3G capable phones support the recording and playback of video in 3GP format. The file extension is either .3gp for GSM-based phones or .3g2 for Code Division Multiple Access (CDMA) based phones.
The MPEG-4 Part 14 file format is a multimedia container format designed as part of the MPEG-4 standard. It is based on the QuickTime format specification and, in addition to audio and video streams, can store other data such as subtitles and still images. The file extension used is .mp4.
Table 2-3: Examples of supported encoding profiles and parameters on the Android platform.
The MediaRecorder class is used to record audio and video. The developer has control over the bit rate, video frame rate and video resolution. On devices with automatic frame rate control, the specified frame rate is treated as a maximum rather than a constant frame rate. The specified bit rate may also be clipped to ensure that video recording can proceed smoothly, based on the capabilities of the platform.
iOS
The iPhone operating system provides the least flexibility. A predefined collection of presets is made available, shown in Table 2-4.
The resolution and bit rate for the output depend on the capture session’s preset. The video encoding is typically H.264 and audio encoding AAC. The actual values vary by device, as illustrated in the following table.
In iOS 4.0 and later, you can record from a device’s camera and display the incoming data live on screen.
You use AVCaptureSession to manage data flow from inputs, represented by AVCaptureInput objects (which mediate input from an AVCaptureDevice), to outputs, represented by AVCaptureOutput objects.
2.4.8 Real-time mobile video communications challenges
To provide real-time Sign Language video communications using mobile phones one needs to overcome three main challenges, namely low bandwidth, low processing speed and limited battery life.
For a mobile video conversation to happen, video data has to be transmitted from one phone to the other over the cellular network. The quality of the video and the time to transmit the data are thus limited by the speed at which the phone can send and receive data. Mobile bandwidth capacity is improving, and in South Africa we are in the privileged situation that our networks are relatively young and based on modern technology, with continued aggressive expansion and upgrading of the network infrastructure. The networks generally support speeds of 7.2 Mbps and 14.4 Mbps, with speeds of 42 Mbps possible in select areas. But a high speed network does not guarantee bandwidth. The two phones used in this research, the Nokia N96 and the Vodafone 858 Smart, only support 3.6 Mbps communications, and that only while receiving data; when transmitting, they are limited to 384 kbps.
Meanwhile, depending on the network, location and signal strength, the user might be limited to GPRS (32 – 48 kbps) or EDGE (maximum 384 kbps) speeds.
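Some back-of-the-envelope arithmetic makes the uplink constraint concrete. Assuming video encoded at 200 kbps (the threshold discussed later in this chapter) and taking the nominal peak uplink rates above (the 40 kbps GPRS figure is a midpoint assumption; real throughput is lower still):

```python
# Time needed to transmit one second of video over various uplinks.
# Rates are nominal peaks from the text; 40 kbps is an assumed GPRS midpoint.

def seconds_to_send(video_kbits, uplink_kbps):
    return video_kbits / uplink_kbps

one_second_of_video = 200  # kilobits, assuming a 200 kbps encode

for name, uplink_kbps in [("3G uplink", 384), ("EDGE", 384), ("GPRS", 40)]:
    t = seconds_to_send(one_second_of_video, uplink_kbps)
    print(f"{name}: {t:.2f} s to send 1 s of video")
```

At GPRS speeds it takes roughly five seconds to transmit one second of such video, so real-time transmission is simply impossible; even at the 384 kbps uplink the margin is thin once protocol overhead is accounted for.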
The need for portability in a cell phone limits the battery capacity available to power its processor. The Nokia N96 is powered by a dual core 264 MHz processor with 128 MB of RAM, while the more recent but entry level Vodafone 858 Smart is powered by a 528 MHz processor. Neither is anywhere near as powerful as the processors used in current laptop computers.
The low processing power available on a cell phone limits its use as a video communications device in two ways. First, limited processing power caps the video resolution and the amount of compression that can be handled before processing starts introducing delays and affecting the intelligibility of the video. Secondly, recording, compressing and decompressing video requires intensive use of the processor.
In addition to the processor, the energy stored in the battery is further drained by the backlit screen as well as by the data connection to the cellular network. All of this adds up to a very negative scenario for battery life.
The Nokia N96, for example, has a stand-by time on 3G of 200 hours, a talk time on 3G of 160 minutes (2 hours and 40 minutes), and a listed offline video playback battery life of 5 hours. For sign language video communications we do not only want to play back video; we want to record, compress, transmit, receive and play back video, all in real time. In the end the battery is the most limiting constraint in mobile video communications.
2.4.9 Sign language specific video compression techniques
Some studies have attempted to overcome the above limitations. Sperling et al. provide a good overview of early attempts at compressing American Sign Language (ASL) images, and evaluate three basic image transformations for intelligibility: grey-scale subsample transformations, two-level intensity transformations converting the grey-scale images to black and white images, and, taking it further still, converting the images into black and white outline drawings.
The goal of the MobileASL project at the University of Washington is to enable Deaf people to use mobile phones to communicate in Sign Language in real time. Several H.264 compliant encoders were developed in an effort to lower the required resources while maintaining ASL intelligibility.
A variable frame rate was used to save processing cycles and battery life. By automatically detecting when the user is signing, the frame rate is adjusted on the fly: the highest possible frame rate while the user is signing, down to 1 frame per second while the user is not.
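The idea can be sketched in a few lines. This is a hypothetical illustration, not MobileASL's actual detector: here "activity" is approximated by the fraction of pixels that changed since the last frame, and the threshold is an assumed value.

```python
# Sketch of activity-driven variable frame rate: full rate while signing,
# 1 fps while idle. Detector and threshold are illustrative assumptions.

FULL_FPS, IDLE_FPS = 15, 1

def choose_frame_rate(motion_level, threshold=0.2):
    """motion_level: fraction of pixels changed since the previous frame."""
    return FULL_FPS if motion_level >= threshold else IDLE_FPS

assert choose_frame_rate(0.5) == FULL_FPS    # signing: lots of motion
assert choose_frame_rate(0.05) == IDLE_FPS   # idle: save cycles and battery
```

Dropping from 15 fps to 1 fps during pauses cuts the frames that must be captured, compressed and transmitted by over 90%, which is where the processing and battery savings come from.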
Earlier research found, through eye-movement tracking experiments, that Deaf people fixate mostly on the facial region of the signer to pick up the small movements and details in the signer's facial expression and lip shapes, while peripheral vision is used to process the larger body and hand movements. The research concluded that increasing compression quality in the important regions of the video image may improve bandwidth usage as well as the quality of the sign language video as perceived by the Deaf.
Using these findings, the MobileASL project approached the limited bandwidth problem with dynamic skin region-of-interest encoding: the visible skin areas of the video image were compressed at a higher quality at the expense of the remainder of the frame.
2.5 Synchronous and asynchronous communication
Synchronous communication is direct, live communication in which all participants are present at the same time and respond to each other in real time; examples are telephone conversations and instant messaging. Asynchronous communication, on the other hand, does not require all participants to be present at the same time, and there can be a delay between when a person receives a message and when a response is sent; examples are email messages, discussion boards and text messaging over cell phones.
2.5.1 Synchronous Video over Internet
There are various freely available synchronous video solutions accessible to anyone with a PC, webcam and network connection, such as Skype and CamFrog. Because of network constraints, video quality is limited. Skype video quality, for example, was found to be sufficient when used over a local area network (LAN), but not satisfactory over a 512 kbps Asymmetric Digital Subscriber Line (ADSL) link.
Third generation cell phone networks do support video calls, but these calls are limited in resolution and frame rate, and are primarily designed to support spoken communication, not to serve as a primary communications channel in their own right.
2.5.2 Asynchronous Video over Internet
As seen above, in low bandwidth environments synchronous video communication is only possible by reducing video quality or using specialised compression. An ITU application profile details the requirements for sign language video communications: a minimum common intermediate format (CIF) resolution (352 x 288 pixels) and a frame rate of at least 25 frames per second. In addition, the video needs to cover enough area to include the detailed movements of the signer. These requirements can be met by modern video codecs, but only at high bit rates. When the bit rate falls below 200 kilobits per second, picture quality has to be sacrificed, the video size reduced and the frame rate dropped. This forces the Deaf user of the system to compensate, for example by slowing down signing and exaggerating movements. Even the improved efficiency of the H.264 codec may not provide acceptable video quality for sign language communications at low bit rates.
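The scale of the compression problem follows directly from the numbers above. Assuming 12 bits per pixel (as in 4:2:0 YUV sampling, an assumption for this sketch), uncompressed CIF video at 25 fps can be compared against the 200 kbps threshold:

```python
# Rough arithmetic behind the bit rate problem: uncompressed CIF at 25 fps
# versus the ~200 kbps threshold. Assumes 12 bits/pixel (4:2:0 YUV).

width, height, fps, bits_per_pixel = 352, 288, 25, 12

raw_bps = width * height * fps * bits_per_pixel
print(f"Raw CIF @ 25 fps: {raw_bps / 1e6:.1f} Mbps")                  # 30.4 Mbps
print(f"Compression needed for 200 kbps: {raw_bps / 200_000:.0f}:1")  # 152:1
```

A codec must therefore sustain a compression ratio on the order of 150:1 before the 200 kbps floor is even reached, which is why quality, resolution or frame rate must give way below that point.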
As mentioned earlier, delays are better tolerated in an instant messaging environment, while the communication still appears synchronous when both users are actively involved. This provides the opportunity to use asynchronous video at higher quality, lessening the limitations on video quality at limited bit rates.
Ma and Tucker (2007) found that asynchronous video over internet protocol (IP) was a valid solution, offering improved video quality regardless of bandwidth constraints. Issues remained, however, such as reducing the inherent transmission delays and improving the user interface to provide feedback that alleviates these delays in the conversation.
Follow-up research focused on these issues, settling on the x264 video codec to provide low latency, high quality video for asynchronous video telephony. The research sought the optimal x264 settings to provide fast compression, a small resulting file size to minimise transmission time, and high quality playback without overly complex computation. The user interface was also simplified, and notification of events to the user improved.
2.6 Sign Language video quality requirements
This brings us to the question: what are the quality requirements for the capture and transmission of Sign Language?
2.6.1 ITU specifications
For the successful use of video for telecommunications via a visual language such as Sign Language, certain quality requirements must be met. Currently the minimum quality requirements for Sign Language and lip-reading video material are specified in the ITU-T Series H Supplement 1 (05/99) document, released by the ITU.
The ITU is the United Nations specialised agency in the field of telecommunications. The ITU-T, in turn, is a permanent organ of the ITU, responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardising telecommunications on a worldwide basis.