A dissertation submitted to the Department of Computer Science, Faculty of Science at the University of Cape Town in partial fulfilment of the ...
This chapter surveys the telecommunications options currently available to the Deaf community, both Deaf-to-hearing and Deaf-to-Deaf, including work still in the research phase that is not yet widely available.
2.1 Relay Services
For the Deaf to use the Public Switched Telephone Network (PSTN), a visual-to-audio translation phase needs to be added. One way is a Voice Relay Service (VRS), which adds a live operator to assist through translation. This can take the form of a text relay: the Deaf user types a message, which an operator reads aloud to the hearing caller on the Deaf caller's behalf; the operator then types out the hearing caller's response for the Deaf caller to read.
This basic idea of an operator translating between hearing and Deaf callers can be extended to sign language through the use of a video link, for example using a webcam connected to a personal computer (PC). However, these services require advanced infrastructure and qualified translators available 24 hours a day, an expensive proposition. As a result, such services are not universally available, and even where available they have been cancelled because of financial constraints.
A third option is captioned telephony, in which a computer-based gateway handles the translation, removing the need for a live translator to be available around the clock. A captioned telephony system uses text-to-speech technology to convert what the Deaf user typed into speech relayed to the hearing recipient, and speech recognition technology to convert the spoken response back into text for the Deaf user to read. An example of a captioned telephony system is the South African-developed Telgo323, although Telgo323 only worked in one direction, from text to speech, and not in the reverse direction.
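The flow of such a gateway can be sketched as follows. This is a minimal illustration only: the text_to_speech and speech_to_text functions below are hypothetical stand-ins for real TTS and ASR engines, not the interfaces of any cited system.

```python
# Sketch of a bidirectional captioned-telephony gateway. The TTS and ASR
# back-ends are hypothetical stand-ins; a system like Telgo323 implemented
# only the text-to-speech direction.

def text_to_speech(text: str) -> tuple:
    """Stand-in for a text-to-speech engine: produces an 'audio' payload."""
    return ("audio", text)

def speech_to_text(audio: tuple) -> str:
    """Stand-in for a speech-recognition engine: recovers text from audio."""
    kind, content = audio
    if kind != "audio":
        raise ValueError("expected an audio payload")
    return content

def relay_deaf_to_hearing(typed_message: str) -> tuple:
    # The Deaf caller types; the gateway synthesises speech for the hearing party.
    return text_to_speech(typed_message)

def relay_hearing_to_deaf(spoken_response: tuple) -> str:
    # The hearing caller speaks; the gateway transcribes text for the Deaf party.
    return speech_to_text(spoken_response)
```

A full round trip (type, synthesise, speak, transcribe) keeps both parties in their preferred medium without a human operator in the loop.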
There has also been research into automated sign language translation. This is a wide-ranging problem combining knowledge and technology from multiple fields, including computer vision, neural networks, sign recognition methods, 3D animation and natural language processing. Not only does such a system need to translate from a spoken or text-based language to a sign language with a very different grammatical structure, it must also generate the equivalent 3D avatar animation of the gestures. In addition, it needs to recognize gestures, including facial expressions, and translate them back into spoken language.
One such project looked at enabling communication between hearing and Deaf users by sending avatar-based animations, obtained through automatic interpretation of text into sign language, via the Multimedia Messaging Service (MMS). The usefulness of the system is limited, though, by the fact that it enables communication in only one direction, from text to sign language; bidirectional communication is not available.
One of the latest systems was developed by the Science and Technology Research Laboratories of NHK (the Japan Broadcasting Corporation). The work focused on adapting the Television Mark-Up Language (TVML) to produce Japanese Sign Language animation. TVML is a text-based computer language that enables the production of computer-graphics-animated video content simply by writing a script in TVML. In the TVML script the user can specify the words spoken, the movements and even the facial expressions.
The researchers extended the existing TVML facilities with the aim of generating sign language animation, developing high-quality computer graphics models and an improved TVML player that can render the manual movements of sign language. In addition, a Japanese-to-Japanese Sign Language dictionary was developed, and the latest work focuses on combining optically motion-captured data to generate sign language sentences. The system can now translate a set of texts automatically into a string of sign language words. The range of sentences that can be translated is still limited, however, and the generated animation still lacks fluency, as the automatic transitions between different signs are not as smooth as would be expected from a human signer.
2.2 Deaf-to-Hearing Text-Based Telecommunications
Relay services prevent the Deaf user from communicating directly with the hearing person in the conversation. There is always an intermediate translation step, be it via a live operator or an automated system.
Email has been established for a long time and is widely used, including being accepted for official and business communications. It enables the distribution of electronic documents, as well as audio and video through attachments. But email does not enable interactive, conversational communication.
Even the cheapest cell phone supports SMS, providing easy access to affordable, mobile, text-based communications. With the deep penetration of cell phones into the South African population, SMS gives the Deaf community easy access to a large section of the community without the need for any special intervention. Yet SMS is not an effective channel for conversational communication. It is possible to receive delivery receipts, showing that a message was delivered to the recipient's phone, but there is no way of knowing whether the recipient has read the message or is busy replying to it.
Instant messaging overcomes some of the shortfalls of SMS and email, enabling near-synchronous text-based communication. Tucker describes the unique advantages of instant messaging: with good connectivity and both users actively involved, it can appear synchronous, while at the same time allowing the communication to be temporarily or even extensively interrupted. Delays are better tolerated in an instant messaging environment, where true synchronous communication is not expected.
2.3 Deaf-to-Deaf Text-Based Telecommunications
From a technical perspective, purely text-based telecommunications options are the simplest to implement within the Deaf community.
In South Africa the Teldem device, designed especially for people with hearing difficulties, was available from Telkom. The latest Telkom tariff list (1 August 2011) lists the Teldem service as no longer available, with rental of the device offered only to existing customers. The device is a portable text telephone, with a QWERTY keyboard and alphanumeric display, which can be connected to any telephone and can communicate point-to-point with any other Teldem or TTY terminal. The major drawback of the Teldem is that it can only exchange text with another Teldem device.
In addition, a wide range of generic internet- and cell phone-based text communication solutions, used widely by the hearing community as shown earlier in this document, are also available to the Deaf community. These solutions, such as email, SMS and instant messaging services like Skype and MXit, are as usable for Deaf-to-Deaf communication as for Deaf-to-Hearing communication, and have the same advantages and disadvantages.
Communicating through text, however, forces the Deaf to use a second language, putting them at a disadvantage.
2.4 Digital Video
The digital video camera, like the one inside a cell phone, consists of a lens that focuses an image of the world onto a light-sensitive electronic chip. This is the same setup as in any digital stills camera.
To capture movement, a sequence of still images, or frames, is captured one after the other in rapid succession. If this sequence of still images is then displayed on a screen, one after the other, and the number of frames per second is not too low, the brain perceives smooth, realistic motion.
To store the captured sequence of still images, a wide variety of digital video formats have been developed. A video format specifies, among other things, how many pixels form an image, how many frames were recorded per second, how colour was recorded and how the video information was compressed. These formats, along with definitions of common video terms, are discussed below.
2.4.1 Video Resolution
A digital video consists of multiple images or frames. Each frame is formed by a rectangular grid of pixels, or picture elements, each representing the colour of that specific part of the image. Phase alternating line (PAL) standard definition digital versatile disc (DVD) video has images consisting of 576 horizontal lines of 720 pixels each, a resolution often written as 720 x 576.
High-definition video, for example, has a resolution of 1920 x 1080 (each image being 1920 pixels wide by 1080 pixels high).
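As a simple illustration (not taken from the source), the pixel count per frame follows directly from the resolution, and the two resolutions above can be compared:

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Total picture elements in one frame at the given resolution."""
    return width * height

sd = pixels_per_frame(720, 576)      # PAL standard definition: 414,720 pixels
hd = pixels_per_frame(1920, 1080)    # high definition: 2,073,600 pixels
print(hd / sd)                       # HD carries 5x the pixels of PAL SD
```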
The more pixels there are in the image, the higher the resolution, and the clearer and sharper the picture. As the resolution is reduced, fewer details are captured and the overall fuzziness of the image increases.
2.4.2 Video Frame Rate
The video frame rate is the number of still images, or frames, captured, stored and displayed per second. The number of frames recorded each second affects how motion appears on the screen.
At lower frame rates motion artefacts are introduced into the video. If an object moves across the screen quickly it will be blurred while it is in motion. The motion is not perfectly continuous, and can seem to jump or stutter from one position to the next, as the object moves a bigger distance between frames than at a higher frame rate.
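The size of that jump can be quantified; the speeds and frame rates below are illustrative values, not figures from the source:

```python
def jump_per_frame(speed_px_per_second: float, fps: float) -> float:
    """Distance a moving object travels between two consecutive frames."""
    return speed_px_per_second / fps

# An object crossing the screen at 1000 pixels per second:
print(jump_per_frame(1000, 25))  # 40.0 px between frames at 25 fps
print(jump_per_frame(1000, 10))  # 100.0 px between frames at 10 fps
```

The larger per-frame displacement at the lower frame rate is what the eye perceives as jumping or stuttering motion.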
2.4.3 Colour Depth
Every colour that the human eye sees is a mix of red, green and blue light in different proportions.
The sensor inside the digital camera also measures the relative amounts of red, green and blue light in the image. In single-chip colour cameras this is accomplished through tiny red, green and blue filters over individual pixels.
Colour depth refers to the number of bits used to represent the colour of a single pixel. The more bits used the broader the range of distinct colours that can be represented and stored, and the more accurately the image is represented.
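The relationship between bits and representable colours is a simple power of two; the depths chosen below are common examples, not values from the source:

```python
def distinct_colours(bits_per_pixel: int) -> int:
    """Number of distinct colours representable at a given colour depth."""
    return 2 ** bits_per_pixel

print(distinct_colours(8))   # 256 colours
print(distinct_colours(24))  # 16,777,216 colours ('true colour')
```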
2.4.4 Data Rate or Bit Rate
For a video file, the bit rate refers to the number of bits used per unit of playback time after data compression, if any. Standard resolution DVD video content, for example, has an average bit rate of 3.8 megabits per second (Mbps), that is, 3800 kilobits of data stored per second of video. Values range from heavy Motion Picture Experts Group (MPEG) MPEG-2 compression at 2 Mbps to high-quality compression at 6 Mbps.
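These figures translate directly into storage requirements. The two-hour running time below is an illustrative assumption, not a figure from the source:

```python
def video_size_bytes(bit_rate_bps: float, duration_seconds: float) -> float:
    """Storage needed for a stream at a constant average bit rate."""
    return bit_rate_bps * duration_seconds / 8  # 8 bits per byte

# A two-hour feature at the 3.8 Mbps DVD average:
size = video_size_bytes(3_800_000, 2 * 3600)
print(size / 1e9)  # about 3.42 gigabytes
```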
The actual data rate of digital video content depends in part on the size of the frame, the frame rate, and how much the video is compressed (if at all) before it is recorded. The higher the video resolution and frame rate, the more data is captured per second and the higher the bit rate of the corresponding video data stream. If the data rate is limited, video quality has to be sacrificed at higher resolutions and frame rates to keep to the specified data rate: either the compression ratio has to be increased, adding compression artefacts, or frames have to be dropped. Either way, the visual quality of the video drops in order to stay within the limitations.
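The raw (uncompressed) data rate shows why these trade-offs arise. As an illustration, PAL standard definition at 25 frames per second and 24-bit colour is assumed:

```python
def raw_bit_rate(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Data rate of uncompressed video: pixels per frame x frame rate x depth."""
    return width * height * fps * bits_per_pixel

raw = raw_bit_rate(720, 576, 25, 24)
print(raw)              # 248,832,000 bps, i.e. roughly 249 Mbps uncompressed
print(raw / 3_800_000)  # roughly a 65:1 reduction to reach the 3.8 Mbps DVD average
```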
2.4.5 Video Compression
The higher the resolution and frame rate, the more digital data has to be captured, stored and transmitted. This increases the cost of working with the video, because it requires large storage devices and high-speed connections for distribution. Digital compression exists to balance the need for high-quality video on the one side against the cost of high-bandwidth data on the other.
Digital compression aims to shrink the video data down to a smaller size while maintaining picture quality. The same video material then takes up less storage space, and can be transmitted over the same connection in less time, and thus at lower cost. To watch the video it has to be decompressed, and the objective is to have the decompressed video look as close to the original uncompressed material as possible. Compression schemes are called codecs (coder/decoder).
With lossless compression the decompressed video frames are identical to the original frames before compression. Lossy compression, on the other hand, throws information away during the compression process, and it is impossible ever to restore the original frames as they were before compression. By taking into account human perception, the requirement for exact reconstruction can be relaxed. A picture, or in this case one frame of the video, may contain more detail than the human eye can perceive, and by dispensing with this extraneous data the picture can be degraded without the user noticing, needing less storage in the process. Almost all codecs make use of lossy compression. Some codecs allow the amount of compression to be adjusted, but usually the heavier the compression, the worse the compressed video looks.
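A minimal illustration of why lossy compression is irreversible: coarsely quantising pixel values, a simplified stand-in for the quantisation step inside real codecs, maps several distinct inputs to the same output, so the original values can never be recovered:

```python
def quantise(values, levels=16):
    """Lossy step: snap 8-bit values (0-255) onto a coarser grid of levels."""
    step = 256 // levels
    return [(v // step) * step for v in values]

print(quantise([37, 38, 45]))  # [32, 32, 32]: three distinct inputs, one output
# Quantising again changes nothing, but the original values are gone for good.
print(quantise(quantise([37, 38, 45])) == quantise([37, 38, 45]))  # True
```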
All video codecs start by compressing each individual video frame. This is called intraframe or spatial compression. Each frame is compressed and decompressed on its own, independent of the frames before and after it, speeding up the compression/decompression process. Intraframe compression becomes less efficient the more complex and detailed the picture becomes. Motion JPEG (M-JPEG) is an example of an intraframe video codec, adapting the Joint Photographic Experts Group (JPEG) algorithm used for lossy compression of still images to motion video. Each frame of an M-JPEG compressed video sequence is a self-contained compressed picture, achieving compression ratios ranging from about 2:1 to about 20:1.
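The effect of those ratios on a single frame can be worked out directly; PAL standard definition with 24-bit (3-byte) colour is assumed as an example:

```python
def compressed_frame_bytes(width: int, height: int,
                           bytes_per_pixel: int, ratio: int) -> int:
    """Approximate size of one intraframe-compressed frame at a given ratio."""
    return (width * height * bytes_per_pixel) // ratio

raw = compressed_frame_bytes(720, 576, 3, 1)     # 1,244,160 bytes uncompressed
light = compressed_frame_bytes(720, 576, 3, 2)   # 622,080 bytes at 2:1
heavy = compressed_frame_bytes(720, 576, 3, 20)  # 62,208 bytes at 20:1
```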