Video quality requirements for South African Sign Language
communications over mobile phones
A dissertation submitted to the Department Of Computer Science, Faculty of Science at the
University Of Cape Town in partial fulfilment of the requirements for the degree of
Master of Science (in Information Technology).
Prof Edwin H. Blake
Department of Computer Science
University of Cape Town
Acknowledgements
This dissertation could not have been completed without the help and support of the following people, and I would like to take this opportunity to thank them.
Thank you to my supervisor Prof. Edwin Blake for his guidance, support and patience throughout this research.
My thanks to my mother, father, sister and Marchelle for their continued support and encouragement, especially when it felt as if this dissertation would never be finished.
I would also like to thank everyone at DCCT, especially Meryl Glaser, as well as all the participants who gave me their time and valuable feedback. Lastly, thank you to Michelle Lombard and Betty Mokoena for their assistance as Sign Language interpreters.
Abstract
The Deaf community in South Africa can currently make use of the mobile communications networks through text-based means only, using services such as SMS, MXit and email. This robs the Deaf of the opportunity to communicate in their first language, Sign Language. Sign Language, being a visual language, does not translate well to the text-based communications available to them. Enabling the Deaf community to share fully in the mobile communications infrastructure requires mobile video communication. With increasing network speeds, more affordable bandwidth and more capable and affordable mobile phones, this is becoming a reality. This project aims to find the minimum video resolution and frame rate that support intelligible cell phone based video communication in South African Sign Language.
Table of Contents
Acknowledgements
Table of Contents
List of Tables
List of Figures
1.2 Aims and Expected Outcomes
1.3 Dissertation Outline
2 Background and related work
2.1 Relay Services
2.2 Deaf-to-Hearing Text Based Telecommunications
2.3 Deaf-to-Deaf Text Based Telecommunications
2.4 Digital Video
2.4.1 Video Resolution
2.4.2 Video Frame Rate
2.4.3 Colour Depth
2.4.4 Data rate or Bit rate
2.4.5 Video Compression
2.4.6 Video container formats
2.4.7 Cell phone video capture support
2.4.8 Real-time mobile video communications challenges
2.4.9 Sign language specific video compression techniques
2.5 Synchronous and asynchronous communication
2.5.1 Synchronous Video over Internet
2.5.2 Asynchronous Video over Internet
2.6 Sign Language video quality requirements
2.6.1 ITU specifications
2.6.2 Subjective and Objective evaluation of video quality
3 Pilot user study (Experiment 1)
3.2.1 Video Resolution
3.2.2 Video Frame Rate
3.2.3 Video Compression
3.3.2 Experimental setup
3.3.3 Cell phones
3.3.4 Video clips
4 Follow-up Pilot User Study (Experiment 2)
4.2.2 Experimental Setup
4.2.3 Cell phones
4.2.4 Video clips
5 Intelligibility Study (Experiment 3)
5.2.2 Experimental Setup
5.2.3 Cell phones
5.2.4 Video clips
6.3 Future work
Appendix A Experiment 1
A.2 Experiment 1 Questionnaire captures
Appendix B Experiment 2
B.2 Experiment 2 Questionnaire captures
Appendix C Experiment 3
C.2 Experiment 3 Questionnaire captures
List of Tables
Table 2-1: Video settings supported by the QtMultimediaKit library on Nokia phones.
Table 2-2: Core video and codec support of the Android platform
Table 2-3: Examples of supported encoding profiles and parameters on the Android platform
Table 2-4: iOS capture session presets
Table 3-1: Experiment 1 – Video clip specification.
Table 3-2: Statistical analysis for the intelligibility measures of Experiment 1.
Table 4-1: Experiment 2 - Video clip specifications
Table 4-2: Statistical analysis for the intelligibility measures of Experiment 2.
Table 5-1: Experiment 3 - Video clip specifications
Table 5-2: Statistical analysis for the intelligibility measures of Experiment 3.
Table A-1: Experiment 1 – Captured questionnaire A
Table A-2: Experiment 1 – Captured questionnaire B
Table A-3: Experiment 1 - Captured questionnaire C
Table A-4: Experiment 1 - Captured questionnaire D
Table A-5: Experiment 1 - Captured questionnaire E
Table A-6: Experiment 1 - Video clip details
Table B-1: Experiment 2 - Captured questionnaire A
Table B-2: Experiment 2 - Captured questionnaire B
Table B-3: Experiment 2 - Captured questionnaire C
Table B-4: Experiment 2 - Captured questionnaire D
Table B-5: Experiment 2 - Captured questionnaire E
Table B-6: Experiment 2 - Captured questionnaire F
Table B-7: Experiment 2 – Video clip details
Table C-1: Experiment 3 - Captured questionnaires.
Table C-2: Experiment 3 – Video clip details
List of Figures
Figure 3-1: A Nokia N96 cell phone.
Figure 3-2: Example frame from sign language video clip.
Figure 3-3: Qualitative results for Question 2.
Figure 3-4: Qualitative results for Question 3.
Figure 3-5: Qualitative results for Question 4.
Figure 3-6: Qualitative results for Question 5.
Figure 3-7: Qualitative results for Question 6.
Figure 3-8: Overall mean participant response across all questions.
Figure 3-9: Letterboxed video frame, as used in Experiment 1.
Figure 3-10: Cropped video frame, as should have been used in Experiment 1.
Figure 4-1: Qualitative results for Question 2.
Figure 4-2: Qualitative results for Question 3.
Figure 4-3: Qualitative results for Question 4.1.
Figure 4-4: Qualitative results for Question 4.2.
Figure 4-5: Qualitative results for Question 4.3.
Figure 4-6: Qualitative results for Question 4.4.
Figure 4-7: Qualitative results for Question 4.5.
Figure 4-8: Overall mean participant response across all questions.
Figure 5-1: A Vodafone 858 Smart
Figure 5-2: The qualitative results for Question 14
Figure 5-3: The qualitative results for Question 18
Figure 5-4: The qualitative results for Question 3
Figure 5-5: The qualitative results for Question 8
Figure 5-6: The qualitative results for Question 10
Figure 5-7: The qualitative results for Question 12
Figure 5-8: The qualitative results for Question 5
Figure 5-9: The qualitative results for Question 9
Figure 5-10: The qualitative results for Question 4
Figure 5-11: The qualitative results for Question 7
Figure 5-12: The qualitative results for Question 16
Figure 5-13: The qualitative results for Question 17
Figure 5-14: The qualitative results for Question 6
Figure 5-15: The qualitative results for Question 15
Figure 5-16: The qualitative results for Question 19
Figure 5-17: The qualitative results for Question 11
Figure 5-18: The qualitative results for Question 13
Figure 5-19: The qualitative results for Question 2
Figure 5-20: Estimated marginal means across all questions.
1 Introduction
1.1 Background
According to the National Institute for the Deaf there are just over 400 000 profoundly deaf people and just over 1 200 000 extremely hard-of-hearing people in South Africa. Sign Language is the first language of people who were born deaf or became deaf before acquiring language, and as such is the language in which they communicate best. The Deaf see themselves as a cultural group with their own language. Bilingualism is encouraged, especially so that the Deaf can become part of the wider community. The second language, such as Afrikaans, English or Xhosa, is learned mainly as a reading and writing language, while basic speech is learned to complement signs when communicating with hearing persons.
Sign Language is a visual form of communication, conveying meaning through a combination of hand shapes, movement of the hands and arms, and facial expressions. The majority of the signs in Sign Language are formed in a “signing space”, which includes the signer’s head and chest and extends down to the hips. The grammar of Sign Language is markedly different from that of spoken languages, and hence the written text of many Deaf users is often not grammatically correct.
The visual nature of Sign Language is not well supported by modern mobile communication, which is based primarily on voice communication and increasingly on text-based (written language) communication through services such as Short Message Service (SMS) and email.
Mobile text-based communication is an option for the Deaf community: it is already implemented and supported by even the cheapest cell phones on the market. But for a Deaf person to communicate with another Deaf person through text is equivalent to two first-language Afrikaans speakers being forced to speak English to each other when using a cell phone. Why must a person be forced to communicate in a second language?
The third generation cell phone networks support video calls, but these calls are limited in resolution and frame rate, and are primarily designed to support spoken communications, and not video as a primary communications channel.
This research aims to assist in bringing mobile communications to the Deaf community by determining the minimum video quality, in terms of frame rate and resolution, at which South African Sign Language (SASL) video material played back on a cell phone remains intelligible in a conversational context.
Real users took part throughout this research. The experimental participants were all native signers who had used SASL as their principal mode of communication for most, if not all, of their lives and had English, Afrikaans or Xhosa as their language of literacy, regardless of what their hearing families used. The experimental work was completed with the assistance of the Deaf Community of Cape Town (DCCT), a grassroots non-governmental welfare organization (NGO) founded in 1987 and run by Deaf people to serve the needs of the disadvantaged Deaf community in Cape Town. It is based at the Bastion of the Deaf in Newlands, Cape Town.
Multiple studies were conducted with the help of the Deaf community to evaluate the intelligibility of sign language videos viewed on a cell phone. Various SASL video sequences were shown to long-time SASL users at different video resolutions and frame rates, with each clip being evaluated for intelligibility.
1.2 Aims and Expected Outcomes
Cell phones could play a very important role in giving the South African Deaf community access to the telecommunications infrastructure and in helping members of the community communicate in their first language, because they can provide affordable access to video-based communication. In reaching this objective of affordable first-language telecommunications for the Deaf community, affordability and practicality are of the essence. The lower the video quality that still supports an intelligible Sign Language conversation, the lower the cost of the bandwidth, the lower the required specification of the cell phone, and thus the lower the cost of the cell phone.
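The relationship between video quality and bandwidth can be illustrated with a rough calculation. The sketch below computes the uncompressed data rate of a video stream as width × height × bits per pixel × frames per second. The two settings shown (QVGA at 30 frames per second and QCIF at 15 frames per second) and the 24-bit colour depth are assumptions chosen purely for illustration; they are not the settings evaluated in this research, and real streams would be compressed well below these figures.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel=24):
    """Return the uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Hypothetical settings, chosen only to illustrate the scaling.
settings = {
    "QVGA (320x240) at 30 fps": (320, 240, 30),
    "QCIF (176x144) at 15 fps": (176, 144, 15),
}

for name, (width, height, fps) in settings.items():
    mbit_per_s = raw_bit_rate(width, height, fps) / 1e6
    print(f"{name}: {mbit_per_s:.2f} Mbit/s uncompressed")
```

Because the raw data rate scales linearly with both the pixel count and the frame rate, reducing either one lowers the bandwidth that has to be paid for before any compression is applied.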
The main question asked by this research was:
What is the lowest video resolution and frame rate that would provide intelligible SASL video on a cell phone?
With the secondary question:
How does one measure intelligibility of Sign Language video material?
The collected information could be used in the future development of video communications over mobile phones for the Deaf community using SASL. The ultimate goal is the development of a usable video communications application for low-end smart phones, bringing affordable telecommunications to the South African Deaf community.
1.3 Dissertation Outline
The text-based telecommunications options available to the Deaf community are described in Sections 2.2 and 2.3. This is followed in Section 2.4 by a basic introduction to digital video, including current cell phone support for digital video capture and cell phone specific compression techniques. The video-based telecommunications options available to the Deaf community are then examined in Sections 2.6 and 2.7. Section 2.8 finishes off with an overview of Sign Language video quality requirements and a review of related work.
Chapters 3 and 4 describe the subjective evaluations of SASL video clips at different resolutions and frame rates. The results from these pilot studies are incorporated into the development of the final experiment, described in Chapter 5.
The dissertation is concluded in Chapter 6 by reviewing these discussions and considering future work.
2 Background and related work
Telephones are by definition designed for audio communication, that is, spoken words, whereas the Deaf communicate visually through Sign Language. The telephone is therefore unsuitable for use by the Deaf community unless additions are made to the installed telephone infrastructure.