«A dissertation submitted to the Department Of Computer Science, Faculty of Science at the University Of Cape Town in partial fulfilment of the ...»
5.2.1 Participants Twenty-four adult members of the Deaf community (twelve women, twelve men), ranging in age from 20 to 64 (mean = 37), participated in this study. All the participants were native signers who had used SASL as their principal mode of communication for most of their lives, with SASL experience ranging from 10 to 60 years (mean = 32). Six of the participants were staff members of DCCT, with the remaining eighteen being visitors to The Bastion. Five of the participants had taken part in one or both of the pilot studies. Of the twenty-four, sixteen had English as their language of literacy, two Afrikaans and one Xhosa; three used both English and Afrikaans, and two used both English and Xhosa, as their reading and writing languages. A further five participants completed the experiment but were removed from the study because of too many unanswered or wrongly answered questions.
All participants were introduced to the experiment and each signed a consent form to confirm that they fully understood the project, agreed to participate and understood that all information provided would be kept confidential.
5.2.2 Experimental Setup The final intelligibility experiment was conducted across two separate days at The Bastion in Newlands, Cape Town. Because of the number of participants involved in the final experiment, they were handled in groups of between four and eight participants at a time, depending on availability.
Each participant was seated at a desk with a pen and a copy of the questionnaire.
All communication between the researcher and the participants was interpreted by a certified SASL interpreter. Although the questionnaires were explained in SASL and all queries were answered through the SASL interpreter, the questionnaires were provided in written English and answered in written English.
The participants were introduced to the experiment with the help of the SASL interpreter. It was made clear during the introduction that the focus of the experiment was on evaluating the quality of the video clips and the intelligibility of the SASL in the video clips at different quality settings, and not on evaluating the participants’ proficiency in SASL.
Because written/spoken language is not the participants’ first language, and the questionnaire required the participants to write down what they understood the Sign Language video clip to contain, all participants were asked whether they were comfortable writing out their answers. They were given the option of giving their responses to the questionnaire through the interpreter.
The questionnaire and how to answer the questions were explained to the group of participants, after which they were given the opportunity to read through the questionnaire at their own pace and to ask for clarification on any of the questions. Because each participant viewed only one video clip, and the practice video clip and questionnaire had provided no clear advantage in the second experiment, no practice questionnaire was used.
When all participants in the group were ready, the researcher moved from one participant to the next, showing one of the four video clips to each participant on the Vodafone 858 Smart cell phone. Each participant could watch their video clip once, after which they were given the go-ahead to complete the questionnaire on the clip they had been shown.
The same clip was never shown to two adjacent participants to make sure that no two participants could influence each other’s answers.
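The seating constraint described above can be illustrated with a simple round-robin assignment. This is only a minimal sketch and not necessarily the procedure the researcher followed; the function name `assign_clips` and the clip labels are hypothetical.

```python
def assign_clips(num_participants, clips):
    """Assign clips to a row of seated participants in round-robin order.

    With at least two distinct clips, neighbouring seats never share a clip.
    """
    if len(clips) < 2 or len(set(clips)) != len(clips):
        raise ValueError("need at least two distinct clips")
    return [clips[seat % len(clips)] for seat in range(num_participants)]


# Hypothetical labels for the four clips used in the experiment.
clips = ["clip_A", "clip_B", "clip_C", "clip_D"]

# Groups contained between four and eight participants; six is used here.
seating = assign_clips(6, clips)
```

Cycling through the clips in a fixed order also spreads the clips as evenly as the group size allows, although the text does not state that this was a goal.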
5.2.3 Cell phones The Nokia N96 cell phones that were used in the two initial experiments were no longer available by the time the third experiment was conducted. In the final experiment the Nokia N96 was replaced with the Vodafone 858 Smart cell phone, as shown in Figure 5-1. The Vodafone 858 Smart has a screen size of 2.8” (71 mm) diagonally and a resolution of 240 x 320 pixels, similar to the Nokia N96 in both physical screen size and resolution. However, where the Nokia is capable of displaying up to 16 million colours, the Vodafone 858 can display only 256K colours. Because of the similar physical screen sizes and resolutions of the two phones, the Vodafone 858 was deemed an equivalent replacement for the Nokia N96 in the third experiment. The Vodafone 858 Smart cell phone runs Android OS v2.2.1 (Froyo) on a 528 MHz ARM 11 processor with dynamic underclocking and an Adreno 200 GPU. It is equipped with 256 MB RAM, of which 180 MB is accessible to applications.
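The screen figures quoted above can be turned into a pixel density and an approximate colour depth. The following is a small worked calculation rather than a manufacturer specification; it assumes that "256K colours" denotes 2^18 distinct colours and that "16 million colours" denotes 2^24.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density from the pixel dimensions and the diagonal size in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


# Vodafone 858 Smart figures quoted in the text: 240 x 320 pixels, 2.8 inch diagonal.
ppi = pixels_per_inch(240, 320, 2.8)

# Assumption: "256K" means 256 * 1024 colours and "16 million" means 2**24 colours.
bits_vodafone = math.log2(256 * 1024)
bits_nokia = math.log2(2 ** 24)

print(f"{ppi:.1f} pixels per inch")
print(f"{bits_vodafone:.0f}-bit versus {bits_nokia:.0f}-bit colour")
```

Under these assumptions the two phones differ in colour depth but not in the number of pixels available to the video.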
5.2.4 Video clips Four video clips were used, each showing the same sign language user in the same environment, with consistent lighting, background and distance from camera, signing in SASL.
To simplify the experiment and limit the study to four groups it was decided to focus on only two frame rates, namely 20 and 10 frames per second.
Figure 5-1: A Vodafone 858 Smart. The cell phone model used in the final experiment.
Four clips were acquired from a DVD as MPEG-4 files at full resolution and frame rate, and at the best possible quality. Each of the clips was then recompressed to the required resolution and frame rate using the Export (Using QuickTime conversion) feature of Final Cut Express (v4.0.1).
As was done in the follow-up pilot study (Experiment 2), the source video clips were resized to the desired resolutions by cropping the frames. This ensured that no space on the cell phone screen was wasted on black bands or unused background area, making much better use of the available screen resolution and giving an accurate simulation of how the screen would be used if the phone were used for video communication.
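The geometry of cropping a frame to a target aspect ratio, as opposed to letterboxing it, can be sketched as follows. This is an illustration only: the clips were prepared in Final Cut Express, the crop may have been positioned around the signer rather than centred, and the source and target dimensions used here are hypothetical.

```python
def centred_crop_box(src_w, src_h, target_w, target_h):
    """Return (left, top, width, height) of the largest centred region of the
    source frame that has the same aspect ratio as the target frame."""
    # Compare aspect ratios by cross-multiplication to avoid rounding error.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller than (or equal to) the target: keep full width,
        # trim the top and bottom.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, crop_w, crop_h


# Hypothetical 720 x 576 source frame, cropped for two hypothetical targets.
landscape_box = centred_crop_box(720, 576, 320, 240)
portrait_box = centred_crop_box(720, 576, 240, 320)
```

The cropped region is then scaled to the target size, so every pixel of the target frame carries picture content rather than black bands.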
A data rate of 5000 kbits/sec was used to minimise the impact of the video compression on the quality of the resulting video clip.
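To give a feel for how generous this data rate is, the average bit budget per frame can be worked out for the two frame rates used in the study. This is a rough calculation that assumes 1 kbit = 1000 bits and that the encoder meets the target rate on average; the actual file sizes are those reported in Table C-2.

```python
DATA_RATE_KBITS_PER_SEC = 5000   # from the text
FRAME_RATES_FPS = (20, 10)       # from the text


def bits_per_frame(rate_kbits_per_sec, fps):
    """Average number of bits available to encode one frame."""
    return rate_kbits_per_sec * 1000 / fps


budget = {fps: bits_per_frame(DATA_RATE_KBITS_PER_SEC, fps) for fps in FRAME_RATES_FPS}

for fps, bits in budget.items():
    print(f"{fps} fps: {bits:.0f} bits (about {bits / 8 / 1000:.2f} kB) per frame")
```

Because the data rate is fixed, halving the frame rate doubles the average budget per frame, so the lower frame rate clips are, if anything, less affected by compression.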
One clip was created for each of the four possible combinations of resolution and frame rate.
The basic details of these four video clips are shown in Table 5-1. The full details of the video clips, including the data rate, file size and duration of each of the video clips are available in Table C-2, in Appendix C.
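The four clips form a 2 x 2 design, which can be enumerated as below. This is a minimal sketch; the labels "high" and "low" are placeholders for the two resolutions, whose pixel dimensions are given in Table 5-1 and are not repeated here.

```python
from itertools import product

RESOLUTIONS = ("high", "low")   # placeholder labels, see Table 5-1 for the sizes
FRAME_RATES_FPS = (20, 10)      # from the text

# One experimental condition (and therefore one clip) per combination.
conditions = [
    {"resolution": resolution, "fps": fps}
    for resolution, fps in product(RESOLUTIONS, FRAME_RATES_FPS)
]

for number, condition in enumerate(conditions, start=1):
    print(f"Clip {number}: {condition['resolution']} resolution, {condition['fps']} fps")
```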
5.2.5 Questionnaire Each set of questionnaires, as shown in Appendix C, contained a cover page that explained the purpose of the experiment and provided a summary of the experimental procedure, as well as a consent form to be signed by each participant to confirm that they understood the project, agreed to participate and understood that all information provided would be kept confidential.
On the back of this page was a short form to gather background information about each participant, including gender, age, preferred reading and writing language, and the number of years the participant had been using SASL.
The second page contained the questionnaire to be completed by the participant to evaluate the sign language video clip.
Question 1 What was said in this video?
As in both pilot user studies, this question served two purposes. The first was to encourage the participant to pay attention to, and concentrate on understanding, what was being said in the video. The second was to get an idea of how close the participant's understanding of the message was to the original phrase.
No numeric value was assigned to the answer.
Question 2 I am sure of my answer to Question 1?
Possible answers: strongly disagree, disagree, neither agree nor disagree, agree, strongly agree.
This question functions in conjunction with question 1 and provides an opportunity to check the participant’s answers. If the participant correctly wrote down the signed phrase in question 1, the answer to this question should show that the participant was sure of the answer.
Remaining questions The remainder of the questionnaire consisted of seventeen five-level Likert items all using the typical Likert scale, as was used in question 2.
Possible answers: strongly disagree, disagree, neither agree nor disagree, agree, strongly agree.
These questions were grouped into sets, with the statements in each set testing the same feature of the video, but phrased affirmatively in one statement and negatively in another. The order of the questions was randomised to limit the extent to which the answer to one question influenced the answer to the other questions in its set.
Sign Language uses two main parts of the body for communication: the face and the hands of the speaker. The first four groups of questions focus on these two areas and attempt to evaluate, for each area separately, the impact that lowering the frame rate and resolution has on comprehension.
14. I could clearly see the hands.
18. It was difficult to see the hands.
3. It was difficult to follow the hand gestures in this video.
8. I could clearly see all the hand gestures in this video.
10. I had difficulty seeing the details of the face.
12. I could clearly see the details of the face.
5. I had no problems seeing the facial expressions in this video.
9. It was difficult to follow the facial expressions in this video.
The questions on movement and video speed focus on blurring of the movement of the hands and arms, something that is expected to happen at lower frame rates. They evaluate the general feel of the video clip, separate from the specifics of the face and the hands.
4. The movement was blurry.
7. The movement was clear.
6. The video was the right speed.
15. The video was too slow.
19. The video was too fast.
The last two groups of questions focus purely on the intelligibility of the video clip, and not on its quality. Because of the different dialects in SASL, a sign used in the video clip might be known to one participant, while being completely senseless or out of context to another participant who uses a different dialect of SASL.
16. I knew all the signs used in this video.
17. Some signs used in this video were unknown to me.
11. I had difficulty to understand what was said in this video.
13. It was easy to understand what was said in this video.
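One common way to turn such paired Likert items into comparable numbers is to code the answers from 1 to 5 and reverse the negatively phrased items, so that a higher score always means a more favourable rating. The sketch below illustrates that idea only: the 1 to 5 coding and the reverse-coding step are assumptions, as the text does not state how the answers were converted to numbers, and the names `LIKERT`, `NEGATIVE_ITEMS` and `score` are hypothetical.

```python
# Assumed numeric coding of the five answer options.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Question numbers of the negatively phrased statements listed above.
NEGATIVE_ITEMS = {3, 4, 9, 10, 11, 15, 17, 18, 19}


def score(item, response, reverse_negative=True):
    """Numeric score for one answer; optionally reverse-code negative items."""
    value = LIKERT[response.strip().lower()]
    if reverse_negative and item in NEGATIVE_ITEMS:
        return 6 - value   # maps 1<->5 and 2<->4, leaves 3 unchanged
    return value


# "I could clearly see the hands." (14) and "It was difficult to see the hands." (18)
consistent_pair = (score(14, "agree"), score(18, "disagree"))
raw_negative = score(18, "disagree", reverse_negative=False)
```

With reverse-coding, a participant who agrees with question 14 and disagrees with question 18 receives the same score on both, which makes inconsistent answers within a set easy to spot.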
5.3 Observations The experiment ran smoothly, with only a few misunderstandings and recurring questions.
The Sign Language interpreter’s assistance was needed a few times while participants answered the first question of the questionnaire, to help with spelling or with finding the written word for a specific sign. This occurred more often than in the two preceding pilot studies because of the wider range of literacy among the participants, compared to the initial groups, which consisted entirely of DCCT staff members.
Two questions needed regular explanation. The first was the general information question:
Number of years using South African Sign Language. This question was most problematic for the participants who grew up using Sign Language, and could have been stated more clearly by asking since what year the participant had been using Sign Language. Question four of the questionnaire was the second recurring problem, as the term “blurry” often had to be explained.
The phrases used in the video clips had been checked to minimise the chance of using a phrase that might have more than one sign, depending on Sign Language dialect. Despite this, the sign for “short”, as used in “He is a short man”, was pointed out as unknown by a number of participants, although most of the participants knew the sign.
Table 5-2 contains the mean participant rating for each video clip, as well as the ANOVA significance value for each of the eighteen questions and for the average participant rating over all the questions. As can be seen in the table, all of the questions returned a significance level greater than 0.05 (p > 0.05); there is, therefore, no statistically significant difference in the mean participant rating between the video clips. No combination of frame rate and video resolution, either high or low, was preferred significantly more or less than any other combination of frame rate and resolution.
Figure 5-2 to Figure 5-19 show the average participant rating for each of the questions answered by the participants in the questionnaire, with Figure 5-20 showing the overall average participant rating across all questions.
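For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be computed as sketched below. The sketch assumes a one-way design with the four video clips as groups; the ratings are invented purely for illustration and are not the study's data, and the function name `one_way_anova_f` is hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variation of the individual ratings around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented ratings on an assumed 1-5 scale, one list per video clip.
ratings = [
    [4, 5, 4, 3, 4, 4],
    [5, 4, 5, 4, 5, 4],
    [3, 4, 4, 4, 5, 4],
    [4, 3, 4, 4, 3, 3],
]
f_stat, df_between, df_within = one_way_anova_f(ratings)
```

The significance value reported for each question is the probability of an F statistic at least this large under the F distribution with these degrees of freedom; a statistics package such as SciPy (`scipy.stats.f_oneway`) returns it directly.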
Figure 5-2: The qualitative results for Question 14 “I could clearly see the hands.” for each of the three frame rates and two resolutions. With a significance level of 0.896 (p = .896) there was no statistically significant difference in the average participant rating for each of the video clips.
Figure 5-3: The qualitative results for Question 18 “It was difficult to see the hands.” for each of the three frame rates and two resolutions. With a significance level of 0.644 (p = .644) there was no statistically significant difference in the average participant rating for each of the video clips.
Figure 5-4: The qualitative results for Question 3 “It was difficult to follow the hand gestures in this video.” for each of the three frame rates and two resolutions. With a significance level of 0.527 (p = .527) there was no statistically significant difference in the average participant rating for each of the video clips.
Figure 5-5: The qualitative results for Question 8 “I could clearly see all the hand gestures in this video.” for each of the three frame rates and two resolutions. With a significance level of 0.228 (p = .228) there was no statistically significant difference in the average participant rating for each of the video clips.
Figure 5-6: The qualitative results for Question 10 “I had difficulty seeing the details of the face.” for each of the three frame rates and two resolutions. With a significance level of 0.945 (p = .945) there was no statistically significant difference in the average participant rating for each of the video clips.
Figure 5-7: The qualitative results for Question 12 “I could clearly see the details of the face.” for each of the three frame rates and two resolutions. With a significance level of 0.743 (p = .743) there was no statistically significant difference in the average participant rating for each of the video clips.