ASL recognition


1.      When a user encounters an unknown sign, the user can either perform the sign in front of a webcam or submit an existing video of that sign. In Figure 1, a sign video has been imported into the system.



Figure 1



2.      The system then asks the user to mark the start and end frames of the actual sign in the video, to indicate whether the sign is one-handed or two-handed, and to specify which hand is dominant. The user marks the frames in the START/END FRAME part of Figure 2; in this example, the sign starts at frame 82 and ends at frame 97.



Figure 2
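As an illustration of step 2, the user's annotation could be stored in a small record and used to crop the video to the sign itself. This is only a sketch: the class name `QueryAnnotation`, its fields, and the 0-based frame numbering are assumptions, and the handedness values below are placeholders rather than values read from Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryAnnotation:
    """User-supplied annotation for one query video (hypothetical structure)."""
    start_frame: int    # first frame of the sign, inclusive (assumed 0-based)
    end_frame: int      # last frame of the sign, inclusive
    two_handed: bool    # True if the sign uses both hands
    dominant_hand: str  # "left" or "right"

    def __post_init__(self):
        if self.start_frame < 0 or self.end_frame < self.start_frame:
            raise ValueError("need 0 <= start_frame <= end_frame")
        if self.dominant_hand not in ("left", "right"):
            raise ValueError("dominant_hand must be 'left' or 'right'")

    def num_frames(self) -> int:
        """Number of frames covered by the sign, counting both endpoints."""
        return self.end_frame - self.start_frame + 1

    def crop(self, frames):
        """Return only the frames that belong to the sign."""
        return frames[self.start_frame:self.end_frame + 1]


# Frame range from the example in the text: the sign spans frames 82 to 97.
# The handedness values are placeholders chosen for illustration.
annotation = QueryAnnotation(start_frame=82, end_frame=97,
                             two_handed=False, dominant_hand="right")
video = list(range(120))  # stand-in for 120 decoded frames
sign_frames = annotation.crop(video)
```

With both endpoints included, the range 82 to 97 covers 16 frames, and only these frames would be passed on to hand detection.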


3.      Next, the system detects the bounding boxes of the hands in every frame, using features based on skin color and motion. To initialize the detector, the system asks the user to annotate the hand locations in the first frame; it then detects the hands in the remaining frames. The left part of Figure 3 shows this initialization. Because hand detection is still an open problem in the computer vision community, the detector cannot be guaranteed to give correct results on every frame. The user therefore reviews the detection results and can correct them on any frame. As soon as the user makes a correction, the system propagates information from that correction to improve the detection results in the rest of the frames. The right part of Figure 3 shows this correction process.



Figure 3
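To make the idea of combining skin color and motion in step 3 concrete, the sketch below marks pixels that are both skin-colored and changing between two consecutive frames, then takes their bounding box. It is not the system's detector: the RGB skin rule is a commonly used heuristic, and the function names, the threshold, and the synthetic frames are all assumptions made for illustration. It also omits the first-frame initialization and the propagation of user corrections.

```python
import numpy as np


def skin_mask(rgb):
    """Rough RGB skin-color rule (a common heuristic, not the system's model)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    spread = rgb.max(axis=-1).astype(int) - rgb.min(axis=-1).astype(int)
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))


def motion_mask(prev_rgb, cur_rgb, threshold=20):
    """Pixels whose mean intensity changed by more than `threshold`."""
    prev_gray = prev_rgb.astype(float).mean(axis=-1)
    cur_gray = cur_rgb.astype(float).mean(axis=-1)
    return np.abs(cur_gray - prev_gray) > threshold


def hand_box(prev_rgb, cur_rgb):
    """Bounding box (x_min, y_min, x_max, y_max) of moving skin pixels, or None."""
    mask = skin_mask(cur_rgb) & motion_mask(prev_rgb, cur_rgb)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Synthetic example: a skin-colored patch moves 10 pixels to the right.
skin = np.array([200, 140, 120], dtype=np.uint8)
prev_frame = np.zeros((60, 80, 3), dtype=np.uint8)
cur_frame = np.zeros((60, 80, 3), dtype=np.uint8)
prev_frame[20:30, 10:20] = skin
cur_frame[20:30, 20:30] = skin
box = hand_box(prev_frame, cur_frame)
```

A detector this simple returns nothing when the hand is still and is easily confused by the face or other skin-colored regions, which is one reason a user correction step is useful.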


4.      After the user approves the hand detection results, the system computes the similarity between the query sign and every sign in the database, and ranks the 1113 distinct database signs in decreasing order of similarity to the query. The system then presents the user with an ordered list of the best-matching signs. The user views the results, starting from the highest-ranked sign, until reaching the video that shows the sign of interest. In Figure 4, the top-ranked sign is horde-crowd, while the correct sign, adopt, is ranked 21st.



Figure 4
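The ranking in step 4 can be sketched as follows. The text does not say which similarity measure the system uses, so this example assumes dynamic time warping (DTW) over hand-center trajectories, where a smaller distance means a higher similarity. The function names, the gloss names, and the toy trajectories are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def rank_signs(query, database):
    """Return glosses ordered from most to least similar (smallest distance first)."""
    return sorted(database, key=lambda gloss: dtw_distance(query, database[gloss]))


def rank_of(gloss, ranking):
    """1-based position of `gloss` in the ranked list."""
    return ranking.index(gloss) + 1


# Toy database: glosses and trajectories are invented for illustration.
database = {
    "sign-a": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "sign-b": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "sign-c": [(5, 5), (5, 5), (5, 5), (5, 5)],
}
query = [(0, 0), (1, 1), (1, 1), (2, 2), (3, 3)]
ranking = rank_signs(query, database)
```

Here `rank_of` gives the position at which the user would reach a given sign when browsing the list from the top, which is the quantity reported for the example in Figure 4.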


5.      When the user identifies the correct database sign, the user can readily view any additional information associated with that sign. Currently, our signs are labeled with very rough English glosses.