The training set used for learning the digit models consisted of 30 video sequences, 3 sequences from each of 10 users. In each sequence, the user signed each of the 10 digits once, for a total of 30 training examples per digit class. Users wore color gloves in the training sequences. The 30 training sequences were used in the offline learning stage to learn models, pruning classifiers, and subgesture relations.
We used two test sets for evaluating performance on digit recognition: an "easy" set and a "hard" set. In each test sequence, the user signed each of the 10 digits once, and wore short sleeves (and, naturally, no color gloves). For both test sets, experiments were performed in a user-independent fashion: the system recognizes test gestures of a particular user using digit models learned from training examples collected from the other users.
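The user-independent protocol described above can be sketched as a leave-one-user-out split: when testing on one user, models are trained only on the remaining users' sequences. The following is a minimal illustrative sketch; the data layout and names (`user0`, `seq0`, etc.) are hypothetical and not part of the actual dataset files.

```python
# Sketch of user-independent (leave-one-user-out) evaluation splits.
# sequences_by_user maps a user id to that user's training sequences;
# the layout is hypothetical, for illustration only.

def user_independent_splits(sequences_by_user):
    """Yield (train_data, test_user) pairs, leaving one user out at a time."""
    for test_user in sorted(sequences_by_user):
        # Train only on sequences from users other than the test user.
        train_data = {u: seqs for u, seqs in sequences_by_user.items()
                      if u != test_user}
        yield train_data, test_user

# Toy example matching the training set's shape: 10 users, 3 sequences each.
data = {f"user{i}": [f"user{i}_seq{j}" for j in range(3)] for i in range(10)}
splits = list(user_independent_splits(data))
```

Each of the 10 splits trains on the 27 sequences from the other 9 users, mirroring how the reported experiments exclude the test user's data from model learning.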
The easy test set contains 30 short-sleeve sequences, three from each of 10 users. The hard test set contains 14 short-sleeve sequences, two from each of seven users. The sequences in the hard test set contain distractors, in the form of one to three humans (besides the gesturing user) moving back and forth in the background (Fig. \ref{figure_distractors}). The presence of such distractors makes these sequences quite challenging for methods that assume reliable hand tracking and for methods that rely on global features. The easy test set contains no distractors.
For each video sequence in the dataset there are three files: