Bibliography:
Wobbrock, Jacob O., Andrew D. Wilson, and Yang Li. "Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes." Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (UIST '07). ACM, 2007.
Link:
http://dl.acm.org/citation.cfm?id=1294238
Summary:
This paper introduces the $1 recognizer, which is "easy, cheap, and usable almost anywhere in about 100 lines of code." It aims to provide an accessible recognizer that novice programmers can use in UI prototypes. The authors compare their recognizer against a Dynamic Time Warping (DTW) recognizer and the Rubine classifier; the $1 recognizer performs as well as DTW and better than Rubine.
To compare a candidate gesture C to a template gesture Ti, they compute the distance between each pair of corresponding sample points in C and Ti and derive a score from the minimum path-distance. Before scoring, the candidate and template gestures are normalized.
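The path-distance and scoring step can be sketched as follows. This is a minimal Python sketch based on the paper's description, not its published pseudocode; the function names are my own, and the 250-unit reference square (whose half-diagonal bounds the worst plausible distance) follows the paper's recommended default.

```python
import math

def path_distance(candidate, template):
    """Average Euclidean distance between corresponding sample points.
    Assumes both gestures were resampled to the same number of points."""
    assert len(candidate) == len(template)
    total = sum(math.dist(c, t) for c, t in zip(candidate, template))
    return total / len(candidate)

def score(candidate, template, square_size=250.0):
    """Map path-distance to a [0, 1] score, where 1 is a perfect match.
    Half the diagonal of the reference square normalizes the distance."""
    d = path_distance(candidate, template)
    return 1.0 - d / (0.5 * math.hypot(square_size, square_size))
```

An identical candidate and template yield a score of 1.0; the score falls toward 0 as the average point-wise distance grows.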
The $1 recognizer uses a 4-step process. First, they resample the point path with respect to stroke length so that each sample is the same distance apart. Second, they rotate the gesture so that the indicative angle is zero degrees from the horizontal; the indicative angle is the angle of the line joining the centroid to the first point of the gesture. Third, they non-uniformly scale the gesture to a reference square and translate it so that its centroid lies at (0, 0). This ensures that differences between candidate and template points are due only to rotation, not aspect ratio. Finally, they rotate the candidate gesture until they find the best score, i.e. the global minimum of path-distance. Instead of searching the entire angular space, they use the Golden Section Search (GSS) strategy, which minimizes the cost of searches between dissimilar gestures.
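The four steps above can be sketched roughly as follows. This is my own Python reconstruction from the paper's description, not its reference pseudocode; the 64-point resampling, 250-unit square, ±45° search window, and 2° stopping threshold follow the defaults the paper recommends, and the small-epsilon guard for degenerate 1-D strokes is my own addition.

```python
import math

PHI = 0.5 * (math.sqrt(5) - 1)  # golden ratio, used by the GSS step

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def resample(points, n=64):
    """Step 1: resample the stroke into n equidistant points."""
    pts = list(points)
    interval = sum(math.dist(a, b) for a, b in zip(pts, pts[1:])) / (n - 1)
    resampled, acc = [pts[0]], 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval and d > 0:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            resampled.append(q)
            pts.insert(i, q)      # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(resampled) < n:     # guard against floating-point shortfall
        resampled.append(pts[-1])
    return resampled

def rotate_by(points, theta):
    """Rotate all points by theta radians about their centroid."""
    cx, cy = centroid(points)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def rotate_to_zero(points):
    """Step 2: rotate so the indicative angle (centroid to first point) is 0."""
    cx, cy = centroid(points)
    return rotate_by(points, -math.atan2(points[0][1] - cy, points[0][0] - cx))

def scale_and_translate(points, size=250.0):
    """Step 3: non-uniform scale into a size x size square, centroid at (0, 0)."""
    xs, ys = [x for x, _ in points], [y for _, y in points]
    w = max(max(xs) - min(xs), 1e-6)  # epsilon guards degenerate 1-D strokes
    h = max(max(ys) - min(ys), 1e-6)
    scaled = [(x * size / w, y * size / h) for x, y in points]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def best_distance(candidate, template,
                  a=-math.radians(45), b=math.radians(45),
                  tol=math.radians(2)):
    """Step 4: golden section search for the rotation minimizing path-distance."""
    x1 = PHI * a + (1 - PHI) * b
    x2 = (1 - PHI) * a + PHI * b
    f1 = path_distance(rotate_by(candidate, x1), template)
    f2 = path_distance(rotate_by(candidate, x2), template)
    while abs(b - a) > tol:
        if f1 < f2:               # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1 - PHI) * b
            f1 = path_distance(rotate_by(candidate, x1), template)
        else:                     # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(candidate, x2), template)
    return min(f1, f2)
```

To recognize a gesture, one would run each stored template and the candidate through steps 1-3, then take the template with the smallest `best_distance` to the candidate.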
Since it is scale-, rotation-, and position-invariant, the $1 recognizer has trouble recognizing gestures whose meaning depends on scale, rotation, or position. The invariance can be removed on a per-gesture basis if required.
Comments:
I thought it was pretty amazing. This reminds me of the SIFT paper in computer vision for tracking features, which is also scale-, rotation-, and translation-invariant. Instead of point distances, SIFT compares image gradients in a normalized, oriented window.
Research Ideas:
I wonder what other computer vision feature tracking algorithms would be useful in gesture recognition.