Monday, November 24, 2014

Fitts' law as a research and design tool in human-computer interaction (Paper Review)








Bibliography:

MacKenzie, I. Scott. "Fitts' law as a research and design tool in human-computer interaction." Human-computer interaction 7.1 (1992): 91-139.


Link: 


Summary:

This paper analyzes how Fitts' law has been used and the results of subsequent research using Fitts' law or a modified form of it.


Comments:

I'm taking Embodied Interaction, and it seems like this model might fit in with that paradigm.


Research Ideas:

I would be interested to see if using 3D hand gestures could be analyzed using this model.


Wednesday, November 19, 2014

Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Paper Report)



Bibliography:


Bhat, Akshay, and Tracy Hammond. "Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams." IJCAI. Vol. 9. 2009.

Link:


http://www.aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/download/592/906

Summary:


Entropy is used, in the information-theoretic sense, to distinguish text from shapes. In this view text is thought to be more random, i.e. to have greater entropy, than shapes. To encode the entropy of a stroke the authors use an alphabet of seven letters, each describing the angle a stroke point makes with its two temporal neighbors (with a special letter for endpoints). Some preprocessing is done by normalizing the stroke and resampling so that points are equidistant from each other. Strokes are also grouped together according to thresholds in the spatial and temporal dimensions, since strokes that are close together in time and space should belong to the same class (text or shape). Finally, the probability of each 'letter' is used to calculate the stroke's entropy.
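The entropy here is the standard Shannon measure, H = -sum p(x) * log2 p(x), taken over the seven alphabet letters. A minimal sketch in Python (my own illustration, not the authors' code):

```python
import math
from collections import Counter

def stroke_entropy(symbols):
    """Shannon entropy (in bits) of a stroke encoded as a string of alphabet letters."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Text-like strokes mix many angle letters; shape-like strokes repeat a few.
text_like = "ABCDEFGABCDEFG"   # varied angles -> higher entropy
shape_like = "AAAAAAABBBBBBB"  # mostly straight -> lower entropy
assert stroke_entropy(text_like) > stroke_entropy(shape_like)
```

A stroke of all one letter (a perfectly straight line) has entropy 0, which matches the intuition that shapes are less random than text.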


To calculate the classification confidence the authors use a second formula (given in the paper), where b is the point at which the confidence of the TEXT classification is 0.5.

They trained on COA (course-of-action) drawings and tested classification on mechanics drawings, with favorable results.

Comments:


I think this approach is very clever.  I'm interested in how entropy differs between different users from different locales.

Research Ideas:


Find out in what other domains entropy might be relevant for classification.

Wednesday, October 22, 2014

Combining Corners from Multiple Segmenters (Paper Report)











Bibliography:
Wolin, Aaron, Martin Field, and Tracy Hammond. "Combining corners from multiple segmenters." Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2011.

Link: 

Summary:

In this paper the authors implement corner-subset selection over corners detected by five algorithms from different papers (ShortStraw, Douglas-Peucker, Paleo, Sezgin, and Kim). Their system runs all five algorithms on a stroke and combines the detected corners into a master set, removing duplicates.

They cull this set using sequential floating backward selection, a greedy algorithm. From the master set, they iteratively remove points and measure the error difference between the current and previous sets of points. The point removed is the one that makes the least difference in the error metric. Points are 'floating' in that they can be added back in. The error is measured as the squared vertical distance from the original stroke to the current set of proposed polylines. To get the preferred set they used the ratio of the difference of errors and determined when this ratio changed the most, i.e. passed a certain threshold.
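As I understand it, the backward pass of that selection could be sketched like this (my own simplified version: the 'floating' re-add step is omitted, and I use point-to-segment distance rather than the paper's vertical distance):

```python
import math

def seg_dist2(p, a, b):
    """Squared distance from point p to segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    L2 = dx * dx + dy * dy
    t = 0.0 if L2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / L2))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def polyline_error(stroke, corner_idx):
    """Sum of squared distances from every stroke point to the corner polyline."""
    err = 0.0
    for p in stroke:
        err += min(seg_dist2(p, stroke[corner_idx[i]], stroke[corner_idx[i + 1]])
                   for i in range(len(corner_idx) - 1))
    return err

def backward_select(stroke, corners):
    """Greedy backward pass: repeatedly drop the interior corner whose removal
    raises the fit error the least, recording the error after each removal."""
    corners = list(corners)
    history = [(list(corners), polyline_error(stroke, corners))]
    while len(corners) > 2:  # always keep the two stroke endpoints
        best = min(range(1, len(corners) - 1),
                   key=lambda i: polyline_error(stroke, corners[:i] + corners[i + 1:]))
        corners = corners[:best] + corners[best + 1:]
        history.append((list(corners), polyline_error(stroke, corners)))
    return history  # the elbow in these errors picks the preferred subset
```

Removing a spurious corner barely changes the error; removing a true corner makes it jump, which is the ratio-of-error-differences signal described above.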

The threshold was determined from a training set where the number of corners was known. They ran the algorithm and produced collections of strokes for which the ratio of error differences reflected when the number of points overdetermined the corners (false positives) and when it underdetermined the corners (false negatives). They used the MAD, or median absolute deviation, of each collection to find the dividing point for the threshold, in between the two.
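A toy version of that thresholding; the exact placement rule here is my guess at the paper's intent, not the published procedure:

```python
import statistics

def mad(values):
    """Median absolute deviation, a robust estimate of spread."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def split_threshold(over_ratios, under_ratios):
    """Place a threshold between the two collections of error-difference ratios,
    midway between the MAD-bounded edge of each (hypothetical reading)."""
    hi_edge = statistics.median(over_ratios) + mad(over_ratios)
    lo_edge = statistics.median(under_ratios) - mad(under_ratios)
    return (hi_edge + lo_edge) / 2
```

MAD is preferred over standard deviation here because a few badly segmented training strokes would otherwise drag the dividing point around.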

Comments:

I thought this was a very interesting approach to corner finding, using a combination of other corner-finding algorithms, and it was notable that its all-or-nothing accuracy was higher than that of any single algorithm alone.

Research Ideas:


Monday, October 20, 2014

Sketch Based Interfaces: Early Processing for Sketch Understanding (Paper Report)



Bibliography:


Sezgin, Tevfik Metin, Thomas Stahovich, and Randall Davis. "Sketch based interfaces: early processing for sketch understanding." ACM SIGGRAPH 2006 Courses. ACM, 2006.


Link:


http://dl.acm.org/citation.cfm?id=1185783


Summary:


In this paper the authors present a system designed to be a natural sketch interface for the design of mechanical-engineering drawings. It was designed to recognize what is drawn instead of how something is drawn.

The system follows three steps to create its drawings: approximation, beautification, and basic recognition of the stroke.

To approximate strokes the system tries to detect the vertices of the shape. This is done with some preprocessing of the stroke: the direction, curvature, and speed graphs of the stroke are computed and used to determine which points are vertices. Vertices are usually found at peaks of high curvature. It has also been observed that users slow down at corners, which shows up as valleys in the speed graph. To factor out noise from the stroke, peaks and valleys are only considered if above/below a certain threshold. These thresholds were empirically found to be the mean of the curvature graph and 90% of the mean of the speed graph. The mean is used so that each threshold depends on the graph data instead of being a global value. The threshold splits the graph into regions and the extrema are chosen from the satisfying regions.
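A rough sketch of the thresholded-extrema idea (the mean and 90%-of-mean thresholds are from the paper; everything else, including the names, is my own simplification):

```python
import math

def vertex_candidates(points, times):
    """Flag curvature values above the mean curvature and speed values below
    90% of the mean speed. Indices refer to segments between samples, so a
    corner at point k shows up around index k."""
    n = len(points)
    # direction of each segment, then curvature as change in direction
    dirs = [math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0]) for i in range(n - 1)]
    curv = [abs(dirs[i + 1] - dirs[i]) for i in range(len(dirs) - 1)]
    speed = [math.dist(points[i + 1], points[i]) / (times[i + 1] - times[i])
             for i in range(n - 1)]
    curv_thresh = sum(curv) / len(curv)          # mean of the curvature graph
    speed_thresh = 0.9 * (sum(speed) / len(speed))  # 90% of the mean speed
    curv_peaks = {i + 1 for i in range(len(curv)) if curv[i] > curv_thresh}
    speed_valleys = {i for i in range(len(speed)) if speed[i] < speed_thresh}
    return curv_peaks, speed_valleys
```

For an L-shaped stroke drawn slowly around the bend, both sets point at the corner, which is exactly the agreement the hybrid fit starts from.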

Usually taking only curvature into consideration is insufficient, so both speed and curvature are taken into account. The certainty that points are vertices is calculated by finding the scaled magnitude of curvature in a neighborhood for candidates from the curvature graph, and the ratio of speed for candidates from the speed graph. The initial hybrid fit is the intersection of points that are peaks in the curvature graph and valleys in the speed graph. Subsequent hybrid fits are made by adding the next best speed and curvature fits. The error of a fit is measured as the orthogonal squared-distance error of the stroke to the fit. Points are added until an error threshold is reached and the best fit with the least number of vertices is chosen.

Curves are found by calculating the ratio of the arc length between vertices to the Euclidean distance between them, with spans having a high enough ratio being deemed curves. Arcs are approximated with Bezier curves, which are in turn approximated by linear segments; each is subdivided until the squared-distance error is below some threshold.
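The arc-length test might look like this (the 1.2 threshold is illustrative, not from the paper):

```python
import math

def is_curve(points, ratio_threshold=1.2):
    """Classify the span between two vertices as a curve when the arc length
    noticeably exceeds the straight-line (chord) distance between them."""
    arc = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    chord = math.dist(points[0], points[-1])
    return chord > 0 and arc / chord > ratio_threshold
```

A straight span has ratio 1, while a semicircular arc has ratio pi/2, so even a loose threshold separates the two.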

Beautification is done by rotating line segments around their midpoints when they are 'supposed to be linear,' to match nearby segments. This is done by comparing segments to others within a window.

Finally basic recognition is done by testing distance to its bounding box for squares and squared distance error measures for fitted ellipses.

A high level recognizer was implemented to recognize domain specific shapes. Also over-tracing was addressed.

Comments: 


It seems as though how something is drawn, e.g. slowing down before taking corners, can still aid in recognition. Also, comparing the arc length and Euclidean distance for finding corners is very similar to ShortStraw's approach.

Research Ideas:


I wonder what other behavioral features can be used to recognize sketches.


Monday, October 13, 2014

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Paper Report)



Bibliography:

Wobbrock, Jacob O., Andrew D. Wilson, and Yang Li. "Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes."Proceedings of the 20th annual ACM symposium on User interface software and technology. ACM, 2007.

Link:

http://dl.acm.org/citation.cfm?id=1294238

Summary:

This paper introduces the $1 recognizer, which is "easy, cheap, and usable almost anywhere in about 100 lines of code." It aims to be an accessible recognizer for novice programmers to use in UI prototypes. The authors compare their recognizer against the Dynamic Time Warping (DTW) recognizer and the Rubine classifier; the $1 recognizer performs just as well as DTW and better than Rubine.

To compare a candidate gesture C to a template gesture Ti, they use the distance between each pair of corresponding sample points in C and Ti and create a score based on the minimum path-distance. Before scoring, the candidate and template gestures are normalized.

The $1 recognizer uses a four-step process. First, they resample the point path, normalizing with respect to stroke length so that each sample is the same distance apart. Second, they rotate the gesture so that the indicative angle is zero degrees from the horizontal; the indicative angle is the angle of the line joining the centroid to the first point of the gesture. Third, they non-uniformly scale the gesture to a reference square and translate it so that its centroid lies at (0, 0). This ensures that differences between candidate and template points are due only to rotation and not aspect ratio. Finally, they rotate the candidate gesture until they find the best score, the global minimum of path-distance. Instead of searching the entire angular space they use a Golden Section Search (GSS), which minimizes the cost of searches between dissimilar gestures.
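The normalization steps (minus the GSS rotation search) can be sketched roughly as follows; this is my own condensed version, not the published pseudocode, and all names are mine:

```python
import math

def resample(pts, n=64):
    """Step 1: resample the path to n equidistant points."""
    pts = list(pts)
    total = sum(math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))
    step, acc, out = total / (n - 1), 0.0, [pts[0]]
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= step and d > 0:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:
        out.append(pts[-1])  # rounding can leave the list one point short
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (centroid to first point) is 0."""
    cx, cy = centroid(pts)
    theta = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def scale_translate(pts, size=250.0):
    """Step 3: non-uniform scale to a reference square, centroid to (0, 0)."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Mean point-to-point distance between two normalized gestures."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def normalize(pts):
    return scale_translate(rotate_to_zero(resample(pts)))
```

A translated and scaled copy of a gesture normalizes to (nearly) the same point list, so its path-distance to the original is close to zero, while a different gesture scores much higher.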

Since it is scale-, rotation-, and position-invariant, the $1 recognizer has limits on recognizing gestures that depend on scale, rotation, or position. The invariance can be removed on a per-gesture basis if required.

Comments:

I thought it was pretty amazing. This reminds me of the SIFT paper in computer vision for tracking features, which is also scale-, rotation-, and translation-invariant. Instead of point distances, it compares image gradients in a normalized, oriented window.

Research Ideas:

I wonder what other computer vision feature tracking algorithms would be useful in gesture recognition.




Wednesday, October 8, 2014

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paper Report)







Bibliography:
Paulson, Brandon, and Tracy Hammond. "Paleosketch: accurate primitive sketch recognition and beautification." Proceedings of the 13th international conference on Intelligent user interfaces. ACM, 2008.


Link:
http://dl.acm.org/citation.cfm?id=1378775

Summary:

This paper presents a system that improves the recognition of primitive shapes and of complex shapes built from those primitives. The shapes recognized are lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes, and recognition accuracy was 98.56%. It introduces two new features and a ranking system for shape recognition. The two new features are NDDE (Normalized Distance between Direction Extremes) and DCR (Direction Change Ratio). NDDE measures the stroke length between the smallest and greatest direction values (change in y over change in x), and DCR is the ratio of the maximum change in direction to the average change in direction.
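My reading of the two features, sketched in Python; the exact definitions in the paper may differ in detail (e.g. normalization), so treat this as illustrative:

```python
import math

def direction_values(points):
    """Direction (angle) of each consecutive segment of the stroke."""
    return [math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0]) for i in range(len(points) - 1)]

def ndde(points):
    """Normalized Distance between Direction Extremes: stroke length between the
    points of min and max direction, over total stroke length. Near 1 for arcs,
    lower for polylines whose direction extremes sit next to a corner."""
    dirs = direction_values(points)
    seg = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    lo = min(range(len(dirs)), key=dirs.__getitem__)
    hi = max(range(len(dirs)), key=dirs.__getitem__)
    a, b = sorted((lo, hi))
    return sum(seg[a:b + 1]) / sum(seg)

def dcr(points):
    """Direction Change Ratio: max direction change over mean direction change.
    High for polylines (sharp corners), near 1 for smooth curves."""
    dirs = direction_values(points)
    deltas = [abs(dirs[i + 1] - dirs[i]) for i in range(len(dirs) - 1)]
    return max(deltas) / (sum(deltas) / len(deltas))
```

On a smooth arc every direction change is about the same, so DCR is close to 1; a right-angle polyline concentrates all its direction change at one corner, so DCR is much larger.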


Comments:



Research Ideas:

Monday, October 6, 2014

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition (Paper Report)



Bibliography:

Paulson, Brandon, et al. "What!?! no Rubine features?: using geometric-based features to produce normalized confidence values for sketch recognition." HCC Workshop: Sketch Tools for Diagramming. 2008.

Link: 

http://srl.tamu.edu/srlng/research/paper/23?from=/srlng/research/

Summary:


In this paper the authors create and test a system that uses both gesture- and geometric-based features for sketch recognition of complex shapes. Gesture recognition depends on how the user draws, whereas geometric-based recognition depends on what the user draws. Their aim was to create a recognition system that is user-independent. The system was tested against the PaleoSketch recognizer and the differences between the two were shown to be statistically insignificant. The classifier they used was a statistical classifier. The total-angle feature was the only gestural feature significant enough to be chosen for the optimal subset of features.

Comments:

It is interesting that these two methods of classification can be combined and used so effectively, drawing on the strengths of both. I didn't understand the significance of the way the user groups were split and their effect on the validation, nor the way the optimal subset of features was found.

Research Ideas:

I wonder why there were so few gestural features included in the subset of optimal features.

Wednesday, October 1, 2014

Sketch Recognition Design Considerations and Improvements to Mechanix


10 Principles of Things a Sketch System Should Do:

  1. Be reliable.
  2. Have smooth feedback between where my pen is and what is drawn on screen.
  3. Give feedback on whether a gesture is or is not accepted.
  4. Have clear instructions.
  5. Have simple gestures.
  6. Adapt to different users.
  7. Gestures should try to be related to what they do if possible.
  8. Not too many gestures.
  9. Should be able to undo easily.
  10. Utilize the strengths of sketching.

10 Principles of Things a Sketch System Should Not Do:

  1. Should not lag.
  2. Have complicated gestures.
  3. Have too many gestures.
  4. Have not enough gestures to do what you want to do.
  5. Accept incorrect gestures / input.
  6. Try to use sketching for something that it would be terrible for.
  7. Have convoluted instructions.
  8. Make too many assumptions about the user.
  9. Reject correct gestures / input.
  10. Have too many gestures that don't relate well to what they do.

Five suggestions for improvement to Mechanix

  1. Improve client-server lag.
  2. Have more robust shape recognition. Shape creation was too dependent on node order and shapes I thought were correct were not recognized.
  3. Able to sketch the labels / letters.
  4. Accept gestures for labeling instead of rollout menu.
  5. Have a mask for selection or input, e.g. "I'm drawing / erasing only trusses, forces, or labels."

Visual Similarity of Pen Gestures (Paper Report)



Bibliography:

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, 360-367. DOI=10.1145/332040.332458 http://doi.acm.org/10.1145/332040.332458

Link:

http://dl.acm.org/citation.cfm?id=332458

Summary:

This paper intended to find out why some gestures are similar while others are not, in order to facilitate gesture design; similar gestures may be easier to learn and remember. The gestures evaluated were single-stroke and iconic, similar to those of the Apple Newton MessagePad. Psychology research in the area of similarity of geometric shapes revealed that similarity could vary linearly or logarithmically based on measurements such as area, width, height, tilt, etc., but the similarity metrics used not only differed from person to person, they could also differ from shape to shape.

Multi-dimensional scaling looks at a data set and tries to reduce the dimensionality of the set so that distances reflect the similarity and dissimilarity of the objects. When using MDS, the number of dimensions, the distance metric, and the meaning of the axes, are some of the issues that must be decided.
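A toy illustration of the idea, using a simple gradient-descent (stress-minimizing) metric MDS rather than whatever variant the authors used; all names and constants are mine:

```python
import math
import random

def mds(D, k=2, iters=2000, lr=0.01, seed=0):
    """Tiny metric-MDS sketch: place n points in k dimensions and gradient-descend
    the stress, i.e. the mismatch between embedded and target distances."""
    rng = random.Random(seed)
    n = len(D)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [X[i][a] - X[j][a] for a in range(k)]
                d = math.sqrt(sum(c * c for c in diff)) or 1e-9
                g = (d - D[i][j]) / d  # move i toward j if too far, away if too close
                for a in range(k):
                    X[i][a] -= lr * g * diff[a]
    return X

def embedded_dist(X, i, j):
    """Distance between two embedded points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))

# A 3-4-5 triangle's distance matrix embeds almost exactly in 2 dimensions.
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
X = mds(D)
```

With gesture-dissimilarity judgments in place of D, the axes of the recovered configuration are what the authors then had to interpret in terms of geometric features.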

Two experiments were run with two different sets of gestures and two different sets of subjects. The first set of gestures was designed to be as different as possible from each other. Animated displays of the gestures were used instead of having the subjects draw them, in order to allow more participants and gestures, even though drawing them would have given more usage context. Using regression analysis on the geometric features, the authors were able to create a model to predict similarity of gestures. The second experiment used three sets of similar gestures and a fourth set containing gestures from the first experiment. The second experiment was not as successful at predicting similarity.

Comments:

I thought this paper could be very helpful in gesture design and designing experiments for effective gesture design. The fact that many of the features of Rubine's paper were used could signify their importance in gesture classification. It was surprising that the second experiment did not do as well as the first.

Ideas for Research:

I was thinking of extending this experiment into 3D gestures in VR environments.

Wednesday, September 24, 2014

Specifying Gestures by Example (Paper Report)






Bibliography:

Rubine, Dean. "Specifying gestures by example." ACM SIGGRAPH Computer Graphics 25.4 (1991): 329-337.


Link:

http://srl.tamu.edu/srlng_media/content/objects/object-1236962325-cefe7476d664dc727f969660eac672cc/bde-GI-FinalVersion.pdf

Summary:

Previously, recognition of gestures was hand-coded. GDP, a gesture-based drawing program, demonstrates recognition of gestures.

GRANDMA (Gesture Recognizers Automated in Novel Direct Manipulation Architecture): design an application by assigning gestures to view classes and specifying what the gestures do.

Gestures are recognized by first building a classifier from examples. Gestures are single strokes, and a feature vector is created for each stroke.

Classification: each class of gesture is specified by the weights it gives to each feature. Each feature vector is created from measurements of a set of 13 features. Features 12 and 13 have a dynamic component: they measure speed and duration.

Training of the classifier is done with a linear discriminator. First they compute a sample estimate of the mean feature vector for each class. Using the per-class means, they calculate a sample estimate of the covariance matrix for each class, then combine these into a sample estimate of the common covariance matrix across all classes. Finally, they use the inverse of the common covariance matrix estimate to calculate the weights for the class evaluators.

In the cases of outliers and ambiguous classification, the estimated probability of correct classification and the distance (in standard deviations) of the input gesture from its assigned class are used to determine rejection. Rejection occasionally occurs on gestures that would be acceptable, so it should be turned off if the application supports robust undo.

GSCORE, a gesture-based musical score editor, was also built as a demonstration.

Comments:

Seems like a good approach for recognizing gestures; gestures could be tailored to each user. Using a mouse for gestures is outdated, though; multi-touch gestures (at least for direct manipulation) are now the norm.

Research Ideas:

Can you use 2D gesture recognition techniques in 3D?

Who Dotted that 'i'? (Paper Report)





Bibliography:

Eoff, B. D., & Hammond, T. (2009, May). Who dotted that 'i'?: context free user differentiation through pressure and tilt pen data. In Proceedings of Graphics Interface 2009 (pp. 149-156). Canadian Information Processing Society.

Link:

http://srl.tamu.edu/srlng_media/content/objects/object-1236962325-cefe7476d664dc727f969660eac672cc/bde-GI-FinalVersion.pdf

Summary:

This paper tries to solve the problem of how to distinguish the stroke of one user from a set of other users using pen pressure and tilt data.

Comments:


Research Ideas:


Tuesday, September 16, 2014

Mechanix: A Sketch-Based Tutoring and Grading System for Free-Body Diagrams (Paper Report)


Bibliography:

Valentine, S.; Vides, F.; Lucchese, G.; Turner, D.; Kim, H.; Li, W.; Linsey, J.; and Hammond, T. "Mechanix: A Sketch-Based Tutoring and Grading System for Free-Body Diagrams." AI Magazine. Winter 2012. 55-66. Print.

Link:

http://www.aaai.org/ojs/index.php/aimagazine/article/view/2437/2347

Summary:

Mechanix is a system designed to aid in the instruction and grading of statics problems for mechanical and civil engineering students. The system leverages sketching to teach statics by "actively engaging" students in the learning process. In one mode of the program students are asked to sketch out trusses and are given constant feedback on their drawings. Such immediate feedback on hand-drawn sketches is impossible for instructors in large classes; this is where sketch recognition comes in, allowing real-time feedback and automated instruction.

Other similar systems were either too general, did not incorporate the act of sketching the drawings, or were too strict about drawing order. Mechanix utilizes a powerful low-level recognizer, PaleoSketch, to recognize high-level complex shapes.

The interface consists of a problem statement, standard edit tools, a step-by-step checklist, a notepad area, a drawing pane, a feedback area, and an equation pane. Students can check their answers at any time by clicking a submit button, and feedback is instantly provided to guide them. The interface is similar for instructors, except that they provide the critical information and constraints about the truss and non-truss drawings. The problems are saved to the server, where they can be retrieved by students. The interface also provides information on students' submissions.

Students can draw, move, label, color, edit parts of, or entirely delete shapes. Visual feedback about completed shapes signals mistakes to students.

Geometric recognition of shapes is handled bottom-up: low-level shapes are recognized by PaleoSketch, and then groupings of shapes are recognized as high-level complex shapes. Trusses are described as two or more convex polygons connected by shared edges. Shared edges are found by removing an edge from a connectivity graph and doing a breadth-first search for an alternate path between the two points that made up that edge. Checking answers includes comparing students' drawings to instructors' drawings and comparing forces. Feedback is given when errors are found.
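The shared-edge test as described could be sketched like this (hypothetical names, simple adjacency-set graph; a minimal sketch rather than the Mechanix implementation):

```python
from collections import defaultdict, deque

def is_shared_edge(edges, edge):
    """Remove the edge, then BFS for an alternate path between its endpoints.
    If one exists, the edge lies on a cycle, e.g. shared between two polygons."""
    u, v = edge
    adj = defaultdict(set)
    for a, b in edges:
        if {a, b} != {u, v}:  # drop the edge under test
            adj[a].add(b)
            adj[b].add(a)
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False
```

For two triangles glued along an edge, the glued edge passes the test (the path goes around either triangle), while a dangling edge fails it.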

Problems are also given as non-truss free-body diagrams, which are just closed shapes. The comparison between student and instructor drawings then employs three similarity metrics: the Hausdorff distance, a modified Hausdorff distance, and the Tanimoto coefficient. The first two measure the closest distances between points of the two drawings, and the last uses a ratio of overlapping points. These are then averaged and used as a metric to determine similarity.
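Rough versions of the three metrics, with point sets standing in for the drawings (the paper's exact sampling and normalization may differ):

```python
import math

def directed_hausdorff(A, B):
    """Worst-case closest-point distance from A to B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Classic Hausdorff distance: worst case in both directions."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def modified_hausdorff(A, B):
    """Mean (rather than max) closest-point distance; less outlier-sensitive."""
    d_ab = sum(min(math.dist(a, b) for b in B) for a in A) / len(A)
    d_ba = sum(min(math.dist(b, a) for a in A) for b in B) / len(B)
    return max(d_ab, d_ba)

def tanimoto(A, B):
    """Ratio of overlapping points to total distinct points (on a shared grid)."""
    A, B = set(A), set(B)
    return len(A & B) / len(A | B)
```

The first two are distances (lower is more similar) while Tanimoto is a similarity in [0, 1], so some rescaling is presumably applied before averaging the three.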

Creative-response problems, where the answer is "open-ended," are also supported. In this case the validity of answers is handled by an AI component that creates a linear system of equations from the student-drawn truss. These are then compared to a list of instructor-given constraints, and feedback is given.

The system is distributed on servers and load-balancing is used to mitigate stoppages when students submit answers.

Students who used Mechanix scored 40 percent higher on homework assignments.

Comments:

I believe this system is excellent. The advantages of having feedback and of sketching the problems by hand had undeniable positive consequences. Some of the similarity tests were foreign to me. I liked how the authors leveraged an existing system, PaleoSketch, to solve a more complex recognition problem.

Ideas for Research:

One idea for research would be to create a pre-viz system where you could sketch out storyboards for a film or short and have the system solve for camera placement, lighting, and actor placement.


Monday, September 15, 2014

K-Sketch: A “Kinetic” Sketch Pad for Novice Animators (Paper Report)


Bibliography:

Richard C. Davis, Brien Colwell, and James A. Landay. 2008. K-sketch: a 'kinetic' sketch pad for novice animators. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08). ACM, New York, NY, USA, 413-422. DOI=10.1145/1357054.1357122 


Link:

http://dl.acm.org/citation.cfm?id=1357122


Summary:

This paper describes a tool that allows 'non-animators' to intuitively create animations. The main focus of the tool is to make animation accessible to the novice, with an emphasis on ease of use and design that follows the user's intuition. Teachers hoping to use animation in storytelling, as well as engineers wanting to sketch up a prototype of a concept, such as the motion of a robot's treads over obstacles, are a few of the 'non-animators' this tool aims to assist. Both 'animators' and 'non-animators' were surveyed to discover the minimal set of operations that needed to be implemented to complete the largest number of relevant tasks.

The system allows users to sketch objects on a virtual canvas and translate, rotate, and scale an object through time via an onscreen widget that appears over the object once it is selected. To transform an object the user must first select it and, while holding the alt key, click and either move, rotate, or scale the object in time. As long as the mouse button is held down, time advances and the object follows the action of the mouse. Other features include erasing an object at a certain point in time, creating new objects in time, copying the motion of one object to another, adding relative motion to an object's frame of reference, and allowing an object to follow the direction of a sketched path.

The system was tested by asking users to complete a set of tasks with K-Sketch and with the more technical PowerPoint animation system. Users reported that K-Sketch was more natural and easier to use, and they were more willing to animate and show their animations in front of an audience than with PowerPoint.

Comments:

I wasn't sure how the system worked until I saw the demonstration videos at http://www.k-sketch.org/. I think the system works great and is exactly what they were going for. The interface looks intuitive and easy to use. It was discussed in class, and I agree that it would be nice to add physics-based animation to the tool set.

Ideas for Research:

This paper, as well as the discussions in class, gives me the idea of extending the animation system to include physical simulations, like having rigid bodies collide and break, or attaching objects together with spring-like forces.




Wednesday, September 10, 2014

iCanDraw: Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Paper Report)


Bibliography:

Daniel Dixon, Manoj Prasad, and Tracy Hammond. 2010. iCanDraw: using sketch recognition and corrective feedback to assist a user in drawing human faces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 897-906. DOI=10.1145/1753326.1753459

Link: 

http://dl.acm.org/citation.cfm?id=1753459


Summary:

In this paper, the authors present a system to assist in the act of drawing the human face, a very difficult skill to master. The user is given a photo portrait and is asked to follow a set of steps to complete a drawing of the given face. At each step the system seeks to assist the user in placing the salient features that make up the face. The features to be drawn at each step are brought into focus by dimming the rest of the face. Feedback is given in the form of text that points out errors, such as an eye being too small, as well as visual feedback that highlights where contours should go or which strokes appear misplaced. A template of the desired face is made by first analyzing the photo portrait with facial recognition algorithms to produce a set of features that define a face. A contour drawing of these points is used to judge the correctness of the strokes the user makes, as well as to determine the effectiveness of the system by comparing the user's processed drawing (run through the same facial recognition system) to the desired drawing.


Comments:

I feel that the intention of the system is in the right direction, but some of the methods are more mechanical than I would prefer. There are other aspects of drawing that I feel would be helpful. One thing I would have liked to see addressed is the fear of starting the drawing in the first place: staring at a blank page is frightening to novice and master alike. Somehow facilitating the first and subsequent marks would, I feel, aid in the process of drawing the face. I would also like to defocus attention on the drawing action and focus more on the 'seeing/drawing' connection.

Ideas for Research:

I think one approach to making marks for facial features would be to turn it into a game where the goal is to slice objects down the middle. The slices would then 'draw' out the face. This abstracts the drawing act enough that the user's own perception of how a face looks doesn't get in the way.