Monday, May 4, 2009

Supporting Feedback and Assessment of Digital Ink Answers to In-Class Exercises
Kimberle Koile, Kevin Chevalier, Michel Rbeiz, Adam Rogal, David Singer, Jordan Sorensen, Amanda Smith, Kah Seng Tay, Kenneth Wu -- 2007 (8 pages)

This article discusses Classroom Learning Partner (CLP), a classroom teaching system built on Classroom Presenter (CP). CLP interprets and aggregates students' answers written on Tablet PCs. The aggregation (also referred to as "clustering" or "grouping" elsewhere) and interpretation components employ AI techniques.

The system described in the article has four major parts. The first is the Instructor Authoring Tool (IAT), which the instructor uses to prepare questions for lectures. While preparing a question, the instructor can also designate where the students' answers will be drawn and what the answer type will be (e.g., string, number, set, sequence, multiple choice).

The second part is the ink interpreter, which runs on the students' Tablet PCs. The goal of the interpreter is to extract semantic information from the student's ink; that is, to differentiate text, arrows, and numbers from each other. However, the authors note that the current implementation does not differentiate text from sketches, and passes the ink directly to the handwriting recognizer, which distinguishes arrows from text. A new interpreter that will segment text and sketches is reported to be under development.

The handwriting recognizer built into the interpreter works as follows: an ink segmentation module based on Microsoft's ink analyzer segments the ink into words or arrows. An error correction module fixes errors such as splitting one word into two or combining two words into one. The strokes are then passed to the Microsoft English handwriting recognizer, which outputs hypotheses. These hypotheses are sent to the language model, and the stroke is labeled as a number, string, set, sequence, or Scheme expression.

The recognition accuracy of the ink interpreter is measured with a modified version of the Levenshtein distance, as opposed to traditional word error rates.

The third part is the aggregator. It uses the semantic information produced by the ink interpreter and builds a histogram of the answers. Two methods are used: top-down and bottom-up. The top-down method places all answers in one group and recursively splits groups according to similarity. The bottom-up method starts with each answer in its own group and merges groups until an instructor-specified number of groups remains. The aggregator is responsible for recognizing that answers such as "True", "true", and "t" have the same meaning.
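
The bottom-up method can be illustrated with a short agglomerative sketch. This is a generic illustration, not CLP's actual implementation; the normalization table and the zero/one distance function are assumptions made for the example.

```python
def aggregate(answers, n_groups, dist):
    """Start with each answer in its own group and repeatedly merge the
    two nearest groups until n_groups groups remain (bottom-up)."""
    groups = [[a] for a in answers]

    def group_dist(g1, g2):
        # distance between groups: minimum pairwise answer distance
        return min(dist(a, b) for a in g1 for b in g2)

    while len(groups) > n_groups:
        # pick the closest pair of groups and merge them
        _, i, j = min((group_dist(groups[i], groups[j]), i, j)
                      for i in range(len(groups))
                      for j in range(i + 1, len(groups)))
        groups[i] += groups.pop(j)
    return groups

# Toy similarity: map "t"/"true" (any case) to one canonical form.
norm = lambda s: {"t": "true", "f": "false"}.get(s.lower(), s.lower())
dist = lambda a, b: 0 if norm(a) == norm(b) else 1
print(aggregate(["True", "true", "t", "False", "f"], 2, dist))
# -> [['True', 'true', 't'], ['False', 'f']]
```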

Beyond numbers, strings, sets, sequences, true-false, and multiple-choice questions, new types can be added to the system by defining their similarity measures. String comparisons are said to use the Levenshtein distance; sequences use a modified version of this measure, in which a sequence is treated as a string and the two are compared word by word.
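
As a concrete illustration of these measures, a standard dynamic-programming Levenshtein distance works on both characters and word tokens; the word-level call below is an assumed reading of the paper's "word by word" sequence comparison, not its exact algorithm.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

# Character-level distance for string answers:
print(levenshtein("true", "ture"))                     # -> 2
# Word-level distance for sequence answers (tokens as symbols):
print(levenshtein("a b c".split(), "a c".split()))     # -> 1
```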

The aggregator was tested by comparing its results with those of human aggregators, who reportedly found CLP's groupings reasonable.

The final part of the system is the results displayer, which shows answers to questions as histograms. The structure of the central database where the data are stored is also given in the article.

The evaluation part lists four hypotheses about CLP: increasing student focus and attentiveness, providing immediate feedback, enabling the instructor to adjust course material in real time based on student needs, and increasing student satisfaction. The evaluations were done in MIT computer science classes: an experimental class was given Tablet PCs and ran CLP. The authors conclude that fewer students than expected performed poorly, and that the lowest-performing students were those in the non-Tablet PC classes. A pattern also emerged in which the poorer-performing students benefit from CLP the most, while the top students do not appear to benefit from the technology. The authors suggest that the top students' benefits are either not being measured, or that those students already learn the material more quickly than their counterparts in the control (non-Tablet PC) class.

Further work includes more learning studies and further development of CLP. New types such as sequences of characters are planned. The authors say they are working on the interpretation and aggregation of sketched answers and marked answers, such as a circle around a multiple-choice answer or a mark on a map. These differ from ordinary sketches in that they depend on a background image.

A previous work by Richard Anderson, Craig Prince, and Beth Simon ("Using Clustering to Assist Understanding of Digital Ink in Low Attention Environments", 2005) focused on sketch clustering but used a very primitive image-based clustering algorithm. This study by Koile et al. is much more sophisticated, working on the sketch strokes and their semantics. The most important parts of the system are the ink interpreter and the aggregator; the final recognition rate will depend mostly on these two. Because the system uses semantic interpretation, it must be supplied with new data types whenever those types need to be recognized and grouped. In return, it can group answers that have the same meaning but different visual depictions; it is theoretically possible, for example, to group the answers "9" and "nine". However, almost every answer type must be introduced to the system manually by defining that type's similarity metric.

The current study seems to be focused more on extracting the semantic meaning of the sketches than on grouping them. It is not yet clear that this is the most efficient method; other clustering algorithms based on temporal sketch data may yield better and faster results.

Furthermore, the classroom studies yield interesting results. It appears that only the poorer-performing students benefit from clustering; how the better students might benefit is not yet known. The current situation suggests that clustering enables poorer-performing students to learn faster but does not dramatically increase performance beyond a certain point. This may be attributable to CLP's current clustering performance, which is poor for some answer types such as sequences.

Tuesday, March 31, 2009

Using Clustering to Assist Understanding of Digital Ink in Low Attention Environments
Richard Anderson, Craig Prince, Beth Simon -- 2005

The paper discusses sketch clustering in classroom and similar environments. Digital ink is used in these classrooms to aid learning, by giving Tablet PCs to the students and the teacher. The aim is to develop a system that groups students' answer submissions and presents them to the teacher in a way that reduces the teacher's cognitive load while evaluating the submissions. The student submissions are referred to as "slides" throughout the paper, since the students draw their submissions on PowerPoint slides.

The authors use an image-based system for slide clustering. Hand-drawn diagrams are rasterized onto a grid: a pixel is painted if a stroke passes through it, producing a binary image. The binary image is then used to obtain a distance image, in which each pixel's value is the distance to the nearest inked pixel of the binary image. Using this distance representation, a K-means algorithm performs the clustering.
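
The rasterization and distance-image steps can be sketched under simplifying assumptions (a tiny 8x8 grid, a stroke given as already-sampled points in the unit square, a brute-force distance transform); the paper's actual grid size and the K-means step are omitted.

```python
import math

def rasterize(points, size=8):
    """Paint grid cells that stroke points pass through (binary image)."""
    img = [[0] * size for _ in range(size)]
    for x, y in points:                  # x, y assumed in [0, 1)
        img[int(y * size)][int(x * size)] = 1
    return img

def distance_image(img):
    """For each cell, distance to the nearest inked cell."""
    size = len(img)
    inked = [(i, j) for i in range(size) for j in range(size) if img[i][j]]
    return [[min(math.hypot(i - a, j - b) for a, b in inked)
             for j in range(size)] for i in range(size)]

stroke = [(0.1, 0.1), (0.3, 0.3), (0.5, 0.5)]   # a toy diagonal stroke
dimg = distance_image(rasterize(stroke))
print(dimg[0][0])   # 0.0: the cell itself is inked
```

The flattened distance images would then serve as the feature vectors fed to K-means.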

The authors carry out a classroom study to learn about the advantages and disadvantages of diagram clustering. A group of about 60 students shares 20 tablets. Four tasks (single graph drawing, dual graph drawing, UI layout design, and tree drawing) with different numbers of submissions are evaluated. To test the accuracy of the clustering, the authors first group the student slides manually.

The results of the clustering system are compared with the manual clustering. Initially, however, the authors use cleaned-up versions of the student slides: "doodles" and preliminary work toward solving the tasks are manually removed from the submitted diagrams. The algorithm makes some errors, such as putting obviously different slides into the same cluster, or splitting a group of equivalent submissions across different clusters. Also, visually minor but otherwise important features of some slides are either missed or interpreted incorrectly. A mean slide is given for each cluster, representing the slide closest to that cluster's mean. Across different numbers of clusters, the mean slides appear stable (they do not change as the number of clusters increases). The authors argue that this is an important result, showing that the mean slides remain representative of their groups. They also note that choosing an incorrect number of clusters is not detrimental; since the algorithm is fast, the number can be adjusted by simply trying different values.

Another comparison is given for the non-cleaned slides. As expected, the results are worse, with more errors than for the cleaned-up versions. The authors infer that removing non-pertinent ink is vital to successful clustering.

The authors accept that their algorithm is not novel, but their treatment of digital ink in the classroom setting is new. As an initial study in this domain, their results are satisfactory. Considering the simplicity of the algorithm they use, it is clear that diagram clustering is a very open area for further study.

It is important to note that there is no well-defined metric for measuring the success of clustering. The article gives some comparisons, but they use manually produced groupings as ground-truth data. It is genuinely difficult to distinguish between the minor and major features of a hand-drawn diagram, since the distinction depends strongly on the domain. For example, in the graph drawing task the slopes of the graph are important for the answer, but the algorithm usually disregards subtle differences between flat and sloped graphs.

Another point is the clean-up task mentioned in the article. To overcome the negative effect of non-pertinent strokes on grouping performance, the user could be forced to draw such strokes on a digital "scratch paper" interface, with only the digital "answer pad" used for drawing the answer. However, this would restrict users and impose additional cognitive load. The article lists this problem as planned work.

While an image-based technique is used in this system, a stroke-based technique might be employed in other studies. The article says that simple stroke-based features have been tried, but an extensive study does not appear to have been done. Today's sketch recognition systems are very robust at symbol recognition, and they might be used for this task, probably giving much better clustering performance.

Tuesday, February 24, 2009

Visual Recognition of Sketched Symbols

Tom Y. Ouyang, Randall Davis


The paper presents a symbol recognition system that works on isolated symbols and performs low-level classification (i.e., no context information is needed). It differs from other work in the literature in how the classification is done: many other systems use stroke information, while this system is based on the visual appearance of the symbols.

Symbol Normalization:

The symbols are first normalized by resampling the points of the strokes so that they are evenly spaced. Each symbol is then translated so that its center of mass is at the origin, and scaled so that it has unit standard deviation along both the horizontal and vertical axes.
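
The translation and scaling steps can be sketched directly from this description (the even-spacing resampling step is omitted, and guarding against zero variance is an added assumption):

```python
import statistics

def normalize(points):
    """Translate the center of mass to the origin and scale each axis
    to unit (population) standard deviation."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = statistics.fmean(xs), statistics.fmean(ys)
    sx = statistics.pstdev(xs) or 1.0   # avoid division by zero
    sy = statistics.pstdev(ys) or 1.0
    return [((x - cx) / sx, (y - cy) / sy) for x, y in points]

pts = normalize([(0, 0), (2, 0), (4, 4)])
```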

Feature extraction:

In feature extraction, the normalized symbols are converted to low-level feature images. Five features are calculated for each point of the input as follows:

Four orientation features are calculated for the reference angles 0, 45, 90, and 135 degrees. Each feature value is determined by the difference between the stroke angle and the reference angle, varying linearly between 1.0 (when the two are equal) and 0.0 (when they differ by more than 22.5 degrees).
The fifth feature image consists of the endpoints of the strokes: the pixels corresponding to the start and end points of each stroke are set to 1, while all others are set to 0.
In the end, five feature images are produced for each symbol. These feature images are rendered on a 24x24 grid, then smoothed and down-sampled by a factor of 2, producing five 12x12 images.
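
The four orientation features can be sketched per stroke segment as follows; folding the angular difference into [0, 90] degrees (treating orientation as having a 180-degree period) is an assumption of this sketch.

```python
def orientation_features(stroke_angle_deg):
    """Four values, one per reference angle, each 1.0 when the stroke
    angle matches the reference and decaying linearly to 0.0 at a
    difference of 22.5 degrees or more."""
    feats = []
    for ref in (0, 45, 90, 135):
        diff = abs(stroke_angle_deg - ref) % 180
        diff = min(diff, 180 - diff)      # orientation has 180-deg period
        feats.append(max(0.0, 1.0 - diff / 22.5))
    return feats

print(orientation_features(0))    # [1.0, 0.0, 0.0, 0.0]
print(orientation_features(30))   # partial response near the 45-deg bin
```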


In recognition, nearest-neighbor template matching is used in combination with an image deformation model (IDM). Each 3x3 patch of the input image may shift within a 3x3 local window of the template's feature images; for each pixel, the minimum difference within the local window is chosen, and the squares of these minimum differences are summed over the whole image to give the distance.
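
A simplified version of this IDM distance, reduced here to single-channel images and single-pixel comparisons (the real system compares 3x3 patches over five feature images), might look like:

```python
def idm_distance(inp, tpl):
    """Sum of squared best-match pixel differences, where each input
    pixel may match any template pixel within its 3x3 local window."""
    n = len(inp)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # best match for inp[i][j] within the 3x3 window of tpl
            best = min(abs(inp[i][j] - tpl[a][b])
                       for a in range(max(0, i - 1), min(n, i + 2))
                       for b in range(max(0, j - 1), min(n, j + 2)))
            total += best ** 2
    return total

a = [[1, 0], [0, 0]]
b = [[0, 1], [0, 0]]          # same ink, shifted by one pixel
print(idm_distance(a, b))     # 0.0: the shift is absorbed by the window
```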

This approach produces 720 features (5x12x12) per symbol. To reduce complexity, the images are instead indexed using their first N (=128) principal components; the distance is then the sum of squared differences between corresponding principal components of the input and template images.

Since an input image must be checked against all of the templates, this calculation is still time-consuming. For this reason, the training examples are hierarchically clustered using a Euclidean distance measure. The process starts with each example in its own cluster and progressively merges the two nearest clusters, continuing until there is only one cluster per symbol class. The result is a tree-like structure with the largest clusters at the top and progressively smaller sub-clusters below as one traverses down each branch. A comparison starts at the top and traverses down until a cluster is matched. Since this tree structure alone is still not sufficient for high performance, a coarse-to-fine search is applied: a coarse search returns the N best matches (N=10), and a fine search then finds the correct match among them.
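
The coarse-to-fine re-ranking idea can be sketched generically; the two distance functions below are toy placeholders, not the paper's actual coarse and fine measures.

```python
def coarse_to_fine(query, templates, coarse, fine, n_best=10):
    """Rank all templates with a cheap coarse distance, keep the n_best
    candidates, and re-rank only those with the expensive fine distance."""
    ranked = sorted(templates, key=lambda t: coarse(query, t))
    return min(ranked[:n_best], key=lambda t: fine(query, t))

# Toy measures: coarse compares sums, fine compares element by element.
coarse = lambda a, b: abs(sum(a) - sum(b))
fine = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))

q = [1, 2, 3]
templates = [[9, 9, 9], [1, 2, 4], [1, 2, 3], [5, 5, 5]]
print(coarse_to_fine(q, templates, coarse, fine, n_best=2))  # -> [1, 2, 3]
```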

Rotational invariance is accomplished by generating 32 rotated templates at evenly spaced angles from 0 to 360 degrees. Rotation matching is used only in the coarse search, and the rotation angle found there is reused in the remaining steps.

Finally, the recognition performance of this IDM system is compared with other methods. Among non-rotation-invariant techniques, the system ranks second with a recognition rate of 99.2%, behind SVM-RBF; the IDM is, however, less complex than SVM-RBF. Among rotation-invariant techniques, the system ranks first with an accuracy of 98.2%, ahead of the 96.7% accuracy of Zernike moments. Regarding runtime, the version of the system using the hierarchical tree and re-ranking algorithm ranks behind SVM-RBF, with an average time per symbol of 8.4 ms versus 2.4 ms. The average time for the full IDM (without the hierarchical tree and ranking algorithms) is 3952 ms, meaning the hierarchical tree and ranking make the search roughly 470 times faster.


The recognition rates of the system are impressive, though it currently works only on isolated symbols. The only differences from content-based image search algorithms are the stroke-angle and endpoint features of the strokes. The system is translation, rotation, and scale invariant, and performs well in both recognition rate and running time.

Ink Features for Diagram Recognition

Patel et al.

The paper presents a formal statistical analysis of the features used to distinguish text from shapes in sketch recognition problems. These are mostly the features used in the low-level stages of sketch recognition systems. The authors argue that the geometric features used in bottom-up and top-down recognition approaches are mostly selected empirically, and that no formal study has been done on them.

A large set of 46 features was chosen from related work in sketch recognition, from features the authors thought might be useful, and from features enabled by newly available hardware such as pressure-sensitive Tablet PC screens. These features are grouped into seven categories: size, time, intersections, curvature, pressure, operating-system recognition values, and inter-stroke gaps.

An example data set was collected from 26 people, each drawing 26 sketches; a total of 1519 strokes were extracted from it for testing the 46 features. The authors construct a binary classification tree using the R statistical package to determine the most useful features from their initial selection. The resulting tree uses only 8 distinct features: time till next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, and total angle.
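
The feature-selection idea can be illustrated in miniature with a one-level decision stump on a single hypothetical feature (bounding-box width); the paper itself builds a full classification tree in R, not this sketch.

```python
def best_stump(values, labels):
    """Find the threshold on one feature minimizing misclassifications,
    predicting 'text' at or below the threshold and 'shape' above it."""
    best = (len(labels) + 1, None)
    for t in sorted(set(values)):
        errors = sum((v <= t) != (c == "text") for v, c in zip(values, labels))
        best = min(best, (errors, t))
    return best   # (misclassification count, threshold)

widths = [5, 8, 10, 40, 55, 70]            # toy bounding-box widths
labels = ["text", "text", "text", "shape", "shape", "shape"]
print(best_stump(widths, labels))          # -> (0, 10)
```

A full tree repeats this kind of split recursively over all candidate features, which is how the 8 most useful features emerge.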

The constructed classification tree is then compared with a Microsoft implementation of a text/shape divider and another implementation named InkKit. In tests on the training data, the Microsoft divider gives the worst misclassification rate for shapes and a zero misclassification rate for text. The newly constructed divider outperforms the other two in shape classification (10.8% misclassification compared to Microsoft's 75.7% and InkKit's 67.4%) and outperforms InkKit in text classification (8.8% misclassification compared to InkKit's 10.3%, with Microsoft at 0%). In tests on a new diagram set, the new divider still outperforms the other two in shape classification (42.1% misclassification compared to Microsoft's 93.1% and InkKit's 80.8%) but gives the worst result in text classification (21.4% misclassification compared to Microsoft's 1.4% and InkKit's 17.2%). Text classification gives better results than shape classification for all three dividers.

The rest of the paper discusses some of the features selected for classification and possible optimizations involving the unused features from the initial set. The authors also note that while the features may be useful for distinguishing text from shapes, they may not be suitable for use in high-level sketch recognition algorithms.


As a comprehensive study of low-level stroke features in sketch recognition, the paper gives formal grounds for deciding which features to select and which to discard. It is always preferable to base such decisions on numbers and percentages rather than intuition. However, most of the features selected from the large set are not scale and/or rotation invariant, which raises the question of whether the final features selected for classifying text and shapes could be used in a broad range of applications.

Moreover, while the constructed divider gives better results in shape recognition than the Microsoft and InkKit classifiers, it does not substantially improve on their text recognition rates. It is also somewhat interesting that the Microsoft classifier gives a misclassification rate of 75.7% for shapes and 0% for text, which suggests that it is heavily biased towards classifying strokes as text.

CALI: An Online Scribble Recognizer for Calligraphic Interfaces

Manuel J. Fonseca, Cesar Pimentel, Joaquim A. Jorge, 2002

In this paper, Fonseca, Pimentel, and Jorge of the Technical University of Lisbon describe CALI, their online scribble recognizer for calligraphic interfaces. Scribbles, as they call them, are simply multi-stroke geometric shapes. There are two versions of CALI: a non-trainable gesture recognizer and a trainable one.

The recognition algorithm uses a combination of fuzzy logic, geometric features and a set of heuristics. Recognition rates of over 97% for the non-trainable version and 95% for the trainable version are reported.

The non-trainable recognizer rests on three main ideas: extracting and using geometric features, using a set of filters to identify or remove unwanted shapes, and using fuzzy logic to resolve uncertainties in shape sketches.

The geometric features are obtained by first computing the convex hull of the points. The convex hull is used to compute the largest-area inscribed triangle and quadrilateral and the smallest-area enclosing rectangle. The areas and perimeters of these polygons are used to compute the features. For example, the thinness ratio, defined as (perimeter of convex hull)^2 / (area of convex hull), is used to distinguish circles from other shapes, since for circles it is expected to give a value near 4*pi. Likewise, lines are identified using the aspect ratio, (height of the enclosing rectangle) / (width of the enclosing rectangle), which is expected to be near zero for lines and larger for other shapes. For each such feature-shape pair, a graph is obtained from training data; fuzzy logic is then used to derive fuzzy sets from this data, and a degree of membership is calculated for each shape class. The paper gives an example of building rules to classify shapes. Finally, possible ambiguities between shape classes are resolved using context information.
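
The thinness ratio can be sketched from scratch: compute the convex hull (Andrew's monotone chain algorithm, not necessarily CALI's own hull routine), then its perimeter and area. The point sets below are toy examples, not CALI's data.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain convex hull, counterclockwise order."""
    points = sorted(set(points))
    if len(points) <= 2:
        return points
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def thinness_ratio(points):
    """Perimeter^2 / area of the convex hull; ~4*pi for a circle."""
    h = convex_hull(points)
    per = sum(math.dist(h[i], h[(i + 1) % len(h)]) for i in range(len(h)))
    area = abs(sum(h[i][0]*h[(i+1) % len(h)][1] - h[(i+1) % len(h)][0]*h[i][1]
                   for i in range(len(h)))) / 2
    return per ** 2 / area

circle = [(math.cos(2*math.pi*t/50), math.sin(2*math.pi*t/50))
          for t in range(50)]
print(thinness_ratio(circle))   # close to 4*pi (~12.57)
```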

The trainable recognizer uses a Naive Bayes classifier, chosen over K-Nearest Neighbors and Inductive Decision Tree algorithms for its high classification accuracy and acceptable training and classification complexity. The Naive Bayes algorithm computes, for a given shape instance, the probability of each feature belonging to each class; these probabilities are then combined to decide the class of the instance. The algorithm is reported to reach an acceptable recognition rate with 50 examples per shape class and to stabilize at around 100 examples per class.
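
A minimal Gaussian Naive Bayes sketch conveys this idea; the paper does not specify Gaussian likelihoods, so that modeling choice and the toy feature values below are assumptions of this example.

```python
import math
from collections import defaultdict

def train(samples):
    """samples: list of (feature_vector, class_label). Returns per-class
    feature means, variances, and prior probabilities."""
    by_class = defaultdict(list)
    for x, c in samples:
        by_class[c].append(x)
    model = {}
    for c, xs in by_class.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        varis = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                 for col, m in zip(zip(*xs), means)]
        model[c] = (means, varis, n / len(samples))
    return model

def classify(model, x):
    """Pick the class maximizing prior times independent per-feature
    Gaussian likelihoods (computed in log space)."""
    def log_gauss(v, m, var):
        return -0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)
    return max(model, key=lambda c: math.log(model[c][2]) +
               sum(log_gauss(v, m, s)
                   for v, m, s in zip(x, model[c][0], model[c][1])))

# Toy features: (thinness ratio, aspect ratio) for "circle" vs "line".
data = [((12.6, 1.0), "circle"), ((12.4, 0.9), "circle"),
        ((60.0, 0.05), "line"), ((55.0, 0.02), "line")]
m = train(data)
print(classify(m, (12.9, 0.95)))   # -> circle
```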

The system thus offers two versions, with the non-trainable algorithm having a slightly higher recognition rate. However, the trainable version's flexibility in introducing new shape classes makes it a plausible choice.

The reported recognition rates of 97% and 95% are impressive. It should also be noted that the system recognizes shapes drawn with dotted/dashed and overtraced strokes along with normally drawn ones, and that the recognition is scale and rotation invariant.