Tuesday, February 24, 2009

Visual Recognition of Sketched Symbols

Tom Y Ouyang, Randall Davis

Summary:

The paper presents a symbol recognition system that works on isolated symbols and performs low-level classification (i.e., no context information is needed). It differs from prior work in how classification is done: many other systems rely on stroke information, whereas this system is based on the visual appearance of the symbols.

Symbol Normalization:

The symbols are first normalized by resampling the points of the strokes so that they are evenly spaced. The symbols are then translated so that their center of mass is at the origin, and scaled to unit standard deviation along both the horizontal and vertical axes.
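The normalization step described above can be sketched as follows. This is a hypothetical helper, not the paper's code; the function name, point count, and stroke representation are assumptions.

```python
import numpy as np

def normalize_symbol(strokes, n_points=50):
    """Resample each stroke to evenly spaced points along its arc length,
    then translate the symbol's center of mass to the origin and scale
    to unit standard deviation per axis (sketch of the paper's step)."""
    resampled = []
    for pts in strokes:  # each stroke: sequence of (x, y) points
        pts = np.asarray(pts, dtype=float)
        # cumulative arc length along the stroke
        seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
        s = np.concatenate([[0.0], np.cumsum(seg)])
        t = np.linspace(0.0, s[-1], n_points)
        resampled.append(np.column_stack([np.interp(t, s, pts[:, 0]),
                                          np.interp(t, s, pts[:, 1])]))
    all_pts = np.vstack(resampled)
    center = all_pts.mean(axis=0)   # center of mass -> origin
    std = all_pts.std(axis=0)       # unit std dev on each axis
    std[std == 0] = 1.0             # guard against degenerate symbols
    return [(pts - center) / std for pts in resampled]
```

After this step, symbols of different sizes and positions become directly comparable, which is what makes the later template matching translation and scale invariant.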

Feature extraction:

In feature extraction, the normalized symbols are converted into low-level feature images. Five features are calculated for each point in the input symbol, as follows:

Four orientation features are calculated for the reference angles 0, 45, 90 and 135 degrees. Each feature value is based on the difference between the stroke angle and the reference angle, varying linearly from 1.0 (when the two are equal) to 0.0 (when they differ by more than 22.5 degrees).
The fifth feature image records the endpoints of the stroke: the pixels corresponding to the start and end points are set to 1, while all others are set to zero.
In the end, five feature images are produced for each symbol. These are rendered on a 24x24 grid, then smoothed and down-sampled by a factor of 2, producing five 12x12 images.
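A rough sketch of this feature-image construction is below. The coordinate range, the box-filter smoothing, and taking the maximum when several segments land in one cell are all assumptions for illustration; the paper's exact rendering details differ.

```python
import numpy as np

REF_ANGLES = [0.0, 45.0, 90.0, 135.0]

def angle_feature(stroke_angle, ref):
    """Linear falloff from 1.0 (angles equal) to 0.0 at 22.5 degrees.
    Orientation is treated modulo 180 degrees."""
    d = abs(stroke_angle - ref) % 180.0
    d = min(d, 180.0 - d)
    return max(0.0, 1.0 - d / 22.5)

def feature_images(stroke, grid=24):
    """Render 4 orientation images + 1 endpoint image for one stroke on a
    24x24 grid, then smooth and down-sample by 2 to 12x12. Points are
    assumed normalized to roughly [-2, 2] (hypothetical range)."""
    imgs = np.zeros((5, grid, grid))
    pts = np.asarray(stroke, float)
    # map normalized coordinates into grid cells
    ij = np.clip(((pts + 2.0) / 4.0 * (grid - 1)).astype(int), 0, grid - 1)
    for k in range(len(pts) - 1):
        dx, dy = pts[k + 1] - pts[k]
        theta = np.degrees(np.arctan2(dy, dx)) % 180.0
        i, j = ij[k]
        for f, ref in enumerate(REF_ANGLES):
            imgs[f, j, i] = max(imgs[f, j, i], angle_feature(theta, ref))
    imgs[4, ij[0][1], ij[0][0]] = 1.0    # endpoint image: start point
    imgs[4, ij[-1][1], ij[-1][0]] = 1.0  # ...and end point
    # 2x2 box smoothing + down-sampling by 2 -> 12x12
    return imgs.reshape(5, grid // 2, 2, grid // 2, 2).mean(axis=(2, 4))
```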

Recognition:

For recognition, nearest-neighbor template matching is used in combination with an image deformation model (IDM). Each 3x3 section of the input image may shift within a 3x3 local window of the template's feature images; for each pixel, the minimum difference within the local window is chosen, and the squares of these minimum differences are summed over the whole image to give the distance metric.
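The deformation idea can be sketched with a simplified per-pixel version: each pixel of the input may match its best counterpart within a local window of the template. (The paper compares 3x3 patches rather than single pixels; this simplification is an assumption for brevity.)

```python
import numpy as np

def idm_distance(a, b, w=1):
    """Simplified image deformation model distance between two stacks of
    feature images (shape: n_features x H x W). Each pixel of `a` may
    shift within a (2w+1)x(2w+1) window of `b`; the minimum squared
    difference per pixel is summed over all features and positions."""
    n_feat, h, _ = a.shape
    total = 0.0
    for f in range(n_feat):
        for y in range(h):
            for x in range(h):
                y0, y1 = max(0, y - w), min(h, y + w + 1)
                x0, x1 = max(0, x - w), min(h, x + w + 1)
                window = b[f, y0:y1, x0:x1]
                total += np.min((window - a[f, y, x]) ** 2)
    return total
```

The local window makes the metric tolerant of small spatial distortions: a symbol drawn with slightly shifted strokes still matches its template cheaply.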

This approach produces 720 features (5x12x12). To reduce complexity, the images are instead indexed using the first N (=128) principal components, and the distance is computed as the sum of squared differences between corresponding principal components of the input and template images.
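A minimal sketch of this PCA indexing step, assuming an SVD-based projection (function names and shapes are hypothetical):

```python
import numpy as np

def pca_index(templates, n_components=128):
    """Project flattened 720-dim feature images (n x 5 x 12 x 12) onto the
    top-N principal components for fast coarse matching."""
    X = templates.reshape(len(templates), -1)  # (n, 720)
    mean = X.mean(axis=0)
    Xc = X - mean
    # rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]                  # (N, 720)
    return mean, basis, Xc @ basis.T           # projected templates (n, N)

def coarse_distance(query, mean, basis, proj_templates):
    """Sum of squared differences in the reduced N-dim space."""
    q = (query.reshape(-1) - mean) @ basis.T
    return np.sum((proj_templates - q) ** 2, axis=1)
```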

Since an input image must be checked against all of the templates, this calculation is still time consuming. For this reason, the training examples are hierarchically clustered using a Euclidean distance measure. The process starts with each example in its own cluster, then progressively merges the two nearest clusters, continuing until there is only one cluster per symbol class. The result is a tree-like structure with the largest clusters at the top and progressively smaller sub-clusters below as one traverses down each branch. When a comparison is to be done, the search starts from the top and traverses down until a cluster is matched. This tree-like structure alone is still not sufficient for high performance, so a coarse-to-fine ranking search is applied: a coarse search returns the N (=10) best matches, and a fine search then finds the correct match among them.

Rotational invariance of the system is accomplished by generating 32 rotated templates with evenly spaced angles from 0 to 360 degrees. Rotation matching is only used in the coarse search, and the rotation angle found is reused in the remaining steps.

Finally, the recognition performance of the IDM system is compared with other methods. Among non-rotation-invariant techniques, the system ranks second with a recognition rate of 99.2%, behind SVM-RBF; the IDM is, however, computationally simpler than SVM-RBF. Among rotation-invariant techniques, the system ranks best with 98.2% accuracy, ahead of Zernike Moments at 96.7%. Regarding runtime, the version using the hierarchical tree and re-ranking algorithm ranks behind SVM-RBF, with average times per symbol of 8.4 ms versus 2.4 ms. The average time for the full IDM (without the hierarchical tree and ranking algorithms) is 3952 ms, so the hierarchical tree plus ranking is roughly 470 times faster.

Discussion:

The recognition rates for the system are impressive. However, the system currently works only on isolated symbols. The main difference from content-based image search algorithms is the use of the stroke-angle and endpoint features of the strokes. The system is translation, rotation and scale invariant, and performs well both in recognition rate and in running time.

Ink features for Diagram Recognition

Patel et al.

http://srl.csdl.tamu.edu/courses/SR2008/papers/others/Plimmer.pdf

Summary:

The paper presents a formal statistical analysis of features used for distinguishing text from shapes in sketch recognition problems. These are mostly the features used in the low-level stages of sketch recognition systems. The authors argue that the geometrical features used in bottom-up and top-down recognition approaches have mostly been selected empirically, and that no formal study has been done on them.

A large set of 46 features was chosen from related work in sketch recognition, from features the authors thought might be useful, and from features enabled by newly available hardware such as pressure-sensitive Tablet PC screens. These features are grouped into seven categories: size, time, intersections, curvature, pressure, operating system recognition values and inter-stroke gaps.

An example data set was collected from 26 people, each drawing 26 sketches. A total of 1519 strokes were extracted from this data set for testing the 46 features. The authors construct a binary classification tree using the R statistical package to determine the most useful features from their initial selection. The resulting tree uses only 8 different features: time till next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, and total angle.
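To illustrate what such a divider's decision tree looks like in practice, here is a toy two-feature fragment. The thresholds and feature names are invented for illustration; they are not the values learned in the paper.

```python
def classify_stroke(f):
    """Toy fragment of a text/shape divider decision tree using two of
    the eight selected features. All thresholds are hypothetical."""
    # Text strokes tend to be small and followed quickly by another stroke.
    if f['time_till_next_stroke'] < 300:      # ms, invented threshold
        if f['bounding_box_width'] < 50:      # pixels, invented threshold
            return 'text'
        return 'shape'
    # Slow, large strokes are more likely diagram shapes.
    return 'shape' if f['bounding_box_width'] > 80 else 'text'
```

The real tree combines all eight features; the point is that each internal node tests a single scalar feature against a learned threshold.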

The constructed classification tree is then compared with a Microsoft implementation of a text/shape divider and another implementation named InkKit. In tests on the training data, the Microsoft divider gives the worst misclassification rate for shapes and zero misclassification for text. The newly constructed divider outperforms the other two dividers in shape classification (10.8% misclassification compared to Microsoft's 75.7% and InkKit's 67.4%) and outperforms InkKit in text classification (8.8% misclassification compared to InkKit's 10.3%, with Microsoft at 0%). In tests on a new diagram set, the new divider still outperforms the other two in shape classification (42.1% misclassification compared to Microsoft's 93.1% and InkKit's 80.8%) but gives the worst result in text classification (21.4% misclassification compared to Microsoft's 1.4% and InkKit's 17.2%). In all three dividers, text classification gives better results than shape classification.

The rest of the paper discusses some of the features selected for classification and possible optimizations involving the unused features from the initial set. The authors also indicate that while the features may be useful for distinguishing text and shape sketches, they may not be suitable for use in high level sketch recognition algorithms.

Discussion:

As a comprehensive study of low-level stroke features in sketch recognition, the paper gives formal grounds for deciding which features to select and which to reject. It is always preferable to make such decisions with numbers and percentages in hand rather than by intuition. However, most of the features selected from the large set for classification are not scale and/or rotation invariant. This raises the question of whether the final features selected for classifying text and shapes could be used in a broad range of applications.

Moreover, while the constructed divider gives better results in shape recognition than the Microsoft and InkKit classifiers, it does not substantially improve on their text recognition rates. It is also somewhat interesting that the Microsoft classifier gives a misclassification rate of 75.7% for shapes and 0% for text, which suggests that it is heavily biased toward classifying strokes as text.


CALI: An Online Scribble Recognizer for Calligraphic Interfaces
Manuel J. Fonseca, Cesar Pimentel, Joaquim A. Jorge, 2002


Summary:
In this paper, Fonseca, Pimentel and Jorge from the Technical University of Lisbon describe their online scribble recognizer for calligraphic interfaces, CALI. Scribbles, as the authors define them, are simply multi-stroke geometric shapes. There are two versions of CALI: a non-trainable gesture recognizer and a trainable one.

The recognition algorithm uses a combination of fuzzy logic, geometric features and a set of heuristics. Recognition rates of over 97% for the non-trainable version and 95% for the trainable version are reported.

The non-trainable recognizer uses three main ideas: extracting and using geometric features, applying a set of filters to identify or remove unwanted shapes, and using fuzzy logic to resolve uncertainties in shape sketches.

The geometric features are obtained by first computing the convex hull of the points. The convex hull is used to compute the largest-area triangle and quadrilateral and the smallest-area enclosing rectangle, and the areas and perimeters of these polygons are used to compute the features. For example, the thinness ratio is defined as (perimeter of convex hull)^2 / (area of convex hull) and is used to distinguish circles from other shapes, since for circles it is expected to give a value near 4π. Likewise, lines are identified using the aspect ratio (height of the enclosing rectangle) / (width of the enclosing rectangle), which is expected to be near zero for lines and larger for other shapes. For each such feature and shape pair, a graph is obtained from training data. Fuzzy logic is then used to derive fuzzy sets from this data, and a degree of membership is calculated for each shape class. An example of building the rules to classify shapes is given in the paper. Finally, possible ambiguities between shape classes are resolved using context information.
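The thinness-ratio feature can be sketched directly from these definitions. The convex hull here uses Andrew's monotone chain algorithm; a circle's hull has perimeter ≈ 2πr and area ≈ πr², so the ratio approaches (2πr)²/(πr²) = 4π.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain convex hull (counter-clockwise)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def thinness_ratio(points):
    """(Perimeter of convex hull)^2 / (area of convex hull);
    near 4*pi for circles, larger for thinner shapes."""
    hull = convex_hull(points)
    n = len(hull)
    per = sum(np.hypot(hull[(i+1) % n][0] - hull[i][0],
                       hull[(i+1) % n][1] - hull[i][1]) for i in range(n))
    area = 0.5 * abs(sum(hull[i][0]*hull[(i+1) % n][1] -
                         hull[(i+1) % n][0]*hull[i][1] for i in range(n)))
    return per**2 / area
```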

The trainable recognizer uses a Naive Bayes classifier. It was chosen over K-Nearest Neighbors and Inductive Decision Tree algorithms for its high classification efficiency and acceptable training and classification complexity. The Naive Bayes algorithm computes, for any given shape instance, the probability of each feature belonging to each class; these probabilities are then combined to determine the class of the instance. The algorithm is reported to exhibit an acceptable recognition rate with 50 examples per shape class and to stabilize at around 100 examples per class.
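A minimal Naive Bayes sketch over numeric geometric features is shown below. Modeling each feature as an independent Gaussian per class is an assumption for illustration; the paper does not specify CALI's exact feature distributions.

```python
import numpy as np

class GaussianNaiveBayes:
    """Per class, model each feature as an independent normal distribution
    and classify by combining per-feature log-likelihoods with the prior."""
    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.stats = {}
        for c in np.unique(y):
            Xc = X[y == c]
            # mean, variance (smoothed), and log prior for class c
            self.stats[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9,
                             np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, float))
        out = []
        for x in X:
            scores = {}
            for c, (mu, var, logprior) in self.stats.items():
                # sum of per-feature Gaussian log densities + log prior
                ll = -0.5 * np.sum(np.log(2*np.pi*var) + (x - mu)**2 / var)
                scores[c] = ll + logprior
            out.append(max(scores, key=scores.get))
        return out
```

The "naive" independence assumption is what keeps training cheap: fitting is just per-class means and variances, which matches the reported low training complexity.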

Discussion:
The system has two versions, non-trainable and trainable, with the non-trainable algorithm having a slightly higher recognition rate. However, the trainable version's flexibility in introducing new shape classes makes it a plausible choice.

The reported recognition rates of 95% and 97% are impressive. It should also be noted that the system recognizes shapes drawn with dotted, dashed and overtraced strokes as well as normally drawn strokes. The recognizer is also scale and rotation invariant.