Tuesday, February 24, 2009

Visual Recognition of Sketched Symbols

Tom Y Ouyang, Randall Davis

Summary:

The paper presents a sketched symbol recognition system. The system works on isolated symbols and performs low-level classification (i.e., no context information is needed). It differs from prior work in how the classification is done: many other systems rely on temporal stroke information, while this system is based on the visual appearance of the symbols.

Symbol Normalization:

The symbols are first normalized by resampling the points of the strokes so that they are evenly spaced. Each symbol is then translated so that its center of mass is at the origin, and scaled so that it has unit standard deviation along both the horizontal and vertical axes.
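A minimal sketch of this normalization step in Python (the point count `n=48` and the use of NumPy are illustrative choices, not taken from the paper):

```python
import numpy as np

def resample_stroke(points, n=48):
    """Resample a stroke so its points are evenly spaced along arc length.
    (n=48 is an illustrative choice, not from the paper.)"""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    targets = np.linspace(0.0, dist[-1], n)          # evenly spaced targets
    x = np.interp(targets, dist, points[:, 0])
    y = np.interp(targets, dist, points[:, 1])
    return np.stack([x, y], axis=1)

def normalize_symbol(points):
    """Translate center of mass to the origin; scale to unit std per axis."""
    points = points - points.mean(axis=0)
    std = points.std(axis=0)
    std[std == 0] = 1.0  # guard against degenerate (flat) strokes
    return points / std
```

Resampling uses linear interpolation along the cumulative arc length, which is the usual way to obtain evenly separated points.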

Feature extraction:

In feature extraction, the normalized symbols are converted into low-level feature images. Five features are calculated for each point of the input symbol, as follows:

Four orientation features are calculated for the reference angles 0, 45, 90, and 135 degrees. Each feature value is based on the difference between the stroke angle and the reference angle, varying linearly from 1.0 (when the two are equal) down to 0.0 (when they differ by 22.5 degrees or more).
The fifth feature image marks the endpoints of the strokes: the pixels corresponding to the start and end points of each stroke are set to 1, while all others are set to 0.
In the end, five feature images are produced for each symbol. These feature images are rendered on a 24x24 grid, then smoothed and down-sampled by a factor of 2, producing five 12x12 images.
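The per-point orientation features can be sketched as below (the rendering onto the 24x24 grid, smoothing, and down-sampling are omitted to keep the example short; the function name is my own):

```python
import numpy as np

REF_ANGLES = np.array([0.0, 45.0, 90.0, 135.0])

def orientation_features(points):
    """Per-point orientation features for the four reference angles.
    Each value falls linearly from 1.0 (stroke angle equals the reference)
    to 0.0 (difference of 22.5 degrees or more)."""
    points = np.asarray(points, dtype=float)
    d = np.diff(points, axis=0)
    # Stroke direction, folded into [0, 180) since orientation has no sign.
    ang = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180.0
    ang = np.append(ang, ang[-1])  # reuse the last segment's angle for the final point
    # Circular distance between the stroke angle and each reference angle.
    diff = np.abs(ang[:, None] - REF_ANGLES[None, :])
    diff = np.minimum(diff, 180.0 - diff)
    return np.clip(1.0 - diff / 22.5, 0.0, 1.0)
```

A horizontal segment thus activates only the 0-degree feature, and a diagonal one only the 45-degree feature, with in-between angles splitting their weight linearly.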

Recognition:

In recognition, nearest-neighbor template matching is used in combination with an image deformation model (IDM). 3x3 patches of the input feature images are allowed to shift within 3x3 local windows of the template's feature images; for each pixel, the minimum difference within its local window is chosen, and the squares of these minima are summed over the whole image to form the distance metric.
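A simplified sketch of this distance is below. Note the simplification: the paper shifts 3x3 context patches, while this version matches each pixel individually within its local window, which keeps the example short but is not identical to the paper's IDM:

```python
import numpy as np

def idm_distance(a, b, w=1):
    """Simplified image-deformation-model distance between two stacks of
    feature images (shape: channels x H x W). Each pixel of `a` is matched
    to the best pixel within a (2w+1)x(2w+1) window in `b`; the squared
    differences of those best matches are summed. (The paper matches 3x3
    context patches rather than single pixels.)"""
    c, h, wid = a.shape
    total = 0.0
    for y in range(h):
        for x in range(wid):
            best = np.inf
            for dy in range(-w, w + 1):
                for dx in range(-w, w + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < wid:
                        d = np.sum((a[:, y, x] - b[:, yy, xx]) ** 2)
                        best = min(best, d)
            total += best
    return total
```

The local window is what gives the model its tolerance to small shifts and distortions: a feature that has moved by one pixel still finds a zero-cost match.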

This approach produces 720 features per symbol (5x12x12). To reduce complexity, the images are instead indexed using the first N (=128) principal components. The distance is then the sum of squared differences between the corresponding principal components of the input and template images.
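This reduced comparison can be sketched as follows (a generic PCA projection via SVD; the function names are my own, and the paper does not specify how the components are computed):

```python
import numpy as np

def pca_components(X, n=128):
    """Mean and first n principal components of training vectors X (rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n]

def pca_distance(u, v, mean, comps):
    """Sum of squared differences between the projections of u and v."""
    pu = comps @ (u - mean)
    pv = comps @ (v - mean)
    return np.sum((pu - pv) ** 2)
```

With all components retained this reproduces the ordinary squared Euclidean distance exactly; truncating to the first N components approximates it at a fraction of the cost.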

Since an input image must be compared against all of the templates, this calculation is still time-consuming. For this reason, the training examples are hierarchically clustered using a Euclidean distance measure. The process starts with each example in its own cluster, then progressively merges the two nearest clusters until only one cluster per symbol class remains. The result is a tree-like structure with the largest clusters at the top and progressively smaller sub-clusters below as one traverses down each branch. A search starts at the top of the tree and traverses down until a cluster is matched. This tree structure alone is still not sufficient for high performance, so a coarse-to-fine ranking search is applied: a coarse search returns the N (=10) best matches, and a fine search then finds the correct match among them.
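The bottom-up clustering step can be sketched generically as below (the paper clusters within each symbol class and stops at one cluster per class; this sketch runs on a single set of vectors until one cluster remains, and merges by nearest centroids, which is one of several linkage choices):

```python
import numpy as np

def agglomerate(vectors):
    """Bottom-up clustering: start with singleton clusters and repeatedly
    merge the two whose centroids are closest in Euclidean distance,
    until one cluster remains. Returns the merge order."""
    clusters = [[i] for i in range(len(vectors))]
    centroids = [np.asarray(v, float) for v in vectors]
    merges = []
    while len(clusters) > 1:
        best = (np.inf, None, None)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(centroids[i] - centroids[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        ni, nj = len(clusters[i]), len(clusters[j])
        new_centroid = (centroids[i] * ni + centroids[j] * nj) / (ni + nj)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        centroids = [c for k, c in enumerate(centroids) if k not in (i, j)] + [new_centroid]
    return merges
```

Recording the merge order is what yields the tree: the last merges form the top of the hierarchy, and a search can descend from coarse clusters toward individual templates.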

Rotational invariance is achieved by generating 32 rotated templates at evenly spaced angles from 0 to 360 degrees. Rotation matching is only used in the coarse search; the rotation angle found there is reused in the remaining steps.
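Generating the rotated copies amounts to applying 32 evenly spaced rotation matrices to each normalized symbol's points, roughly as follows (feature images would then be re-rendered for each copy, which is omitted here):

```python
import numpy as np

def rotated_templates(points, n=32):
    """Generate n rotated copies of a normalized symbol, with rotation
    angles evenly spaced over [0, 360) degrees."""
    points = np.asarray(points, dtype=float)
    out = []
    for k in range(n):
        t = 2 * np.pi * k / n
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
        out.append(points @ rot.T)  # rotate every point about the origin
    return out
```

Because normalization has already centered the symbol at the origin, rotating about the origin is safe.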

Finally, the recognition performance of the IDM system is compared with other methods. Among non-rotation-invariant techniques, the system ranks second with a recognition rate of 99.2%, behind SVM-RBF; the IDM is, however, less complex than SVM-RBF. Among rotation-invariant techniques, the system ranks first with an accuracy of 98.2%, ahead of Zernike moments at 96.7%. In runtime performance, the version of the system using the hierarchical tree and re-ranking algorithm comes after SVM-RBF, averaging 8.4 ms per symbol versus 2.4 ms. The average time for the full IDM (without the hierarchical tree and ranking algorithms) is 3952 ms, meaning the hierarchical tree plus ranking is roughly 470 times faster.

Discussion:

The recognition rates for the system are impressive. However, the system currently works only on isolated symbols. The only features distinguishing it from content-based image retrieval methods are the stroke-angle and stroke-endpoint features. The system is translation, rotation, and scale invariant, and performs well in both recognition rate and running time.
