Ink features for Diagram Recognition
Patel et al
http://srl.csdl.tamu.edu/courses/SR2008/papers/others/Plimmer.pdf
Summary:
The paper presents a formal statistical analysis of the features used to distinguish text from shapes in sketch recognition. Most of these features come from the low-level stages of sketch recognition systems. The authors argue that the geometric features used in both bottom-up and top-down recognition approaches have mostly been selected empirically, and that no formal study of them has been done.
A set of 46 features is assembled from three sources: related work in sketch recognition, features the authors thought might be useful, and features made available by newer hardware such as pressure-sensitive Tablet PC screens. The features are grouped into seven categories: size, time, intersections, curvature, pressure, operating system recognition values, and inter-stroke gaps.
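To make the feature categories more concrete, here is a minimal Python sketch of how a few stroke features of this kind might be computed. It assumes a stroke is a list of (x, y, t) samples, and the function names and exact definitions are my own guesses for illustration, not the definitions used in the paper.

```python
import math

# Assumption: a stroke is a list of (x, y, t) samples, t in seconds.
# These definitions are illustrative, not the paper's exact formulas.

def bounding_box_width(stroke):
    """Horizontal extent of the stroke (a size feature)."""
    xs = [p[0] for p in stroke]
    return max(xs) - min(xs)

def path_length(stroke):
    """Sum of the distances between consecutive sample points."""
    return sum(math.dist(a[:2], b[:2]) for a, b in zip(stroke, stroke[1:]))

def total_angle(stroke):
    """Sum of absolute turning angles in radians (a curvature feature)."""
    total = 0.0
    for a, b, c in zip(stroke, stroke[1:], stroke[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        d = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(d)
    return total

def time_to_next_stroke(stroke, next_stroke):
    """Pause between the end of one stroke and the start of the next (a time feature)."""
    return next_stroke[0][2] - stroke[-1][2]

# Hypothetical L-shaped stroke followed by a second stroke.
l_stroke = [(0, 0, 0.0), (10, 0, 0.1), (10, 10, 0.2)]
next_stroke = [(30, 0, 0.7), (35, 0, 0.8)]
features = {
    "bounding_box_width": bounding_box_width(l_stroke),
    "path_length": path_length(l_stroke),
    "total_angle": total_angle(l_stroke),
    "time_to_next_stroke": time_to_next_stroke(l_stroke, next_stroke),
}
```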
A data set is collected from 26 people, each drawing 26 sketches. A total of 1519 strokes are extracted from it to test the 46 features. The authors build a binary classification tree in R to determine which features from the initial selection are most useful. The resulting tree uses only 8 distinct features: time till next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, and total angle.
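The way a classification tree ends up with only a handful of features can be illustrated with a small sketch of how a single split is chosen. This assumes Gini impurity as the splitting criterion, which is a common choice but is not something the summary above confirms for the paper. The data values below are made up purely for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 'text' / 'shape' labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_text = sum(1 for y in labels if y == "text") / n
    return 2 * p_text * (1 - p_text)

def best_split(rows, labels, feature_names):
    """Return (weighted impurity, feature name, threshold) of the best single split."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, thr)
    return best

# Made-up strokes: (time_to_next_stroke in seconds, bounding_box_width in pixels).
feature_names = ["time_to_next_stroke", "bounding_box_width"]
rows = [(0.2, 12), (0.3, 40), (0.25, 15), (1.5, 30), (2.0, 14), (1.2, 80)]
labels = ["text", "text", "text", "shape", "shape", "shape"]

score, name, threshold = best_split(rows, labels, feature_names)
```

A full tree repeats this search recursively on each side of the split. A feature that never wins a split never appears in the tree, which is how a 46-feature candidate set can shrink to 8 features in use.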
The resulting classification tree is then compared with Microsoft's text/shape divider and with another implementation, InkKit. On the training data, the Microsoft divider has the worst misclassification rate for shapes and a zero misclassification rate for text. The new divider outperforms the other two on shapes (10.8% misclassification, compared to Microsoft's 75.7% and InkKit's 67.4%) and outperforms InkKit on text (8.8% misclassification, compared to InkKit's 10.3% and Microsoft's 0%). On a new diagram set, the new divider still outperforms the other two on shapes (42.1% misclassification, compared to Microsoft's 93.1% and InkKit's 80.8%) but gives the worst result on text (21.4% misclassification, compared to Microsoft's 1.4% and InkKit's 17.2%). All three dividers classify text more accurately than shapes.
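The percentages above are per-class misclassification rates, so it helps to see how such numbers are computed. The sketch below is my own illustration with made-up labels, not the paper's evaluation code.

```python
from collections import Counter

def misclassification_rates(true_labels, predicted_labels):
    """Fraction of strokes of each true class that received the wrong label."""
    totals = Counter(true_labels)
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    return {label: errors[label] / totals[label] for label in totals}

# Made-up ground truth and a hypothetical divider that calls every stroke text.
truth = ["text"] * 4 + ["shape"] * 4
always_text = ["text"] * 8

rates = misclassification_rates(truth, always_text)
```

A divider that labels every stroke as text gets a 0% error on text and a 100% error on shapes, which is why the two rates need to be read together rather than separately.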
The rest of the paper discusses some of the features selected for classification and possible optimizations involving the unused features from the initial set. The authors also note that, while these features may be useful for separating text from shapes, they may not be suitable for high-level sketch recognition algorithms.
Discussion:
As a comprehensive study of low-level stroke features in sketch recognition, the paper gives formal grounds for choosing some features and rejecting others. It is always preferable to base such decisions on numbers and percentages rather than intuition. However, most of the features selected from the large set are not scale and/or rotation invariant. This raises the question of whether the final features selected for classifying text and shapes could be used across a broad range of applications.
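The invariance concern can be checked with a small experiment. The sketch below is my own illustration, with a made-up square stroke and my own feature definitions: it scales and rotates the stroke and compares bounding box width, which changes, with total turning angle, which does not.

```python
import math

def bbox_width(points):
    """Horizontal extent of a list of (x, y) points."""
    xs = [x for x, _ in points]
    return max(xs) - min(xs)

def total_angle(points):
    """Sum of absolute turning angles in radians along the path."""
    total = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        total += abs((h2 - h1 + math.pi) % (2 * math.pi) - math.pi)
    return total

def scale(points, k):
    """Scale all points by factor k about the origin."""
    return [(k * x, k * y) for x, y in points]

def rotate(points, angle):
    """Rotate all points by angle (radians) about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical stroke: a 10 x 10 square drawn in one pass.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
doubled = scale(square, 2)
tilted = rotate(square, math.pi / 4)

widths = [bbox_width(s) for s in (square, doubled, tilted)]
angles = [total_angle(s) for s in (square, doubled, tilted)]
```

Here the bounding box width doubles under scaling and grows by a factor of the square root of 2 under a 45 degree rotation, while the total turning angle stays the same in all three cases. A threshold learned on bounding box width would therefore depend on how large and at what orientation people happened to draw.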
Moreover, while the new divider gives better results on shapes than the Microsoft and InkKit classifiers, it does not substantially improve on their text classification. It is also worth noting that the Microsoft classifier has a misclassification rate of 75.7% for shapes and 0% for text, which suggests that it is heavily biased towards classifying strokes as text.