In this paper, we propose an improved text recognition method that exploits the local correlation of the character region. Fractal theory indicates that most images, including scene text images, exhibit self-similarity. Recent methods typically extract features of the word region with a convolutional neural network (CNN) that uses fixed kernels, so the self-similarity of the image is not fully used. We propose a Local Correlation (LC) layer that represents the self-similarity of a text image by considering the local correlation of the character region. This layer weights its input by computing the correlation. The mechanism not only significantly improves recognition results but is also easy to embed in other recognition architectures. After embedding this layer in a scene text recognition architecture, our experiments show that the proposed model learns better representations of scene images and achieves state-of-the-art results on several benchmark datasets, including IIIT-5K, SVT, CUTE80, SVT-Perspective, and ICDAR.
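To make the idea of weighting an input by its local correlation concrete, the following is a minimal NumPy sketch and not the LC layer as defined in the paper. The abstract does not specify the formulation, so everything here is an assumption chosen for illustration: the function name `local_correlation_weight`, the use of cosine similarity between channel vectors, the k×k neighbourhood, and the sigmoid gate.

```python
import numpy as np


def local_correlation_weight(x, k=3, eps=1e-8):
    """Reweight a feature map by the mean cosine similarity to its neighbours.

    Hypothetical illustration only: the cosine measure, the k x k window,
    and the sigmoid gate are assumptions, not the paper's definition.

    x : array of shape (C, H, W)
    returns : array of shape (C, H, W)
    """
    x = np.asarray(x, dtype=np.float64)
    _, h, w = x.shape
    r = k // 2

    # Normalise the channel vector at every spatial position to unit length.
    norm = np.sqrt((x * x).sum(axis=0, keepdims=True)) + eps
    u = x / norm

    # Zero-pad spatially so every position has a full k x k window.
    padded = np.pad(u, ((0, 0), (r, r), (r, r)))
    # Mask marking which padded positions hold real (non-padding) features.
    valid = np.pad(np.ones((h, w)), r)

    corr = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue  # skip the centre position itself
            ys, xs = r + dy, r + dx
            shifted = padded[:, ys:ys + h, xs:xs + w]
            # Cosine similarity between each position and this neighbour.
            corr += (u * shifted).sum(axis=0)
            count += valid[ys:ys + h, xs:xs + w]

    # Average over the neighbours that actually exist (handles borders).
    mean_corr = corr / np.maximum(count, 1.0)

    # Squash to (0, 1) and use the result as a per-position weight.
    weight = 1.0 / (1.0 + np.exp(-mean_corr))
    return x * weight[None, :, :]
```

In this sketch, positions whose features resemble their neighbours receive a weight closer to 1, while isolated responses are attenuated. A learned layer would presumably replace the fixed sigmoid gate with trainable parameters, which is beyond what the abstract states.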