A Learning-based Text Synthesis Engine for Scene Text Detection

Xiao Yang (Pennsylvania State University), Dafang He (Pennsylvania State University), Dan Kifer (Pennsylvania State University), Lee Giles (Pennsylvania State University)

Abstract
Scene text detection and recognition methods have improved greatly in recent years, with synthetic training data playing an important role. For text recognition, state-of-the-art performance can be achieved by training only on synthetic data. For text detection, however, a model trained solely on large-scale synthetic data performs significantly worse than one trained on a small number of real-world samples. This shows the limitations of using only large-scale synthetic data for scene text detection. In this work, we propose the first learning-based, data-driven text synthesis engine for the scene text detection task. Our text synthesis engine is decomposed into two modules: 1) a \textit{location} module that learns the distribution of text locations on the image plane, and 2) an \textit{appearance} module that translates text-inserted images into realistic-looking ones that are essentially indistinguishable from real-world scene text images. On the ICDAR 2015 Incidental Scene Text dataset~\cite{karatzas2015icdar}, our synthetic data outperforms that produced by previous text synthesis methods.

DOI
10.5244/C.33.97
https://dx.doi.org/10.5244/C.33.97

Files
Paper (PDF)

BibTeX
@inproceedings{BMVC2019,
title={A Learning-based Text Synthesis Engine for Scene Text Detection},
author={Xiao Yang and Dafang He and Dan Kifer and Lee Giles},
year={2019},
month={September},
pages={97.1--97.12},
articleno={97},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.97},
url={https://dx.doi.org/10.5244/C.33.97}
}