Document Binarization using Recurrent Attention Generative Model

Shuchun Liu (ele AI Lab), Feiyun Zhang (ele AI Lab), Pan He (University of Florida), Mingxi Chen (Tongji University), Yufei Xie (East China Normal University), Jie Shao (Fudan University)

Image binarization is a fundamental pre-processing step in the document image analysis and recognition pipeline. It is well known that contextual and semantic information is beneficial for separating foreground text from a complex background. We develop a simple, general deep learning approach by introducing a recurrent attention generative model (DB-RAM) with adversarial training. DB-RAM makes three contributions. First, to suppress interference from complex backgrounds, non-local attention blocks are incorporated to capture spatial long-range dependencies. Second, we explore the use of Spatial Recurrent Neural Networks (SRNNs) to pass spatially varying contextual information across an image, leveraging prior knowledge of text orientation and semantics. Third, to validate the effectiveness of the proposed method, we synthetically generate two comprehensive subtitle datasets that cover various real-world conditions. Evaluated on standard benchmarks, our method significantly outperforms state-of-the-art binarization methods both quantitatively and qualitatively. Experimental results show that it also improves the recognition rate. Moreover, the method performs well on the task of image unshadowing, which further verifies its generality.
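To illustrate the non-local attention idea the abstract refers to, here is a minimal NumPy sketch of a non-local (self-attention) block over spatial positions. This is an illustration of the general technique (Wang et al.'s non-local operation), not the authors' implementation; the function name `non_local_block` and the weight matrices passed in are hypothetical stand-ins for the 1x1-convolution projections a trained model would learn.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Non-local (self-attention) block over spatial positions.

    x:       feature map of shape (H, W, C)
    w_theta, w_phi, w_g: (C, C') projection matrices (queries/keys/values),
             the matrix equivalents of 1x1 convolutions
    w_out:   (C', C) projection back to the input channel count

    All weights are hypothetical parameters for illustration only.
    """
    H, W, C = x.shape
    flat = x.reshape(H * W, C)          # N spatial positions, C channels
    theta = flat @ w_theta              # queries
    phi = flat @ w_phi                  # keys
    g = flat @ w_g                      # values
    # N x N affinity: every position attends to every other position,
    # which is what captures spatial long-range dependencies.
    attn = softmax(theta @ phi.T / np.sqrt(theta.shape[1]))
    y = (attn @ g) @ w_out              # aggregate values, project back to C
    return x + y.reshape(H, W, C)       # residual connection
```

Because the affinity matrix relates all position pairs, a background clutter pixel can be re-weighted using evidence from distant text regions, which is the motivation for using such blocks in binarization.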



@inproceedings{liu_bmvc_binarization,
  title={Document Binarization using Recurrent Attention Generative Model},
  author={Shuchun Liu and Feiyun Zhang and Pan He and Mingxi Chen and Yufei Xie and Jie Shao},
  booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
  publisher={BMVA Press},
  editor={Kirill Sidorov and Yulia Hicks}
}