In recent years, Siamese networks have proven effective for visual tracking, achieving high accuracy at real-time speed. However, because these networks use only the output of the last convolutional layer, they ignore low-level feature maps that provide spatial details important for visual tracking. In this paper, we propose bilinear Siamese networks for visual object tracking that exploit both high- and low-level feature maps. To effectively fuse feature maps extracted from multiple layers, we adopt factorized bilinear pooling in our network. We also introduce a novel background suppression module that reduces background interference: it collects negative feature maps describing the background in the first frame and suppresses background information during tracking, making the tracker more robust to distractors. Experimental results on the OTB-50 and OTB-100 benchmarks demonstrate that the proposed tracker performs comparably to state-of-the-art trackers while running in real time.
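The fusion step mentioned above can be sketched as low-rank (factorized) bilinear pooling, which approximates a full bilinear interaction z = P^T(U^T x ∘ V^T y) between two feature vectors via learned projections U, V, P and a Hadamard product. The dimensions and random matrices below are illustrative toy values, not the paper's actual configuration:

```python
import numpy as np

def factorized_bilinear_pool(x, y, U, V, P):
    """Fuse features x and y by low-rank bilinear pooling:
    project each input into a shared factor space, combine them with
    an elementwise (Hadamard) product, then project to the output dim."""
    h = (U.T @ x) * (V.T @ y)  # interaction in the k-dimensional factor space
    return P.T @ h             # map back to the fused output dimension

rng = np.random.default_rng(0)
d_low, d_high, k, d_out = 8, 8, 16, 4   # toy dimensions (illustrative only)
U = rng.standard_normal((d_low, k))     # projection for the low-level features
V = rng.standard_normal((d_high, k))    # projection for the high-level features
P = rng.standard_normal((k, d_out))     # output projection

x = rng.standard_normal(d_low)          # e.g. a low-level feature vector
y = rng.standard_normal(d_high)         # e.g. a high-level feature vector
z = factorized_bilinear_pool(x, y, U, V, P)
print(z.shape)  # → (4,)
```

The low-rank factorization keeps the parameter count linear in the feature dimensions, whereas a full bilinear pooling would require a d_low × d_high × d_out tensor.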