Background subtraction is an active research topic due to its great utility in many video analysis applications. In this work, a new approach to background subtraction employing an end-to-end deep learning architecture is proposed. The proposed architecture consists of two nested networks that are trained jointly. The first extracts the background model features of the scene from a small group of frames. The second performs the subtraction operation given those features and a target frame. In contrast to most recent deep learning proposals, our trained model can be applied to any scene without retraining. The method has been trained and evaluated on the public CDnet2014 dataset following a scene-wise cross-validation protocol. The obtained results show competitive background subtraction performance, demonstrating the method's ability to generalize to unseen scenes.