Guided Zoom: Questioning Network Evidence for Fine-grained Classification

Sarah Bargal (Boston University), Andrea Zunino (Istituto Italiano di Tecnologia), Vitali Petsiuk (Boston University), Jianming Zhang (Adobe Research), Kate Saenko (Boston University), Vittorio Murino (Istituto Italiano di Tecnologia), Stan Sclaroff (Boston University)

Abstract
We propose Guided Zoom, an approach that utilizes spatial grounding of a model’s decision to make more informed predictions. It does so by making sure the model has “the right reasons” for a prediction, defined as reasons that are coherent with those used to make similar correct decisions at training time. The reason/evidence upon which a deep convolutional neural network makes a prediction is defined to be the spatial grounding, in the pixel space, for a specific class conditional probability in the model output. Guided Zoom examines how reasonable such evidence is for each of the top-k predicted classes, rather than solely trusting the top-1 prediction. We show that Guided Zoom improves the classification accuracy of a deep convolutional neural network model and obtains state-of-the-art results on three fine-grained classification benchmark datasets.
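
The idea of questioning the top-k predictions rather than trusting the top-1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `top_k` and `rerank_with_evidence`, the `evidence_score` callable (a stand-in for whatever measures how coherent the grounded evidence for a class is with evidence seen at training time), and the weighted-sum combination rule with its `weight` parameter.

```python
from typing import Callable, List, Sequence


def top_k(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest class probabilities, highest first."""
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return order[:k]


def rerank_with_evidence(
    probs: Sequence[float],
    evidence_score: Callable[[int], float],  # hypothetical: evidence plausibility in [0, 1]
    k: int = 3,
    weight: float = 0.5,  # hypothetical trade-off between the two scores
) -> int:
    """Pick a class among the top-k candidates using both scores.

    The weighted sum below is an assumed combination rule, chosen only
    to make the example concrete.
    """
    candidates = top_k(probs, k)
    combined = {
        c: (1.0 - weight) * probs[c] + weight * evidence_score(c)
        for c in candidates
    }
    # Ties resolve in favour of the class with the higher original probability,
    # because candidates are ordered by probability and max keeps the first.
    return max(candidates, key=lambda c: combined[c])


# Toy example: class 0 is the top-1 prediction, but its evidence looks
# implausible, while the evidence for class 1 looks reasonable.
probs = [0.5, 0.4, 0.1]
toy_evidence = {0: 0.1, 1: 0.9, 2: 0.5}
refined = rerank_with_evidence(probs, toy_evidence.__getitem__, k=3)
```

In this toy case the refined prediction moves from class 0 to class 1. With `k=1` the function reduces to the ordinary top-1 decision, since only one candidate is examined.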

DOI
10.5244/C.33.7
https://dx.doi.org/10.5244/C.33.7

BibTeX
@inproceedings{BMVC2019,
title={Guided Zoom: Questioning Network Evidence for Fine-grained Classification},
author={Sarah Bargal and Andrea Zunino and Vitali Petsiuk and Jianming Zhang and Kate Saenko and Vittorio Murino and Stan Sclaroff},
year={2019},
month={September},
pages={7.1--7.13},
articleno={7},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.7},
url={https://dx.doi.org/10.5244/C.33.7}
}