Recently face recognition has made significantly progress due to the advancement of large scale Deep Convolutional Neural Network (DeepCNNs). Despite the great success, the known deficiencies of DeepCNNs have not been addressed, such as the need for too much labeled training data, energy hungry, lack of theoretical interpretability, lack of robustness to image transformations and degradations, and vulnerable to attacks, which limits DeepCNNs to be used in many real world applications. Therefore, these factors make previous predominating Local Binary Patterns (LBP) based face recognition methods still irreplaceable. In this paper we propose a novel approach called BIRD (learning Binary and Illumination Robust Descriptor) for face representation, which nicely balances the three criteria: distinctiveness, robustness, and computationally inexpensive cost. We propose to learn discriminative and compact binary codes directly from six types of Pixel Difference Vectors (PDVs). For each type of binary codes, we cluster and pool these compact binary codes to obtain a histogram representation of each face image. Six global histograms derived from six types of learned compact binary codes are fused for the final face recognition. Experimental results on the CAS_PERL_R1 and LFW databases indicate the performance of our BIRD surpasses all previous binary based face recognition methods on the two evaluated datasets. More impressively, the proposed BIRD is shown to be highly robust to illumination changes, and produces 89.5% on the CAS_PEAL_R1 illumination subset, which, we believe, is so far the best reported results on this dataset. Our code is made available.