Zero-shot learning (ZSL) aims to recognize unseen categories by associating image features with semantic embeddings of class labels; its performance can be improved progressively by learning better features and a visual-semantic (V-S) mapping that generalizes more readily to unseen classes. Current methods typically learn the feature extractor and the V-S mapping independently. In this work, we propose a simple but effective joint learning framework based on a fused autoencoder (AE) paradigm, which simultaneously learns features tailored to the ZSL task and a V-S mapping that is inseparable from feature learning. In particular, the AE encoder not only transfers semantic knowledge into the feature space but also achieves semantics-guided attentive feature learning. Meanwhile, the AE decoder serves as the V-S mapping, which further improves generalization to unseen classes. Extensive experiments show that the proposed approach achieves promising results.
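To make the joint AE idea concrete, the following is a minimal NumPy sketch of one plausible instantiation (not the paper's actual model): a linear encoder projects class semantic embeddings into the visual feature space, where they are aligned with image features, while a linear decoder reconstructs the semantics and thereby doubles as the V-S mapping used at test time. All dimensions, the backbone features, and the loss weighting `lam` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 85-d attribute vectors, 64-d visual features.
d_sem, d_feat, n = 85, 64, 32

S = rng.normal(size=(n, d_sem))    # per-image class semantic embeddings (e.g. attributes)
X = rng.normal(size=(n, d_feat))   # image features from some backbone (assumed given)

W_enc = rng.normal(size=(d_sem, d_feat)) * 0.01  # encoder: semantic space -> feature space
W_dec = rng.normal(size=(d_feat, d_sem)) * 0.01  # decoder: feature space -> semantic space (V-S mapping)

lr, lam = 1e-3, 1.0  # illustrative step size and loss weight

def objective():
    """Joint loss: align encoded semantics with image features + reconstruct semantics."""
    F = S @ W_enc
    return 0.5 * np.mean((F - X) ** 2) + 0.5 * lam * np.mean((F @ W_dec - S) ** 2)

loss_before = objective()
for _ in range(200):
    F = S @ W_enc          # semantics projected into the feature space
    recon = F @ W_dec - S  # semantic reconstruction error (decoder path)
    align = F - X          # feature alignment error (encoder path)
    # Gradients of the joint objective w.r.t. both mappings (derived by hand).
    W_enc -= lr * (S.T @ (align + lam * (recon @ W_dec.T)) / n)
    W_dec -= lr * (lam * F.T @ recon / n)
loss_after = objective()

# At inference, the decoder maps an image feature into semantic space, where it
# can be matched against unseen-class embeddings by nearest neighbour.
pred_sem = X @ W_dec
```

Because both loss terms share the encoder, feature learning and the V-S mapping are optimized jointly rather than in two separate stages, which is the point the abstract makes.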