Steganography is the practice of hiding a secret message in a cover message such that the cover remains perceptually unchanged after hiding and only the intended recipients can extract the secret from it. Traditional image steganography techniques hide the secret image in high-frequency regions of the cover image. These techniques typically yield low embedding ratios and are easy to detect. In this paper, we propose VStegNET, a video steganography network that extracts spatio-temporal features using a 3D-CNN with a micro-bottleneck (Hourglass) design, the first of its kind in the video steganography literature. The proposed network hides M x N (RGB) secret video frames in cover video frames of the same size. We train our model on the UCF101 action recognition video dataset, evaluate its performance using several quantitative metrics (APD, PSNR, and SSIM), and compare it with the previous state of the art. Furthermore, we present a detailed analysis supporting the proposal's superiority over image steganography models. Finally, we use several standard steganalysis tools, such as StegExpose and SRNet, to validate the steganographic capabilities of VStegNET.
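
To make two of the evaluation metrics concrete, the following is a minimal sketch of how APD and PSNR could be computed between a cover frame and the corresponding container frame. It assumes that APD denotes the average pixel discrepancy, taken here as the mean absolute per-pixel difference, and that frames hold 8-bit intensities flattened into plain sequences. The function names `apd` and `psnr` and the `peak` parameter are illustrative choices and are not taken from the VStegNET implementation.

```python
import math
from typing import Sequence


def apd(cover: Sequence[float], container: Sequence[float]) -> float:
    """Mean absolute per-pixel difference (assumed definition of APD)."""
    if len(cover) != len(container) or len(cover) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    total = sum(abs(a - b) for a, b in zip(cover, container))
    return total / len(cover)


def psnr(cover: Sequence[float], container: Sequence[float],
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value."""
    if len(cover) != len(container) or len(cover) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(cover, container)) / len(cover)
    if mse == 0:
        return math.inf  # identical frames: no distortion at all
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: a four-pixel "frame" before and after hiding.
cover_frame = [0, 10, 20, 30]
container_frame = [1, 10, 18, 30]
print(apd(cover_frame, container_frame))   # mean absolute difference
print(psnr(cover_frame, container_frame))  # distortion in dB
```

Lower APD and higher PSNR both indicate that the container frame stays closer to the cover frame. SSIM is omitted from this sketch because it requires windowed local statistics rather than a single pass over pixel differences.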