Prof. Adrian Hilton

University of Surrey

Title: 4D Vision in the Wild

Abstract: Over the past decade, 3D Computer Vision has advanced from reconstruction of static scenes under controlled conditions towards full 4D spatio-temporal reconstruction and structured modelling of complex dynamic scenes. This talk will review recent advances in this field towards general 4D reconstruction and understanding of unconstrained scenes, highlighting real-world challenges. High-level semantic understanding and reconstruction of dynamic scenes leveraging deep learning will be presented. 4D dynamic understanding of scenes and people is an enabling technology for applications ranging from healthcare and security through to immersive entertainment and autonomous robotic systems that can work safely alongside people at home or work. Examples of collaborative research to enable immersive audio-visual entertainment and monitoring of people for healthcare at home will be presented.

Bio: Adrian Hilton, BSc(hons), DPhil, CEng, FIET, is Professor of Computer Vision and Director of the Centre for Vision, Speech and Signal Processing at the University of Surrey, UK. The focus of his research is Perceptual AI enabling machines to understand and interact with the world through seeing and hearing. This combines the fields of computer vision and machine learning to develop new methods for reconstruction, modelling and understanding natural scenes from video and audio.

He is an internationally recognised expert in 3D and 4D computer vision. His research has contributed to advancing machine perception from controlled static scenes to real-world dynamic scenes and people. This is a key technology for future intelligent systems allowing human-machine interaction in robotics, healthcare, assisted living, entertainment and immersive experiences.

Adrian has successfully commercialised technologies for 3D and 4D shape capture exploited in entertainment, manufacture & health, receiving two EU IST Innovation Prizes, a Manufacturing Industry Achievement Award, a Royal Society Industry Fellowship with Framestore on Digital Doubles for Film and a Royal Society Wolfson Research Merit Award in 4D Vision. He is currently Principal Investigator on the EPSRC Programme Grant “S3A Future Spatial Audio” bringing together expertise in audio, vision and human perception to achieve immersive listener experiences at home or on the move.

Prof. Cordelia Schmid

INRIA

Title: Automatic Understanding of the Visual World

Abstract: One of the central problems of artificial intelligence is machine perception, i.e., the ability to understand the visual world based on input from sensors such as cameras. In this talk, I will present recent progress of my team in this direction. Data plays a key role, and I will start by presenting results on how to generate additional training data using weak annotations, motion information, and synthetic data. Next, I will discuss our recent results for action recognition, where human tubes and tubelets have proven successful. Our tubelets move away from state-of-the-art frame-based approaches and improve classification and localization by relying on joint information from several frames and the interaction with objects. Finally, I will present some recent results on robot manipulation.

Bio: Cordelia Schmid holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis received the best thesis award from INPG in 1996. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996–1997. Since 1997 she has held a permanent research position at Inria Grenoble Rhône-Alpes, where she is a research director and directs an Inria team. Dr. Schmid has been an Associate Editor for IEEE PAMI (2001–2005) and for IJCV (2004–2012), editor-in-chief for IJCV (since 2013), a program chair of IEEE CVPR 2005 and ECCV 2012, as well as a general chair of IEEE CVPR 2015 and ECCV 2020. In 2006, 2014 and 2016, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. She is a fellow of IEEE. She was awarded an ERC advanced grant in 2013, the Humboldt research award in 2015 and the Inria & French Academy of Science Grand Prix in 2016. She was elected to the German National Academy of Sciences, Leopoldina, in 2017. In 2018 she received the Koenderink prize for fundamental contributions in computer vision that have withstood the test of time. Since 2018 she has held a joint appointment with Google Research.

Prof. Antonio Torralba

MIT

Title: Dissecting neural nets

Abstract: With the success of deep neural networks and access to image databases with millions of labeled examples, the state of the art in computer vision is advancing rapidly. Even when no examples are available, Generative Adversarial Networks (GANs) have demonstrated a remarkable ability to learn from images and are able to create nearly photorealistic images. The performance achieved by convNets and GANs is remarkable and constitutes the state of the art on many tasks. But why do convNets work so well? What is the nature of the internal representation learned by a convNet in a classification task? How does a GAN represent our visual world internally? In this talk I will show that the internal representation in both convNets and GANs can be interpretable in some important cases. I will then show several applications for object recognition, computer graphics, and unsupervised learning from images and audio.

Bio: TBA