A true mentor who cares about success.
Stephen Gould is a Professor of Computer Science in the School of Computing at the Australian National University (ANU) and an Australian Research Council (ARC) Future Fellow. He also serves as Intelligent Systems Lead. Gould holds a BSc in mathematics and computer science (1994) and a BE in electrical engineering (1996) from the University of Sydney, and an MS in electrical engineering (1998) and a PhD (2010) from Stanford University, the latter on probabilistic models for region-based scene understanding. After completing his MS, he worked in industry and co-founded Sensory Networks, which was acquired by Intel in 2013. In November 2010, he joined ANU as a faculty member. He has held several prestigious appointments, including ARC Postdoctoral Fellow, Microsoft Faculty Fellow, Contributed Researcher at Data61, Principal Research Scientist at Amazon Inc., Director of the ARC Centre of Excellence in Robotic Vision, and Amazon Scholar.
Gould's research specializations include computer and robotic vision, machine learning, deep learning, structured prediction, and optimization. His primary focus is applying machine learning techniques, such as conditional Markov random fields, deep learning, and deep declarative networks, to geometric, semantic, and dynamic scene understanding in images and videos. He actively collaborates with industry to translate research into practical applications. Gould has authored over 100 peer-reviewed publications and regularly serves as a program committee member or reviewer for top venues including CVPR, ECCV, ICCV, ICML, NeurIPS, IEEE PAMI, IEEE TIP, IJCV, and JMLR. He has delivered numerous invited talks and tutorials, including “Ikea Assembly and Other Experiences in Dataset Labelling” at DanaXa Industry Forum (2021), “Deep Declarative Networks” at IVCNZ (2019), a tutorial on structured prediction for computer vision at MLSS Sydney (2015), and a tutorial on inference in discrete graphical models at CVPR (2014). Key publications include “The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models” (2025), “3DInAction: Understanding Human Actions in 3D Point Clouds” (2024), “Interpreting Visually-Grounded Navigation Instructions in Real Environments” (2018), and “SPICE: Semantic Propositional Image Caption Evaluation” (2016).