I lead the Multimodal AI team at Qualcomm AI Research, advancing the next generation of on-device AI systems. Previously, I was a Professor and ARC DECRA Fellow at Monash University. My research spans computer vision, generative AI, and multimodal learning, with contributions that bridge fundamental research and real-world applications.
PhD in Computer Science, 2015
The University of Western Australia
Master's in Space Science, 2011
Luleå Tekniska Universitet
BSc in Engineering, 2009
National University of Sciences & Technology
We propose Restormer, an efficient Transformer architecture for high-resolution image restoration that captures long-range pixel interactions while remaining computationally tractable. Our model achieves state-of-the-art results across multiple restoration tasks including deraining, motion deblurring, defocus deblurring, and image denoising.
We address domain-generalized semantic segmentation through Semantic-Aware Normalization (SAN) and Semantic-Aware Whitening (SAW) modules. Our framework promotes both intra-category compactness and inter-category separability, achieving significant improvements over state-of-the-art methods on widely used datasets including GTAV, SYNTHIA, and Cityscapes.
We systematically study the robustness properties of Vision Transformers, revealing their remarkable resilience to occlusions, perturbations, and domain shifts. ViTs demonstrate significantly less texture bias than CNNs, achieve human-level shape recognition capabilities, and enable accurate semantic segmentation without pixel-level supervision through flexible self-attention mechanisms.
We present a comprehensive survey of transformer models in computer vision, covering fundamental concepts of self-attention and self-supervision. We review extensive applications across recognition, generative modeling, multimodal tasks, video processing, low-level vision, and 3D analysis, providing insights into architectural designs and future research directions.
We present MIRNet, a novel architecture that maintains spatially-precise high-resolution representations while receiving strong contextual information from low-resolution streams. Our multi-scale residual blocks with parallel convolution streams, information exchange, and attention mechanisms achieve state-of-the-art results on image denoising, super-resolution, and enhancement tasks.
We propose a self-supervised adversarial training mechanism in the input space that provides generalizable defense against adversarial attacks. Our plug-and-play approach significantly improves robustness across classification, segmentation, and detection tasks, reducing attack success rates from 82.6% to 31.9%.
We present CycleISP, a framework that models camera imaging pipelines in forward and reverse directions to generate realistic synthetic training data for image denoising. Our approach achieves state-of-the-art performance on real camera benchmarks with 5× fewer parameters than previous methods, and generalizes beyond denoising to tasks like color matching.
We propose RPS-Net, a random path selection algorithm for continual learning that progressively chooses optimal paths for new tasks while encouraging parameter sharing. Integrated with knowledge distillation and a dynamic plasticity controller, our approach surpasses state-of-the-art incremental learning performance while running in constant time.
We propose using deep image restoration networks as a defense against adversarial attacks by bringing off-manifold adversarial samples back onto the natural image manifold. Our approach simultaneously provides robustness, enhances image quality, and maintains performance on clean images without requiring model training, parameter optimization, or adversarial image detection.
We propose a cost-sensitive deep neural network that automatically learns robust feature representations for both majority and minority classes in imbalanced datasets. Our approach jointly optimizes class-dependent costs and network parameters, significantly outperforming baseline algorithms and data sampling techniques without altering the original data distribution.