Understanding how humans move their eyes to gather visual information is a central question in neuroscience, cognitive science, and vision research, with many applications in computer vision, robotics, design, and beyond.
While recent deep learning models achieve state-of-the-art performance in predicting human scanpaths, the sequences of fixations with which humans explore an image, their underlying decision processes remain opaque. At the opposite end of the modeling spectrum, cognitively inspired mechanistic models aim to explain scanpath behavior through interpretable cognitive mechanisms but lag far behind in predictive accuracy. In this work, we bridge this gap by using a high-performing deep model, DeepGaze III, to discover and test mechanisms that improve a leading mechanistic model, SceneWalk. By identifying individual fixations where DeepGaze III succeeds and SceneWalk fails, we isolate behaviorally meaningful discrepancies and use them to motivate targeted extensions of the mechanistic framework. These include time-dependent temperature scaling, saccadic momentum, and an adaptive cardinal attention bias: simple, interpretable additions that substantially boost predictive performance. With these extensions, the gap in predictive performance between SceneWalk and the deep learning model is halved. Our findings show how performance-optimized neural networks can serve as tools for cognitive model discovery, offering a new path toward interpretable and high-performing models of visual behavior.
Location: C10 | 9.01
Time: 16:15