Correspondence to Dr Victoria Fleming, Institute of Cognitive Neuroscience, University College London, London, UK; victoria.fleming{at}ucl.ac.uk. Objective: The efficacy of spoken language comprehension ...
Abstract: Current one-stage methods for visual grounding encode the language query as one holistic sentence embedding before fusion with visual features for target localization. Such a formulation ...