We introduce a dataset of human-authored descriptions of target locations in an "end-of-trip in a taxi ride" scenario. We describe our data collection method and a novel annotation scheme that supports the understanding of such descriptions. The dataset contains target location descriptions for both synthetic and real-world images, together with visual annotations (ground-truth labels, dimensions of vehicles and objects, coordinates of the target location, and the distance and direction of the target location from vehicles and objects) that can be used in a variety of vision and language tasks. We also report a pilot experiment on how the corpus could be applied to visual reference resolution in this domain.
Authors
Ramesh Manuvinakurike
Kallirroi Georgila