L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

dc.contributor.authorAn Yan
dc.contributor.authorXin Wang
dc.contributor.authorTsu-Jui Fu
dc.contributor.authorWilliam Yang Wang
dc.coverage.spatialBolivia
dc.date.accessioned2026-03-22T20:42:34Z
dc.date.available2026-03-22T20:42:34Z
dc.date.issued2021
dc.descriptionCitations: 1
dc.description.abstractRecent advances in language and vision have pushed research forward from captioning a single image to describing the visual differences between image pairs. Given two images, I_1 and I_2, and the task of generating a description W_{1,2} comparing them, existing methods directly model the I_1, I_2 -> W_{1,2} mapping without a semantic understanding of the individual images. In this paper, we introduce a Learning-to-Compare (L2C) model, which learns to understand the semantic structures of the two images and compare them while learning to describe each one. We demonstrate that L2C benefits from comparing explicit semantic representations and single-image captions, and generalizes better to new test image pairs. It outperforms the baseline on both automatic and human evaluation on the Birds-to-Words dataset.
dc.identifier.doi10.18653/v1/2021.eacl-main.196
dc.identifier.urihttps://doi.org/10.18653/v1/2021.eacl-main.196
dc.identifier.urihttps://andeanlibrary.org/handle/123456789/83609
dc.language.isoen
dc.sourceUniversidad Cristiana de Bolivia
dc.subjectClosed captioning
dc.subjectComputer science
dc.subjectTask (project management)
dc.subjectImage (mathematics)
dc.subjectBaseline (sea)
dc.subjectNatural language processing
dc.subjectArtificial intelligence
dc.subjectMachine learning
dc.subjectInformation retrieval
dc.titleL2C: Describing Visual Differences Needs Semantic Understanding of Individuals
dc.typepreprint