L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

dc.contributor.authorAn Yan
dc.contributor.authorXin Wang
dc.contributor.authorTsu-Jui Fu
dc.contributor.authorWilliam Yang Wang
dc.coverage.spatialBolivia
dc.date.accessioned2026-03-22T20:42:34Z
dc.date.available2026-03-22T20:42:34Z
dc.date.issued2021
dc.descriptionCitations: 1
dc.description.abstractRecent advances in language and vision have pushed research forward from captioning a single image to describing the visual differences between image pairs. Given two images, I_1 and I_2, and the task of generating a description W_{1,2} comparing them, existing methods directly model the I_1, I_2 -> W_{1,2} mapping without a semantic understanding of the individual images. In this paper, we introduce a Learning-to-Compare (L2C) model, which learns to understand the semantic structures of the two images and compare them while learning to describe each one. We demonstrate that L2C benefits from comparing explicit semantic representations and single-image captions, and generalizes better to new test image pairs. It outperforms the baseline on both automatic and human evaluation on the Birds-to-Words dataset.
dc.identifier.doi10.18653/v1/2021.eacl-main.196
dc.identifier.urihttps://doi.org/10.18653/v1/2021.eacl-main.196
dc.identifier.urihttps://andeanlibrary.org/handle/123456789/83609
dc.language.isoen
dc.sourceUniversidad Cristiana de Bolivia
dc.subjectClosed captioning
dc.subjectComputer science
dc.subjectTask (project management)
dc.subjectImage (mathematics)
dc.subjectBaseline (sea)
dc.subjectNatural language processing
dc.subjectArtificial intelligence
dc.subjectMachine learning
dc.subjectInformation retrieval
dc.titleL2C: Describing Visual Differences Needs Semantic Understanding of Individuals
dc.typepreprint