MEDDOPROF corpus: test set

dc.contributor.authorEulàlia Farré-Maduell
dc.contributor.authorSalvador Lima López
dc.contributor.authorAntonio Miranda-Escalada
dc.contributor.authorVicent Brivá-Iglesias
dc.contributor.authorMartin Krallinger
dc.coverage.spatialBolivia
dc.date.accessioned2026-03-22T20:54:57Z
dc.date.available2026-03-22T20:54:57Z
dc.date.issued2021
dc.description.abstractThe MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in Spanish clinical cases from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Other NLP tasks, such as anonymization, can also benefit from this type of data. MEDDOPROF has three sub-tasks: 1) MEDDOPROF-NER: Participants must find the beginning and end of occupation mentions and classify them as PROFESION (PROFESSION), SITUACION_LABORAL (WORKING_STATUS) or ACTIVIDAD (ACTIVITY). 2) MEDDOPROF-CLASS: Participants must find the beginning and end of occupation mentions and classify them according to their referent (PACIENTE [patient], FAMILIAR [family member], SANITARIO [health professional] or OTRO [other]). 3) MEDDOPROF-NORM: Participants must find the beginning and end of occupation mentions and normalize them against a reference list of codes. This repository hosts the 344 files that make up the test set. Only the text files to be used in the evaluation phase have been uploaded; the Gold Standard annotations will be added once the evaluation phase is over. Resources: Web, Training Data, Codes Reference List (for MEDDOPROF-NORM), Annotation Guidelines. MEDDOPROF is part of the IberLEF 2021 workshop, which is co-located with the SEPLN 2021 conference. For further information, please visit https://temu.bsc.es/meddoprof/ or email us at encargo-pln-life@bsc.es. MEDDOPROF is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL).
dc.identifier.doi10.5281/zenodo.4889777
dc.identifier.urihttps://doi.org/10.5281/zenodo.4889777
dc.identifier.urihttps://andeanlibrary.org/handle/123456789/84829
dc.language.isoen
dc.sourceBarcelona Supercomputing Center
dc.subjectSet (abstract data type)
dc.subjectTest (biology)
dc.subjectNatural language processing
dc.subjectComputer science
dc.subjectInformation retrieval
dc.titleMEDDOPROF corpus: test set
dc.typedataset
