Browsing by Author "Melissa Robles"

Improving Low-Resource Translation with Dictionary-Guided Fine-Tuning and RL: A Spanish-to-Wayuunaiki Study
(2026) Association for Artificial Intelligence 2026; Rubén Manrique; Manuel Mosquera; Johan Portela; Melissa Robles
Low-resource machine translation remains a significant challenge for large language models (LLMs), which often lack exposure to these languages during pretraining and have limited parallel data for fine-tuning. We propose a novel approach that enhances translation for low-resource languages by integrating an external dictionary tool and training models end-to-end using reinforcement learning, in addition to supervised fine-tuning. Focusing on the Spanish–Wayuunaiki language pair, we frame translation as a tool-augmented decision-making problem in which the model can selectively consult a bilingual dictionary during generation. Our method combines supervised instruction tuning with Group Relative Policy Optimization (GRPO), enabling the model to learn both when and how to use the tool effectively. BLEU similarity scores are used as rewards to guide this learning process. Preliminary results show that our tool-augmented models achieve up to +3.37 BLEU improvement over previous work, and an 18% relative gain compared to a supervised baseline without dictionary access, on the Spanish–Wayuunaiki test set from the AmericasNLP 2025 Shared Task. We also conduct ablation studies to assess the effects of model architecture and training strategy, comparing Qwen2.5-0.5B-Instruct with other models such as LLaMA and a prior NLLB-based system. These findings highlight the promise of combining LLMs with external tools and the role of reinforcement learning in improving translation quality in low-resource language settings.
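The abstract above names GRPO with BLEU scores as rewards but gives no implementation details. The following is a minimal sketch of the group-relative advantage step only, under stated assumptions: the reward values are made-up BLEU scores for four hypothetical sampled translations of one source sentence, the function name `group_relative_advantages` is illustrative, and the population standard deviation is an assumed normalization choice. It is not the authors' training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Each sampled translation is scored relative to the mean reward of the
    group it was sampled with, scaled by the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical BLEU rewards for four candidate translations of one sentence.
bleu_rewards = [12.0, 18.0, 15.0, 15.0]
advantages = group_relative_advantages(bleu_rewards)

# A group with identical rewards carries no learning signal.
flat_advantages = group_relative_advantages([10.0, 10.0, 10.0])
```

Candidates scoring above the group mean receive a positive advantage and those below it a negative one, so the policy update favors the better translations within each group without needing a separate value model.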
New algorithms for unsupervised cell clustering from scRNA-seq data
(2024) Melissa Robles; Jorge Díaz-Riaño; Cristhian Forigua; Soledad Ojeda; Laura Guio; Paula Siaucho; Jennifer J Guzmán-Porras; Danilo García-Orjuela; Andrés Naranjo; Silvia Maradei
The identification of cell types is a basic step of the pipeline for single-cell RNA sequencing (scRNA-seq) data analysis. However, unsupervised clustering of cells from scRNA-seq data has multiple challenges: the high-dimensional nature of the data, the sparse nature of the gene expression matrix, and the presence of technical noise that can introduce false zero entries. In this study, we introduce new algorithms for clustering scRNA-seq data. The first algorithm builds a k-MST graph from distances obtained directly from the input data without dimensionality reduction. The computation follows an iterative procedure of k steps in which each step calculates and stores the edges of minimum spanning trees over different subgraphs obtained by removing edges selected in previous iterations. The Louvain algorithm is executed on the k-MST graph for cell clustering. We also explored alternatives based on neural networks in which an autoencoder is used to learn the parameters of a Gaussian mixture model, aiming to improve the handling of clusters with different shapes and sizes. Benchmark experiments with simulated data and public datasets show that the algorithms proposed in this work have competitive accuracy compared to previous solutions, but also that sequencing depth, number of cells, and tissue types have important effects on the performance of the algorithms. Moreover, we performed further experiments with scRNA-seq data taken from a patient with refractory epilepsy. The AE-GMM model achieved the best accuracy for this dataset, and the k-MST ranked first among methods that do not require previous information on the expected number of clusters.
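The k-MST construction described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: it assumes Euclidean distance, uses Kruskal's algorithm for each spanning tree (or spanning forest, if removing edges disconnects the graph), and the names `kruskal` and `k_mst_edges` are hypothetical. The Louvain clustering step is omitted.

```python
from itertools import combinations
from math import dist


def kruskal(n, edges):
    """Return the edges of a minimum spanning forest over n nodes."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen


def k_mst_edges(points, k):
    """Union of k successive spanning trees, each built after removing
    the edges selected in earlier iterations."""
    n = len(points)
    remaining = [
        (dist(points[u], points[v]), u, v)
        for u, v in combinations(range(n), 2)
    ]
    selected = []
    for _ in range(k):
        tree = kruskal(n, remaining)
        if not tree:
            break  # no edges left to select
        selected.extend(tree)
        used = {(u, v) for _, u, v in tree}
        remaining = [e for e in remaining if (e[1], e[2]) not in used]
    return selected


# Toy input: five 2-D points standing in for expression profiles.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
one_tree = k_mst_edges(points, 1)
two_trees = k_mst_edges(points, 2)
```

The resulting weighted edge list could then be passed to a community detection routine such as Louvain. Larger k yields a denser graph, since each iteration adds edges that earlier trees did not use.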
PSEUDOFINITENESS AND MEASURABILITY OF THE EVERYWHERE INFINITE FOREST
(Cambridge University Press, 2025) Darío García; Melissa Robles
In this article we study the theories of the infinite-branching tree and the r-regular tree, and show that both of them are pseudofinite. Moreover, we show that they can be realized by infinite ultraproducts of polynomial exact classes of graphs, and provide a characterization of the Morley rank of definable sets in terms of the degrees of polynomials measuring their non-standard cardinalities. This negatively answers some questions from [2], where it is asked whether every stable generalised measurable structure is one-based.
SESGO: Spanish Evaluation of Stereotypical Generative Outputs
(2025) Melissa Robles; C. Bernal Bellido; Denniss Raigoso; Mateo Dulce Rubio
This paper addresses the critical gap in evaluating bias in multilingual Large Language Models (LLMs), with a specific focus on the Spanish language within culturally-aware Latin American contexts. Despite widespread global deployment, current evaluations remain predominantly US-English-centric, leaving potential harms in other linguistic and cultural contexts largely underexamined. We introduce a novel, culturally-grounded framework for detecting social biases in instruction-tuned LLMs. Our approach adapts the underspecified question methodology from the BBQ dataset by incorporating culturally-specific expressions and sayings that encode regional stereotypes across four social categories: gender, race, socioeconomic class, and national origin. Using more than 4,000 prompts, we propose a new metric that combines accuracy with the direction of error to effectively balance model performance and bias alignment in both ambiguous and disambiguated contexts. To our knowledge, our work presents the first systematic evaluation examining how leading commercial LLMs respond to culturally specific bias in the Spanish language, revealing varying patterns of bias manifestation across state-of-the-art models. We also contribute evidence that bias mitigation techniques optimized for English do not effectively transfer to Spanish tasks, and that bias patterns remain largely consistent across different sampling temperatures. Our modular framework offers a natural extension to new stereotypes, bias categories, or languages and cultural contexts, representing a significant step toward more equitable and culturally-aware evaluation of AI systems in the diverse linguistic environments where they operate.