Browsing by Author "Mateo Dulce Rubio"

A Fair Allocation Algorithm for Predictive Police Patrolling
(2021) Isabella Rodas Arango; Mateo Dulce Rubio; Álvaro Riascos
We address the tradeoff between developing good predictive models for police allocation and optimally deploying police officers across a city without producing an unfair allocation of resources. We modify the fair allocation algorithm of [1] to tackle a real-world problem: crime in the city of Bogotá, Colombia. Our approach allows for more sophisticated prediction models, and we show that the whole methodology outperforms the city's current police allocation mechanism. Results show that even with a simple model such as a Kernel Density Estimation (KDE) of crime, one can obtain much better predictions than the current police model while also mitigating fairness concerns. Although we cannot provide general performance guarantees, our results apply to a real-life problem and should be seriously considered by policymakers.
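The abstract mentions a Kernel Density Estimation (KDE) of crime as a simple prediction model. The sketch below, which assumes a 2-D Gaussian kernel with a single bandwidth and uses hypothetical helper names (kde_on_grid, allocate_patrols), evaluates a KDE at grid-cell centroids and then sends one patrol to each of the densest cells. That greedy allocation is only an illustrative baseline; it is not the fair allocation algorithm of [1] or the authors' modification of it.

```python
import numpy as np

def kde_on_grid(events, centroids, bandwidth):
    """Gaussian KDE of 2-D crime events, evaluated at grid-cell centroids."""
    # Pairwise squared distances: rows are centroids, columns are events.
    sq = np.sum((centroids[:, None, :] - events[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1)  # average kernel value per centroid

def allocate_patrols(density, n_patrols):
    """Greedy baseline (NOT the fair algorithm of [1]): one patrol per densest cell."""
    allocation = np.zeros(density.size, dtype=int)
    allocation[np.argsort(density)[::-1][:n_patrols]] = 1
    return allocation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical crime locations around two hotspots in a unit square.
    events = np.vstack([rng.normal([0.2, 0.2], 0.05, (50, 2)),
                        rng.normal([0.8, 0.7], 0.05, (20, 2))])
    xs = np.linspace(0.05, 0.95, 10)
    centroids = np.array([(x, y) for x in xs for y in xs])  # 100 cells
    density = kde_on_grid(events, centroids, bandwidth=0.1)
    print(allocate_patrols(density, n_patrols=5).sum())  # 5
```

A fairness-aware method would replace allocate_patrols with a rule that also constrains how resources are spread across areas; the KDE step can stay the same.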
A Manifold Learning Data Enrichment Methodology for Homicide Prediction
(2020) Juan S. Moreno Pabón; Mateo Dulce Rubio; Yor Castaño; Álvaro Riascos; Paula Rodríguez Díaz
Not all types of crime have the same priority on policymakers' agendas, since society tends to be less tolerant of more violent and costly crimes such as homicide. However, relative to other types of crime, homicides are statistically more challenging to model because of their sparsity and low frequency. For instance, over the last five years the city of Bogotá has averaged roughly a thousand homicides per year, compared with the more than one hundred thousand robberies reported over the same period. Nevertheless, more than 80% of the homicides in the city occur during street fights, suggesting a strong spatial and temporal correlation between these two types of crime. With this in mind, we use a manifold learning approach that capitalizes on a rich dataset of street fights to discover a criminal manifold, which we then use to penalize a KDE model of homicides where sparsity and low frequency are an issue. To implement this, we follow the Kernel Warping methodology (Zhou & Matteson, 2015). The methodology reduces the relevant space for homicide prediction to regions of the city where homicides or street fights have occurred, giving more weight to homicide episodes. We also introduce a temporal decay component that places greater importance on recent events. The proposed model outperforms a standard KDE trained on homicide data, a KDE trained on both homicide and street-fight data for homicide prediction, and a standard self-exciting point process on homicide data: flagging just the 5% of the city's area with the highest estimated density, the Kernel Warping model correctly identifies between 30% and 35% of the homicides in the test set.
Results of the project "Diseño y validación de modelos de analítica predictiva de fenómenos de seguridad y convivencia para la toma de decisiones en Bogotá" (Design and validation of predictive analytics models of security and coexistence phenomena for decision-making in Bogotá), funded by Colciencias with resources from the Sistema General de Regalías (General Royalties System), BPIN 2016000100036. The opinions expressed are solely those of the authors.
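The evaluation above flags the 5% of the city's area with the highest estimated density and counts how many test homicides fall inside it. The sketch below illustrates that hit-rate metric together with a temporal decay component. It assumes equal-area grid cells and an exponential decay weight exp(-decay * age), both assumptions not stated in the abstract, and it is a plain time-decayed KDE, not the Kernel Warping model; the helper names are hypothetical.

```python
import numpy as np

def decayed_kde(events, times, centroids, bandwidth, t_now, decay):
    """Gaussian KDE at grid centroids; each event is weighted by exp(-decay * age)."""
    weights = np.exp(-decay * (t_now - np.asarray(times, dtype=float)))
    weights /= weights.sum()  # normalize so the density still integrates to 1
    sq = np.sum((centroids[:, None, :] - events[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return kernel @ weights

def hit_rate(density, test_cells, area_fraction=0.05):
    """Share of test events falling in the top `area_fraction` of cells by density.

    Assumes equal-area cells, so a fraction of cells equals a fraction of area.
    """
    n_flag = max(1, int(round(area_fraction * density.size)))
    flagged = np.zeros(density.size, dtype=bool)
    flagged[np.argsort(density)[::-1][:n_flag]] = True
    return float(flagged[np.asarray(test_cells)].mean())
```

With decay = 0 every past event counts equally and decayed_kde reduces to a standard KDE, which makes it easy to compare the two settings on the same hit-rate metric.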
Modelling underreported Spatio-temporal Crime Events
(European Organization for Nuclear Research, 2023) Álvaro Riascos; Jose Sebastian Ñungo; Lucas Gómez Tobón; Mateo Dulce Rubio; Francisco Gómez Gómez
The code needed to replicate our work is available in our GitHub repository.
Description of the files:
distance_1000.csv: a data frame with 5,000 rows and 3 columns. Each row is a time step of the algorithms and reports the Euclidean distance between the vector of real crime rates in each cell and the algorithm's estimate. The experiment uses 1,000 arms and at most 100 super arms. This file is created by the times.py script in our repository.
distance_10000.csv: a data frame with 5,000 rows and 3 columns reporting, for each time step, the same Euclidean distance as distance_1000.csv, for an experiment with 10,000 arms and at most 1,000 super arms. This file is created by the times.py script in our repository.
distance_50000.csv: a data frame with 5,000 rows and 3 columns reporting, for each time step, the same Euclidean distance as distance_1000.csv, for an experiment with 50,000 arms and at most 5,000 super arms. This file is created by the times.py script in our repository.
grilla_bogota.csv: a data frame with 1,638 rows and 5 columns in which each row describes one grid cell of Bogotá. Unlike grilla_bogota2.csv, this file includes the rural area of the city and is used to plot Figure 9; the rural area is excluded from our analysis because of its low crime density. This file is created by the 3_create_grid.ipynb notebook in our repository.
grilla_bogota2.csv: a data frame with 1,008 rows and 10 columns in which each row describes one grid cell of Bogotá. It is more complete than grilla_bogota.csv because it includes the name of the Localidad that contains the cell's centroid and the corresponding Rep. Rate (reporting rate). However, it does not cover the rural area of the city. This file is created by the 3_create_grid.ipynb notebook in our repository.
localidades.zip: a zipped folder containing the shapefiles needed to draw the map of Bogotá with its administrative boundaries. This information is public and can also be found on the government's open data portal.
matriz_eventos_real.csv: a matrix of 498 rows and 368 columns in which each row represents one cell of Bogotá's grid and each column gives the number of real crimes on a given date. Recall that we take the total number of crimes to be the combination of NUSE and SIEDCO crimes after removing duplicates. This file is created by the 3_create_grid.ipynb notebook in our repository.
matriz_eventos_subreporte.csv: a matrix of 498 rows and 368 columns in which each row represents one cell of Bogotá's grid and each column gives the underreported crime count on a given date. Recall that we take the underreported count to be the number of crimes reported in NUSE. This file is created by the 3_create_grid.ipynb notebook in our repository.
subreporte_ccb.csv: a data frame of 498 rows and 4 columns giving the Rep. Rate and lambda for each cell of Bogotá's grid. This file is created by the 3_create_grid.ipynb notebook in our repository.
upla.zip: a zipped folder containing additional shapefiles for drawing the map of Bogotá with its administrative boundaries. This information is public and can also be found on the government's open data portal.
victimización.xlsx: an Excel file with 20 rows and 4 columns containing the Vict. Rate (victimization rate) and the Rep. Rate for each Localidad of Bogotá. This information comes from the survey-based victimization rates and victims' crime reporting rates published by Bogotá's Chamber of Commerce (2014).
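The file descriptions imply two simple computations: an empirical reporting rate per cell from the real and underreported count matrices, and the Euclidean distance stored in the distance_*.csv files. The sketch below works on in-memory NumPy arrays (rows are cells, columns are dates) because the exact CSV layout is not described here. The helper names are hypothetical, and the ratio of reported to real counts is an illustrative estimator, not necessarily how the Rep. Rate column in subreporte_ccb.csv was computed.

```python
import numpy as np

def reporting_rate(real_counts, reported_counts):
    """Per-cell share of real crimes that appear in the reported (NUSE) series.

    Rows are grid cells and columns are dates, mirroring the layout described
    for matriz_eventos_real.csv and matriz_eventos_subreporte.csv.
    Cells with no real crimes get NaN.
    """
    real_tot = np.asarray(real_counts).sum(axis=1).astype(float)
    rep_tot = np.asarray(reported_counts).sum(axis=1).astype(float)
    rate = np.full(real_tot.shape, np.nan)
    np.divide(rep_tot, real_tot, out=rate, where=real_tot > 0)
    return rate

def estimation_error(true_rates, estimated_rates):
    """Euclidean distance between the true and estimated per-cell rate vectors."""
    diff = np.asarray(true_rates, dtype=float) - np.asarray(estimated_rates, dtype=float)
    return float(np.linalg.norm(diff))
```

Calling estimation_error once per time step of an algorithm would produce one column of a distance_*.csv-style trace.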
Modelling underreported Spatio-temporal Crime Events
(European Organization for Nuclear Research, 2023) Álvaro Riascos; Jose Sebastian Ñungo; Lucas Gómez Tobón; Mateo Dulce Rubio; Francisco Gómez Gómez
Crime observations are one of the principal inputs governments use to design citizen security strategies. However, crime measurements are obscured by underreporting biases, resulting in the so-called "dark figure of crime". Current approaches for estimating the "true" crime rate do not account for the temporal dynamics of crime underreporting. This work studies whether "true" crime incidence rates can be recovered over time from underreported crime observations and complementary crime-related measurements acquired online. To this end, we propose a novel underreporting model of spatiotemporal events based on the combinatorial multi-armed bandit framework. Extensive simulations validate the methodology's ability to identify the model's fundamental parameters: the "true" rates of incidence and of underreporting of events. Once the model was validated, we used crime data from a large city, Bogotá (Colombia), to estimate the "true" crime and underreporting rates. Our results suggest that this methodology could be used to rapidly estimate the underreporting rates of spatiotemporal events, a critical problem in public policy design.
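The abstract does not spell out the underreporting model, so the sketch below shows only the generic combinatorial multi-armed bandit loop it builds on: a CUCB-style learner with semi-bandit feedback, where arms are grid cells, a super arm is a set of at most k cells observed in one round, and each chosen cell returns a Bernoulli observation. The Bernoulli reward, the 1.5 ln t exploration term, and the function name are assumptions for illustration; this does not separate incidence from underreporting, which is the paper's contribution.

```python
import numpy as np

def cucb_top_k(mean_rates, k, horizon, seed=0):
    """Generic CUCB with semi-bandit feedback over cells (arms).

    Each round the super arm is the k cells with the highest UCB index; every
    chosen cell returns a Bernoulli observation with unknown mean
    mean_rates[i]. Returns the final per-cell estimates and the Euclidean
    error between estimates and true rates after each round.
    """
    mu = np.asarray(mean_rates, dtype=float)
    rng = np.random.default_rng(seed)
    n = mu.size
    pulls = np.zeros(n)   # how many times each cell has been observed
    totals = np.zeros(n)  # sum of observations per cell
    errors = np.empty(horizon)
    est = np.zeros(n)
    for t in range(1, horizon + 1):
        est = np.divide(totals, pulls, out=np.zeros(n), where=pulls > 0)
        bonus = np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
        ucb = np.where(pulls > 0, est + bonus, np.inf)  # unplayed cells first
        super_arm = np.argsort(-ucb)[:k]  # top-k oracle on optimistic estimates
        pulls[super_arm] += 1
        totals[super_arm] += rng.binomial(1, mu[super_arm])
        est = np.divide(totals, pulls, out=np.zeros(n), where=pulls > 0)
        errors[t - 1] = np.linalg.norm(est - mu)
    return est, errors
```

Because CUCB concentrates pulls on high-rate cells, rarely chosen cells keep noisy estimates, and that shows up directly in the per-round error trace.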
SESGO: Spanish Evaluation of Stereotypical Generative Outputs
(2025) Melissa Robles; C. Bernal Bellido; Denniss Raigoso; Mateo Dulce Rubio
This paper addresses a critical gap in evaluating bias in multilingual Large Language Models (LLMs), focusing on the Spanish language in culturally specific Latin American contexts. Despite widespread global deployment, current evaluations remain predominantly US-English-centric, leaving potential harms in other linguistic and cultural contexts largely underexamined. We introduce a novel, culturally grounded framework for detecting social biases in instruction-tuned LLMs. Our approach adapts the underspecified-question methodology of the BBQ dataset by incorporating culturally specific expressions and sayings that encode regional stereotypes across four social categories: gender, race, socioeconomic class, and national origin. Our evaluation uses more than 4,000 prompts, and we propose a new metric that combines accuracy with the direction of error to balance model performance and bias alignment in both ambiguous and disambiguated contexts. To our knowledge, our work presents the first systematic evaluation of how leading commercial LLMs respond to culturally specific bias in Spanish, revealing varying patterns of bias across state-of-the-art models. We also provide evidence that bias mitigation techniques optimized for English do not transfer effectively to Spanish tasks, and that bias patterns remain largely consistent across sampling temperatures. Our modular framework extends naturally to new stereotypes, bias categories, languages, and cultural contexts, representing a significant step toward more equitable and culturally aware evaluation of AI systems in the diverse linguistic environments where they operate.
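The new metric is not defined in the abstract, so as a point of reference the sketch below computes the original BBQ bias scores that the methodology adapts, following our reading of their definitions: in disambiguated contexts, s_DIS = 2 * (biased answers / non-"unknown" answers) - 1; in ambiguous contexts, the same direction term is scaled by (1 - accuracy). This is not the SESGO metric. The record format and the "biased", "counter_biased", and "unknown" labels are hypothetical, and returning 0.0 when no non-"unknown" answers exist is an added assumption.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (context, prediction, gold); labels are hypothetical

def _direction(preds: List[str]) -> float:
    """2 * (biased / non-'unknown' answers) - 1, in [-1, 1]; 0.0 if there are none."""
    non_unknown = [p for p in preds if p != "unknown"]
    if not non_unknown:
        return 0.0
    biased = sum(p == "biased" for p in non_unknown)
    return 2 * biased / len(non_unknown) - 1

def bbq_bias_scores(records: List[Record]) -> Tuple[float, float]:
    """Return (s_amb, s_dis) in the style of the original BBQ bias scores."""
    amb = [(p, g) for c, p, g in records if c == "ambiguous"]
    dis = [(p, g) for c, p, g in records if c == "disambiguated"]
    s_dis = _direction([p for p, _ in dis])
    acc_amb = sum(p == g for p, g in amb) / len(amb) if amb else 0.0
    # In ambiguous contexts the bias direction is scaled by the error rate.
    s_amb = (1 - acc_amb) * _direction([p for p, _ in amb])
    return s_amb, s_dis
```

Scaling by (1 - accuracy) means a model that correctly answers "unknown" in every ambiguous context gets s_amb = 0 regardless of the direction of its rare errors.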