Development of Hourly Resolution Air Temperature Across Titicaca Lake on Auxiliary ERA5 Variables and Machine Learning-Based Gap-Filling
| dc.contributor.author | J. W. Sirpa-Poma | |
| dc.contributor.author | Juan Marcos Calle | |
| dc.contributor.author | Elvis Uscamayta-Ferrano | |
| dc.contributor.author | Jorge Molina‐Carpio | |
| dc.contributor.author | Frédéric Satgé | |
| dc.contributor.author | Osmar Cuentas Toledo | |
| dc.contributor.author | Ricardo Duran | |
| dc.contributor.author | Paula Pacheco Mollinedo | |
| dc.contributor.author | Rizuana Iqbal Hussain | |
| dc.contributor.author | Ramiro Pillco Zolá | |
| dc.coverage.spatial | Bolivia | |
| dc.date.accessioned | 2026-03-22T19:50:51Z | |
| dc.date.available | 2026-03-22T19:50:51Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | This article presents an innovative procedure that combines advanced quality control (QC) methods with machine learning (ML) techniques to produce reliable, continuous, high-resolution meteorological data. The approach was applied to hourly air temperature records from six automatic weather stations located around Lake Titicaca in the Altiplano region of South America. The raw dataset contained time gaps, inconsistencies, and outliers. To address these, the QC stage employed Interquartile Range, Biweight, and Local Outlier Factor (LOF) statistics, resulting in a clean dataset. Two gap-filling methods were implemented: a spatial approach using time series from nearby stations and a temporal approach based on each station's time series and selected variables from the ERA5-Land reanalysis. Several ML models were also employed in this process: Random Forest (RF), Support Vector Machine (SVM), Stacking (STACK), and AdaBoost (ADA). Model performance was evaluated on a validation subset (30% of station data). The RF model achieved the best results, with R<sup>2</sup> values up to 0.9 and Root Mean Square Error (RMSE) below 1.5 °C. The spatial approach performed best when stations were strongly correlated, while the temporal approach was more suitable for locations with low inter-station correlation and high local variability. Overall, the procedure substantially improved data reliability and completeness, and it can be extended to other meteorological variables. | |
| dc.identifier.doi | 10.3390/s25237165 | |
| dc.identifier.uri | https://doi.org/10.3390/s25237165 | |
| dc.identifier.uri | https://andeanlibrary.org/handle/123456789/78474 | |
| dc.language.iso | en | |
| dc.publisher | Multidisciplinary Digital Publishing Institute | |
| dc.relation.ispartof | Sensors | |
| dc.source | Universidad Mayor de San Andrés | |
| dc.subject | Environmental science | |
| dc.subject | Outlier | |
| dc.subject | Random forest | |
| dc.subject | Air temperature | |
| dc.subject | Terrain | |
| dc.subject | Mean squared error | |
| dc.subject | Temporal resolution | |
| dc.subject | Support vector machine | |
| dc.subject | Weather station | |
| dc.subject | Meteorology | |
| dc.title | Development of Hourly Resolution Air Temperature Across Titicaca Lake on Auxiliary ERA5 Variables and Machine Learning-Based Gap-Filling | |
| dc.type | article |
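
The abstract names the Interquartile Range as one of the QC statistics used to screen the raw hourly series. As a rough illustration only, the sketch below flags values that fall outside IQR fences. The function name `iqr_flags`, the fence multiplier `k = 1.5`, and the sample temperatures are assumptions made for this example; the record does not state the thresholds or the implementation the authors used.

```python
from statistics import quantiles

def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    `values` may contain None for missing hours; those are never flagged.
    `k` is a hypothetical fence multiplier (1.5 is a common convention).
    """
    present = [v for v in values if v is not None]
    # Quartiles computed on the observed values only.
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v is not None and not (low <= v <= high) for v in values]

# Made-up hourly temperatures (deg C) with one gap and one implausible spike.
temps = [8.1, 7.9, 7.5, None, 7.2, 45.0, 7.8, 8.4, 9.0, 9.6]
flags = iqr_flags(temps)

# Mask flagged readings so they become gaps for a later gap-filling step.
cleaned = [None if flagged else t for t, flagged in zip(temps, flags)]
```

In this sketch, flagged readings are turned into gaps rather than corrected, so that a subsequent gap-filling model (spatial or temporal, as described in the abstract) would treat them the same way as missing hours.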