Development of Hourly Resolution Air Temperature Across Titicaca Lake on Auxiliary ERA5 Variables and Machine Learning-Based Gap-Filling

dc.contributor.author: J. W. Sirpa-Poma
dc.contributor.author: Juan Marcos Calle
dc.contributor.author: Elvis Uscamayta-Ferrano
dc.contributor.author: Jorge Molina‐Carpio
dc.contributor.author: Frédéric Satgé
dc.contributor.author: Osmar Cuentas Toledo
dc.contributor.author: Ricardo Duran
dc.contributor.author: Paula Pacheco Mollinedo
dc.contributor.author: Rizuana Iqbal Hussain
dc.contributor.author: Ramiro Pillco Zolá
dc.coverage.spatial: Bolivia
dc.date.accessioned: 2026-03-22T19:50:51Z
dc.date.available: 2026-03-22T19:50:51Z
dc.date.issued: 2025
dc.description.abstract: This article presents an innovative procedure that combines advanced quality control (QC) methods with machine learning (ML) techniques to produce reliable, continuous, high-resolution meteorological data. The approach was applied to hourly air temperature records from six automatic weather stations located around Lake Titicaca in the Altiplano region of South America. The raw dataset contained time gaps, inconsistencies, and outliers. To address these, the QC stage employed Interquartile Range, Biweight, and Local Outlier Factor (LOF) statistics, resulting in a clean dataset. Two gap-filling methods were implemented: a spatial approach using time series from nearby stations and a temporal approach based on each station's time series and selected variables from the ERA5-Land reanalysis. Several ML models were also employed in this process: Random Forest (RF), Support Vector Machine (SVM), Stacking (STACK), and AdaBoost (ADA). Model performance was evaluated on a validation subset (30% of station data). The RF model achieved the best results, with R² values up to 0.9 and Root Mean Square Error (RMSE) below 1.5 °C. The spatial approach performed best when stations were strongly correlated, while the temporal approach was more suitable for locations with low inter-station correlation and high local variability. Overall, the procedure substantially improved data reliability and completeness, and it can be extended to other meteorological variables.
dc.identifier.doi: 10.3390/s25237165
dc.identifier.uri: https://doi.org/10.3390/s25237165
dc.identifier.uri: https://andeanlibrary.org/handle/123456789/78474
dc.language.iso: en
dc.publisher: Multidisciplinary Digital Publishing Institute
dc.relation.ispartof: Sensors
dc.source: Universidad Mayor de San Andrés
dc.subject: Environmental science
dc.subject: Outlier
dc.subject: Random forest
dc.subject: Air temperature
dc.subject: Terrain
dc.subject: Mean squared error
dc.subject: Temporal resolution
dc.subject: Support vector machine
dc.subject: Weather station
dc.subject: Meteorology
dc.title: Development of Hourly Resolution Air Temperature Across Titicaca Lake on Auxiliary ERA5 Variables and Machine Learning-Based Gap-Filling
dc.type: article
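
The abstract names an Interquartile Range screen in the QC stage and reports model skill as R² and RMSE. The sketch below illustrates those two ideas only; it is not the authors' implementation. It omits the Biweight and LOF checks and the ML models, assumes the conventional 1.5 × IQR fence (the record does not state the threshold used), and every function name and sample value is hypothetical.

```python
import math
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Return one boolean per value: True where it lies outside the IQR fences.

    k=1.5 is the conventional Tukey multiplier, assumed here; the record
    does not state the threshold the authors used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly temperatures (°C) with one implausible spike.
temps = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 11.0, 45.0]
flags = iqr_outlier_flags(temps)
clean = [t for t, bad in zip(temps, flags) if not bad]
```

In this toy series only the 45.0 reading falls outside the fences, so `clean` keeps the other seven values. A gap-filling model could then be scored by passing held-out observations and its predictions to `rmse` and `r_squared`.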
