Improving Water Quality Predictions with Machine Learning – Earth and Environmental Sciences Area

Rivers wind through valleys, mountains, and cities, defining the watersheds that deliver 60–90% of the world’s water. However, river hydrology (the distribution and movement of water, along with its physical and chemical properties) can be hard to predict because environmental conditions are constantly changing and rivers can run for thousands of miles through remote, hard-to-reach locations.

In a recent study, EESA scientist Charu Varadharajan and Jared Willard, a former NESAP postdoctoral fellow at the National Energy Research Scientific Computing Center (NERSC), demonstrated that combining the results of multiple machine learning (ML) models, an approach known as “ensembling,” can improve the accuracy of river water temperature predictions, even in areas with few or no observations. Their work, published in the Journal of Geophysical Research: Machine Learning and Computation, highlights how ensemble modeling can lead to more reliable predictions of river hydrology and other key environmental variables.

Multiple models for a better understanding of water systems

Machine learning models can be trained on past observations to make hydrologic predictions at new locations where observations are scarce. Ensemble models, which combine the outputs of multiple individual predictive models, can produce more accurate final predictions than any single model alone. The more accurate our hydrologic predictions, the better we can forecast and manage dwindling freshwater resources as disturbances become more frequent and demand continues to rise.

“Ensembles have traditionally been used in forecasting applications to improve model accuracy and indicate how confident we are about the results,” explained Varadharajan. “These produce a range of possible outcomes instead of a single prediction. However, these have not been used for ML applications in hydrology. Our study shows that even simple ML ensembles can significantly improve prediction accuracy compared to using single models.”
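To make the idea of “a range of possible outcomes” concrete, here is a minimal sketch, not the study’s code, of the simplest kind of ensemble: average the daily predictions of several models and report the lowest and highest member prediction as a rough range. The model names and temperature values are hypothetical, and a min-max band is only one crude way to express uncertainty.

```python
from statistics import mean


def combine_ensemble(member_predictions):
    """Combine per-model daily predictions into an ensemble mean and a min-max band.

    member_predictions maps a (hypothetical) model name to a list of daily stream
    temperature predictions in degrees Celsius, one value per day, all the same length.
    """
    series = list(member_predictions.values())
    n_days = len(series[0])
    if any(len(s) != n_days for s in series):
        raise ValueError("every ensemble member must cover the same days")

    ensemble_mean, low_band, high_band = [], [], []
    for day in range(n_days):
        values = [s[day] for s in series]     # all members' predictions for this day
        ensemble_mean.append(mean(values))    # the combined "best guess"
        low_band.append(min(values))          # lower edge of the range of outcomes
        high_band.append(max(values))         # upper edge of the range of outcomes
    return ensemble_mean, low_band, high_band


# Hypothetical predictions from three hypothetical models for three days (degrees C)
predictions = {
    "model_a": [14.0, 15.0, 17.0],
    "model_b": [13.0, 16.5, 18.0],
    "model_c": [15.0, 15.0, 16.0],
}
avg, low_band, high_band = combine_ensemble(predictions)
print(avg)        # [14.0, 15.5, 17.0]
print(low_band)   # [13.0, 15.0, 16.0]
print(high_band)  # [15.0, 16.5, 18.0]
```

Averaging is the simplest way to combine members. Weighted averages or a second model trained on the members’ outputs are common alternatives, but nothing in this sketch reflects which combination the study used.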

The team compared how well an ensemble model and single ML models predicted daily stream temperature, an important indicator of water quality that affects water chemistry and aquatic life. Using high-performance computing resources at NERSC, they built models with different ML approaches to predict river water temperatures at locations with few or no prior observations. They also examined how the way an ensemble is built affects the accuracy of its water temperature predictions.

Accurately projecting the extremes

The study showed that in every case, the ensemble models produced better stream temperature predictions than single models. When building an ensemble, the best predictions came from combining different types of models or from training each model on slightly different data.

The team also developed a method to assess the reliability of the predictions and found that ensembles improved predictions not only of average values but also of extremely warm temperatures.
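The article does not describe the reliability method the team developed. Purely as an illustration, one simple way to check whether predictions hold up at the extremes is to score a model separately on the warmest observed days as well as on all days. The function names, the fraction of days kept, and all numbers below are hypothetical.

```python
from math import sqrt


def rmse(pred, obs):
    """Root-mean-square error between paired predictions and observations."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def rmse_on_warmest(pred, obs, fraction=0.1):
    """RMSE restricted to the days with the warmest observed temperatures.

    fraction is the share of days kept (0.1 keeps the warmest 10%, at least one day).
    """
    k = max(1, int(len(obs) * fraction))
    warmest = sorted(range(len(obs)), key=lambda i: obs[i], reverse=True)[:k]
    return rmse([pred[i] for i in warmest], [obs[i] for i in warmest])


# Hypothetical observed and predicted daily stream temperatures (degrees C)
observed = [10.0, 12.0, 14.0, 16.0, 25.0]
single_model = [10.0, 12.0, 14.0, 16.0, 21.0]   # exact on mild days, misses the hot day
ensemble = [10.5, 12.5, 14.5, 16.5, 24.0]       # small errors on every day

for name, pred in [("single model", single_model), ("ensemble", ensemble)]:
    overall = rmse(pred, observed)
    extreme = rmse_on_warmest(pred, observed, fraction=0.2)
    print(f"{name}: overall RMSE {overall:.2f}, warmest-day RMSE {extreme:.2f}")
```

In this made-up case the single model is exact on four of five days but misses the hottest day by 4 degrees, a gap that the warmest-day score isolates more clearly than the overall average does.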

“These findings can help to inform the way we develop ensemble models in hydrology, but also in other fields that rely on deep learning model predictions like climate, epidemiology, economics, and more,” said Varadharajan.

With more robust and reliable predictions of environmental factors such as stream temperature, water managers and policymakers can better monitor water quantity and quality, even where little or no data exist, and ultimately make more informed decisions. The study offers important insights into how better prediction methods could greatly improve our understanding of the ecosystem health and function that are essential for food, water, and energy security.

This work was funded by the Biological and Environmental Research and Advanced Scientific Computing Research programs of the Department of Energy Office of Science.
