A Remote Sensing and Machine Learning Based Framework for the Assessment of Spatiotemporal Water Quality Along the Middle Ganga Basin
Date
2023
Authors
S K, Ashwitha
Publisher
National Institute Of Technology Karnataka Surathkal
Abstract
Understanding the changes in surface water quality over time and space necessitates an
examination of spatiotemporal water quality data. This data can be used to identify
pollution sources, monitor changes in water quality, and assess the effectiveness of
management and conservation efforts. Furthermore, spatiotemporal surface water
quality assessment can forecast future water quality trends, allowing for precise
decision-making and conservation. Overall, spatiotemporal water quality assessment is
critical in protecting and managing water resources.
Various multivariate statistical and machine learning techniques are used in this study
to determine the river water quality status and comprehend the spatiotemporal pattern
along the Middle Ganga Basin in Uttar Pradesh. The study covers 14 years
(2005-2018), with 20 Water Quality Parameters (WQPs) collected monthly at 20
monitoring stations of the Central Water Commission, Middle Ganga Basin,
extending from Ankinghat (upstream) to Chopan (downstream). The
temporal dissimilarity of river water quality is established by applying the Spearman
non-parametric correlation coefficient test (Spearman r). Temperature shows a highly
significant correlation with season (reported p = 0.0000; Spearman r = -0.866). In addition,
the parameters EC, pH, TDS, T, Ca, Cl, HCO3, Mg, NO2+NO3, SiO2, and DO are significantly
correlated with season (p < 0.05). The K-means clustering algorithm temporally
classified the 20 monitoring stations into four clusters based on the similarity and
dissimilarity of WQPs. Box and Whisker plots were generated based on these clusters
to study water quality trends along individual clusters in different seasons. PCA was
applied to screen out the most dominating WQPs causing spatial and seasonal variations
from a large data set. Seasonally, the three PCs chosen explained 75.69% and 75% of
the variance in the data. With PC loadings > 0.70, the variables EC, pH, Temp, TDS, NO2+NO3,
P-Tot, BOD, COD, and DO have been identified as the dominant pollutants. The
applied redundancy analysis (RDA) revealed that land use/land cover (LULC) has a moderate to strong contribution to
WQPs during the monsoon season but not during the non-monsoon season.
Furthermore, dense vegetation is critical for keeping water clean, whereas agriculture,
barren land, and built-up areas degrade water quality. The findings also suggest that
the relationship between WQPs and LULC differs across spatial scales. A
stacked ensemble regression model is applied to assess predictive
power across different clusters and scales. Overall, the results indicate that the riparian
scale is more predictive than the watershed and reach scales.
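As an illustration only, the Spearman test and the PCA loading screen described above can be sketched as follows. This is a minimal sketch on synthetic data, not the study's code or measurements: the season codes, the four variables, and the helper names (average_ranks, spearman_r, pca_loadings) are all assumptions introduced here, and the 0.70 cut-off is applied to loadings from a correlation-matrix PCA.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(1, values.size + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_r(x, y):
    """Spearman r is the Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def pca_loadings(data, n_components=3):
    """PCA on the correlation matrix; returns (loadings, explained variance ratio)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    eigvals = np.clip(eigvals[top], 0.0, None)
    loadings = eigvecs[:, top] * np.sqrt(eigvals)  # variable-to-PC correlations
    return loadings, eigvals / corr.shape[0]


# Synthetic stand-in data (hypothetical, NOT the study's measurements).
rng = np.random.default_rng(0)
season = np.tile([1, 2, 3], 40)  # hypothetical season codes
temp = 35.0 - 6.0 * season + rng.normal(0.0, 1.5, season.size)
ec = 300.0 + 8.0 * temp + rng.normal(0.0, 10.0, season.size)
do = 9.0 - 0.1 * temp + rng.normal(0.0, 0.3, season.size)
ph = rng.normal(7.8, 0.2, season.size)
names = ["Temp", "EC", "DO", "pH"]
wqp = np.column_stack([temp, ec, do, ph])

r_temp_season = spearman_r(season, temp)
loadings, explained = pca_loadings(wqp, n_components=3)
dominant = [n for n, row in zip(names, loadings) if np.max(np.abs(row)) > 0.70]

print(f"Spearman r (Temp vs season): {r_temp_season:.3f}")
print("Cumulative explained variance:", round(float(explained.sum()), 3))
print("Variables with |loading| > 0.70:", dominant)
```

In practice a library routine such as scipy.stats.spearmanr would also return the p-value; the hand-rolled version here only keeps the sketch dependency-light.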
As a further part of this work, remote sensing, in situ measurements, and machine
learning modelling are integrated to better understand the water quality status
across the study region. In this context, a remote sensing framework based on
Extreme Gradient Boosting (XGBoost) and Multilayer Perceptron (MLP) regressors
with optimized hyperparameters is developed to quantify the concentrations of different
WQPs from Landsat-8 satellite imagery. Six years of satellite data from Ankinghat
(upstream) to Chopan (downstream), covering 20 stations under the Central Water Commission
(CWC), Middle Ganga Basin, are analysed to characterise the trends of dominant
physicochemical WQPs across the four identified clusters. The developed regression
models achieved a coefficient of determination (R2) in the range of 0.88-0.98 for
XGBoost and 0.72-0.97 for MLP. The bands B1-B4 and their ratios are found to be
more consistent with the WQPs. Meanwhile, in terms of the performance metric RMSE,
XGBoost is determined to be superior to MLP for the parameters SiO2 and DO
across all clusters. These findings show that
a small number of in situ measurements is sufficient to develop reliable models for
estimating the spatiotemporal variations of physicochemical and biological WQPs. As
a result, Landsat-8 models could aid in the environmental, economic, and social
management of any body of water.
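To make the band-ratio inputs and the R2/RMSE metrics mentioned above concrete, here is a minimal sketch under stated assumptions. The reflectance values and the target are synthetic, the helper names (band_ratio_features, rmse, r2_score) are hypothetical, and an ordinary least-squares fit is used purely as a lightweight stand-in for the XGBoost and MLP regressors of the study, which are not reproduced here.

```python
import numpy as np
from itertools import combinations


def band_ratio_features(bands):
    """Stack the raw bands with every pairwise ratio B_i / B_j (i < j)."""
    bands = np.asarray(bands, dtype=float)
    pairs = combinations(range(bands.shape[1]), 2)
    ratios = [bands[:, i] / bands[:, j] for i, j in pairs]
    return np.column_stack([bands] + ratios)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic reflectances for four bands (hypothetical, NOT Landsat-8 data).
rng = np.random.default_rng(1)
bands = rng.uniform(0.05, 0.30, size=(200, 4))
# Hypothetical WQP that depends on the ratio of the second to the third band.
target = 5.0 + 2.0 * bands[:, 1] / bands[:, 2] + rng.normal(0.0, 0.2, 200)

features = band_ratio_features(bands)  # 4 bands + 6 ratios = 10 columns
design = np.column_stack([np.ones(len(features)), features])  # add intercept

# Stand-in regressor: least squares on a train split, scored on a hold-out split.
train, test = slice(0, 150), slice(150, 200)
coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
pred = design[test] @ coef

print(f"hold-out R2 = {r2_score(target[test], pred):.3f}")
print(f"hold-out RMSE = {rmse(target[test], pred):.3f}")
```

Swapping the least-squares step for a gradient-boosted or neural regressor would leave the feature construction and the two metrics unchanged.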
Keywords
Surface water quality, Multi-spatial scale, RS of water quality, XGBoost