Article Type: Research Article
Title (English)
Authors (English)
This study develops and evaluates a river flow prediction model for the Kashkan River Basin. The main objective is to improve streamflow prediction accuracy at the Kashkan-Afrine and Kashkan-Poldokhtar hydrometric stations using a Random Forest (RF) model combined with feature selection via a Genetic Algorithm and k-fold cross-validation. The dataset includes observations from three hydrometric and nine meteorological stations.
By averaging predictions from an ensemble of decision trees, the RF model can identify complex patterns, capture nonlinear relationships, and limit overfitting. Some input variables, notably the discharge at the Kakareza station, correlated very strongly with the target variable (0.97) and played a key role in model accuracy. When all features were used, the coefficient of determination (R²) was 0.91 for Kashkan-Poldokhtar and 0.95 for Kashkan-Afrine, indicating strong generalization capability. Prediction errors had means close to zero and low standard deviations, with residuals distributed around zero. Excluding the Kakareza discharge data reduced the input correlation to about 0.64 and degraded performance: R² fell to 0.72 for Kashkan-Poldokhtar and 0.73 for Kashkan-Afrine, while RMSE and MSE increased significantly. These findings highlight the importance of accurate feature selection and of including key hydrometric stations. Overall, integrating the RF model with effective feature selection and k-fold cross-validation provides an efficient approach to streamflow prediction in data-scarce basins.
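For readers unfamiliar with the evaluation terms used above, the following is a minimal sketch of how R², MSE, and RMSE can be computed under k-fold cross-validation. It is not the study's implementation: the data are synthetic, the names `upstream` and `downstream` are hypothetical, and a one-variable least-squares line stands in for the Random Forest model so that the example needs only the Python standard library.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error (same units as the target)."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold splitting."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_line(x, y):
    """Least-squares fit of y = a + b*x (stand-in model, NOT Random Forest)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def cross_validate(x, y, k=5):
    """Fit on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for train, test in k_fold_indices(len(x), k):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in test]
        y_pred = [a + b * x[i] for i in test]
        scores.append({
            "r2": r2(y_true, y_pred),
            "mse": mse(y_true, y_pred),
            "rmse": rmse(y_true, y_pred),
        })
    return scores


# Synthetic (made-up) data: a downstream discharge that depends strongly
# on a single upstream discharge, plus Gaussian noise.
rng = random.Random(42)
upstream = [rng.uniform(5.0, 50.0) for _ in range(100)]
downstream = [2.0 * q + 3.0 + rng.gauss(0.0, 2.0) for q in upstream]

scores = cross_validate(upstream, downstream, k=5)
mean_r2 = sum(s["r2"] for s in scores) / len(scores)
print(f"mean R2 over {len(scores)} folds: {mean_r2:.3f}")
```

Each observation is held out exactly once, so the averaged scores reflect performance on data the model did not see during fitting. Replacing `fit_line` with an ensemble learner would leave the splitting and scoring logic unchanged.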
Keywords (English)