A Multi-Target Random Forest Model for River Flow Forecasting Enhanced by Genetic Algorithm-Based Feature Selection

Mirbeyk Sabzevari, Maryam; Torabi Podeh, Hassan; Dowlatshahi, Mohammad Bagher; Haghiabi, Amir Hamzeh; Maleki, Abbas

doi:10.22125/iwe.2026.556636.1906

Article type: Research article

Subjects

Hydrology

Authors

Maryam Mirbeyk Sabzevari ¹

Hassan Torabi Podeh ¹

Mohammad Bagher Dowlatshahi ²

Amir Hamzeh Haghiabi ¹

Abbas Maleki ¹

¹ Department of Water Engineering, Faculty of Agriculture, Lorestan University, Khorramabad, Iran

² Department of Computer Engineering, Faculty of Engineering, Lorestan University, Khorramabad, Iran

Abstract

This study develops and evaluates a river flow prediction model for the Kashkan River Basin. The main objective is to improve streamflow prediction accuracy at the Kashkan-Afrine and Kashkan-Poldokhtar hydrometric stations. The approach uses a Random Forest (RF) model combined with Genetic Algorithm (GA)-based feature selection and k-fold cross-validation. The dataset comprises observations from three hydrometric stations and nine meteorological stations.
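GA-based feature selection with k-fold cross-validation can be shown as a wrapper search over binary feature masks. This is a minimal sketch under stated assumptions, not the authors' implementation.

- The abstract does not give the GA settings or the fitness function. Tournament selection, uniform crossover, bit-flip mutation, elitism, the population size and the mean 5-fold R² fitness are all illustrative choices.
- A k-nearest-neighbour regressor stands in for the RF model, so the example runs on the Python standard library alone.
- The helpers `make_data`, `cv_fitness` and `ga_select` are hypothetical, and so is the synthetic data. In that data, column `x0` plays the role of a strongly correlated upstream gauge such as Kakareza.

```python
import random

# Hypothetical sketch: GA wrapper feature selection scored by k-fold cross-validation.
# A k-nearest-neighbour regressor stands in for Random Forest (stdlib only).

def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k=5):
    """Average target of the k training rows closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), t)
        for row, t in zip(train_X, train_y)
    )
    return sum(t for _, t in dists[:k]) / min(k, len(dists))

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def cv_fitness(X, y, mask, k=5):
    """Mean k-fold R^2 using only the columns whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")  # an empty feature subset is not a valid model
    Xs = [[row[j] for j in cols] for row in X]
    scores = []
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [Xs[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, Xs[i]) for i in fold]
        scores.append(r2_score([y[i] for i in fold], preds))
    return sum(scores) / len(scores)

def ga_select(X, y, pop_size=10, generations=8, p_mut=0.1, seed=1):
    """Evolve binary feature masks; return the best mask and its CV fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        if mask not in cache:
            cache[mask] = cv_fitness(X, y, mask)
        return cache[mask]

    # Start from the "all features" mask plus random masks.
    pop = [tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = sorted(pop, key=fitness, reverse=True)[:2]  # elitism: keep top 2
        while len(children) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Uniform crossover, then flip each bit with probability p_mut.
            child = tuple(
                (a if rng.random() < 0.5 else b) ^ (rng.random() < p_mut)
                for a, b in zip(p1, p2)
            )
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

def make_data(n=120, seed=0):
    """Hypothetical data: x0 mimics a strongly correlated upstream gauge, x2 a weaker driver."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(0.0, 1.0) for _ in range(6)]
        X.append(row)
        y.append(3.0 * row[0] + row[2] ** 2 + rng.gauss(0.0, 0.1))
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    mask, score = ga_select(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("mean 5-fold R^2:", round(score, 3))
```

The population is seeded with the all-features mask, which mirrors the "all features" scenario. Because of elitism, the returned subset scores at least as well as that baseline on the same folds. In practice the stand-in regressor would be replaced by an RF model, for example scikit-learn's `RandomForestRegressor`.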

The RF model averages the predictions of an ensemble of decision trees. This lets it identify complex patterns and capture nonlinear relationships while limiting overfitting. Some input variables, such as the discharge at the Kakareza station, were very highly correlated (0.97) with the target variable. These inputs played a key role in model accuracy, and removing them caused a marked decline in performance. When all features were used, the coefficient of determination (R²) was 0.91 for Kashkan-Poldokhtar and 0.95 for Kashkan-Afrine, indicating good generalization capability. The prediction errors had means close to zero and low standard deviations, and were scattered around zero.

Excluding the Kakareza discharge data reduced the input-target correlation to about 0.64 and led to a clear drop in performance. R² fell to 0.72 for Kashkan-Poldokhtar and 0.73 for Kashkan-Afrine. Over the same change, the root mean square error (RMSE) and mean square error (MSE) increased substantially. These findings highlight the importance of careful feature selection and of including key hydrometric stations. Overall, integrating the RF model with effective feature selection and k-fold cross-validation provides an efficient approach to streamflow prediction in data-scarce basins.
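The sketch below computes the evaluation statistics quoted in the abstract: R², RMSE, MSE, the input-target correlation, and the mean and standard deviation of the prediction errors. It is hedged on points the abstract leaves open:

- The correlation is assumed to be Pearson's r.
- Hydrological studies report R² either as 1 - SS_res/SS_tot or as the squared Pearson correlation, so both are returned.
- Errors are taken as prediction minus observation.
- The discharges in the example are hypothetical values, not data from the study.

```python
import math

# Hypothetical sketch of the evaluation statistics named in the abstract.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def evaluate(obs, pred):
    """Error statistics for predicted vs observed discharge (error = pred - obs)."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    mse = sum(e * e for e in err) / n
    mean_obs = sum(obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    mean_err = sum(err) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - n * mse / ss_tot,            # 1 - SS_res / SS_tot
        "r_squared": pearson_r(obs, pred) ** 2,  # squared Pearson r
        "mean_error": mean_err,
        "std_error": math.sqrt(sum((e - mean_err) ** 2 for e in err) / n),
    }

if __name__ == "__main__":
    obs = [10.0, 12.0, 15.0, 20.0, 25.0]   # hypothetical discharges (m^3/s)
    pred = [11.0, 12.0, 14.0, 21.0, 24.0]
    for name, value in evaluate(obs, pred).items():
        print(f"{name}: {value:.4f}")
```

The two R² variants agree for an in-sample least-squares linear fit with an intercept but can diverge otherwise. For example, adding a constant bias to the predictions leaves the squared Pearson r unchanged while lowering 1 - SS_res/SS_tot.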

Keywords

Feature Selection

Flow Forecasting

Genetic Algorithm

Hydrological Modeling

Multi-Target Random Forest

Iranian Journal of Irrigation and Water Engineering

Volume 16, Issue 2, Serial No. 62
Winter 1404 (Solar Hijri calendar)
Pages 252-254
