Estimation of Missing Streamflow Data Using Ensemble and Machine Learning Algorithms (Case Study: Karkheh River)

Boustani, Mahsa; Farzin, Saeed; Mousavi, Sayed-Farhad

doi:10.22125/iwe.2025.491838.1842

Article type: Research article

Authors

Mahsa Boustani ¹

Saeed Farzin ²

Sayed-Farhad Mousavi ³

¹ PhD Candidate, Department of Water Engineering and Hydraulic Structures, Faculty of Civil Engineering, Semnan University, Iran

² Associate Professor, Department of Water Engineering and Hydraulic Structures, Faculty of Civil Engineering, Semnan University, Iran

³ Professor, Department of Water Engineering and Hydraulic Structures, Faculty of Civil Engineering, Semnan University, Iran

Abstract

In this study, nine ensemble and machine learning algorithms (XGBoost, CatBoost, Extra Trees, Random Forest, M5, MLP, K-NN, Decision Tree, and SVR) were used to estimate missing daily streamflow data for the Karkheh River. To estimate the missing data at the Abdolkhan and Paye-Pol stations, daily flow data from the Hamidiyeh hydrometric station were used as the neighboring station over a 40-year record. The hyperparameters of these algorithms were optimized with Optuna. A comparison of model performance showed that XGBoost, by learning complex nonlinear relationships, estimates the missing data more accurately. At the Abdolkhan and Paye-Pol stations, it performed best, with the highest coefficient of determination (R²), 0.95 and 0.78 respectively, and the lowest mean absolute error (MAE), 18.76 and 36.45 respectively. It also gave the lowest root mean square error (RMSE), 43.75 and 108.87, and the lowest relative root mean square error (RRMSE), 0.20 and 0.46. XGBoost is therefore the most effective of the models for estimating missing data at both stations, and it can overcome spatial challenges and limited data. The Taylor diagram results also indicate the superiority of XGBoost at both stations. CatBoost was 11% and 5% less accurate than XGBoost at the Abdolkhan and Paye-Pol stations, respectively, and ranked second among the models examined. The results of this study can be useful for estimating streamflow at other ungauged stations.

Keywords

Estimation of missing data

Extreme Gradient Boosting

Ensemble learning

Machine learning

Optuna

Subjects

Water resources

Article title (English)

Estimation of Missing Streamflow Data in River Using Ensemble and Machine Learning Algorithms (Case Study: Karkheh River)

Authors (English)

Mahsa Boustani ¹

Saeed Farzin ²

Sayed-Farhad Mousavi ³

¹ PhD Candidate, Department of Water Engineering and Hydraulic Structures, Faculty of Civil Engineering, Semnan University, Semnan, Iran

² Associate Professor, Department of Water Engineering and Hydraulic Structures, Faculty of Civil Engineering, Semnan University, Semnan, Iran

³ Professor, Department of Water Engineering and Hydraulic Structures, Faculty of Civil Engineering, Semnan University, Semnan, Iran

Abstract (English)

In this study, nine ensemble and machine learning algorithms (XGBoost, CatBoost, Extra Trees, Random Forest, M5, MLP, K-NN, Decision Tree, and SVR) were employed to estimate missing daily streamflow data for the Karkheh River in southwestern Iran. To estimate the missing data at the Abdolkhan and Paye-Pol stations, daily flow data from the neighboring Hamidiyeh hydrometric station were analyzed over a 40-year period. The hyperparameters of these algorithms were optimized with Optuna. A comparison of model performance showed that XGBoost, by learning complex nonlinear relationships, provided the highest estimation accuracy. At the Abdolkhan and Paye-Pol stations, respectively, XGBoost achieved the highest coefficient of determination (R²), 0.95 and 0.78; the lowest mean absolute error (MAE), 18.76 and 36.45; the lowest root mean square error (RMSE), 43.75 and 108.87; and the lowest relative root mean square error (RRMSE), 0.20 and 0.46. Taylor diagrams also confirmed the superiority of XGBoost at both stations. These findings highlight the ability of XGBoost to overcome spatial challenges and handle limited data effectively. CatBoost ranked second among the models evaluated, with 11% and 5% lower accuracy than XGBoost at the Abdolkhan and Paye-Pol stations, respectively. The results of this study can be valuable for estimating missing flow data at other stations on this river and can support the effective management of water resources.
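The four evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give the formulas, so the definitions here are assumptions. In particular, R² is taken as 1 - SS_res/SS_tot (some hydrology papers use the squared correlation coefficient instead), and RRMSE is taken as RMSE divided by the mean observed flow. The function names and the sample values are hypothetical.

```python
import math

def mae(observed, estimated):
    """Mean absolute error between observed and estimated flows."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def rmse(observed, estimated):
    """Root mean square error."""
    mse = sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)
    return math.sqrt(mse)

def rrmse(observed, estimated):
    """Relative RMSE: RMSE divided by the mean observed flow (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    return rmse(observed, estimated) / mean_obs

def r2(observed, estimated):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily flows: observed at a target station vs. model estimates.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]

for name, fn in [("MAE", mae), ("RMSE", rmse), ("RRMSE", rrmse), ("R2", r2)]:
    print(name, round(fn(observed, estimated), 4))
```

MAE and RMSE carry the units of the flow data, while RRMSE and R² are dimensionless, which is why the latter two are the ones that can be compared directly between the two stations.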

Keywords (English)

Estimation of missing data

Extreme Gradient Boosting

Ensemble learning

Machine learning

Optuna