Optimization of Power Flow Outage Detection using Machine Learning Algorithm
This study investigates the application of Machine Learning (ML) methods to detect problems in distribution networks. The main goal is to swiftly and reliably identify and categorize disturbances, hence improving the network's reliability and accelerating restoration efforts. The proposed methodology leverages the functionalities of Supervised Machine Learning algorithms such as Random Forest, Logistic Regression, and K-Nearest Neighbours, which offer user-friendliness and adaptability in addressing both classification and regression tasks for pattern recognition and anomaly detection. Through the analysis of real-time data streams collected from various sensors distributed across the grid, encompassing current measurements, breaker conditions, voltage monitoring, meteorological data, and load information, the machine learning model can discern the typical operational patterns of the network during optimal functioning. Identifying and highlighting variations in these established patterns helps facilitate prompt responses and enhance service consistency. The sophisticated framework may be modified and applied to various network architectures, promoting a more efficient and automated approach to outage management. This paper utilizes machine learning algorithms and their diverse evaluation measures to accurately anticipate power interruptions.
Introduction
The electrical power system is one of the most sophisticated of human achievements. Delivering power to end users demands a high degree of competence and knowledge; hence, the system is equipped with control framework parameters that keep operation well within statutory limits. Society has become reliant on electricity to energize residences and industry, enhancing economic growth and facilitating infrastructure development in technological sectors. At the same time, faults in electricity distribution have disrupted both individual lives and enterprises, continually evolving in shape and pattern and rendering their description occasionally intractable. Governments and utility stakeholders emphasize the secure and reliable functioning of the electricity system because of its essential role in social, political, and economic activities. Adversaries can penetrate network nodes and alter parameters, including control commands, thereby disrupting operations, causing blackouts and financial losses, and potentially endangering national security. A power outage represents a considerable obstacle in any country, irrespective of its size, as it interrupts social and economic activities in the area [1], [2]. Power distribution networks often suffer outages, with these incidents increasing in some countries due to ageing infrastructure and changing factors such as weather patterns. Although smart design and routine maintenance reduce the incidence of outages, outages cannot be completely eliminated. During a required outage for maintenance or equipment upgrades, the utility can reduce service disruption to consumers by carefully coordinating personnel deployment and the order of operations. Electric utility companies must employ an Outage Management System (OMS) to identify the locations of power outages.
This technology allows the energy provider to prioritize restoration efforts based on critical areas, outage severity, and other factors. Acknowledging the significance of power supply, several initiatives have been implemented to mitigate power outages, including the deployment of an OMS and smart meters to facilitate swift identification and restoration of power disruptions (Sultan and Hilton [3]). Precisely and efficiently predicting power outages and detecting faults is crucial for maintaining the reliability and stability of power systems [4], [5]. Conventional strategies for tackling these difficulties have depended on rule-based methodologies and statistical analysis; such standard methodologies, however, may fail to capture the complex patterns and dynamics inherent in energy systems. Deep learning has garnered significant attention across various domains, including speech recognition, natural language processing, and computer vision, because of its promising outcomes. Nonetheless, the accuracy of machine learning varies in line with the "no free lunch" theorem, which posits that, in the absence of significant information about the modeling problem, no single model will consistently outperform all others. Consequently, this research was undertaken to design a suitable model for our case study. The aim of the electric power industry is to provide power at minimal cost while maintaining consistent service quality. The reliability of an electrical power distribution system denotes its capacity to provide uninterrupted power to consumers. Overhead feeders, frequently utilized in distribution systems, are responsible for the bulk of service interruptions to customers due to their vulnerability to numerous risks [4], [6]. Fortunately, many facilities possess auxiliary systems equipped with backup power sources that immediately activate in the event of a primary power grid failure.
Backup power is widely utilized in corporate buildings, manufacturing, mining, enterprises, and residential areas because of the growing reliance on technology and equipment in daily life. Understanding the potential causes of power failure is essential for safeguarding ourselves and our enterprises against its detrimental consequences. Once all potential failures are identified, it becomes simpler to optimize and implement appropriate remedies. According to Ajeigbe et al. [5] and Ukato et al. [7], traditional maintenance strategies such as corrective and time-based methods possess numerous drawbacks, including high costs and frequent system shutdowns due to the inability to foresee and identify faults. Power failures are perilous in areas where environmental and community safety are at stake: facilities such as hospitals, sewage treatment plants, mining shelters, and telecommunications necessitate emergency backup power [6], [8]. The analysis of power system outages and cascade failures is a crucial aspect of electric power system contingency research, significantly influencing the future stability and safety of these systems. A consistent supply of power is essential for contemporary society; power outages interrupt this flow, resulting in economic losses, inconvenience, and even safety problems [9]. Consistent maintenance, enhancements, and investment in innovative technologies are essential to ensure the secure and reliable operation of the distribution network. This may involve the use of smart grid technologies to enable real-time monitoring and control of systems. Traditional artificial intelligence (AI) approaches demonstrate constraints in accuracy and reliability. Rule-based methodologies depend on established rule sets, yet they encounter difficulties in managing the intricate variances and varied fault patterns present in power systems.
Furthermore, they require considerable manual labor for development and maintenance and generally lack the ability to learn and adapt to novel situations or data. The electricity system is among the most intricate systems in contemporary life, and the modern electricity system is nearing critical operational thresholds within the market context. To accommodate rising load demand, extensive high-capacity transmission networks are widely employed [8], [10]. Extensive studies have determined that adverse weather conditions are the predominant cause of power outages; nevertheless, power outages often have multiple causes. Researchers identify three primary types of power interruption: blackout, burnout, and permanent fault [9], [11]. Machine learning amalgamates data from various sources to generate predictions, discerns patterns within datasets, and establishes relationships among those patterns. Utilizing a training dataset, it learns and adapts to test or validation data autonomously, without human intervention [12], [13].
Materials and Methods
Data Collection
The data gathering for this study entailed amalgamating information from several sources, such as sensors, smart meters, weather stations, and historical outage records, to create a comprehensive dataset. Python tools, including pandas for data manipulation, requests for API access, and SQLAlchemy for database interfaces, were utilized to enhance data collection and storage efficiency. Historical outage records furnished essential information on previous grid disturbances, encompassing their duration and causes. Meteorological variables such as temperature, humidity, and wind speed, derived from weather data, are crucial for assessing environmental effects on grid performance. Furthermore, power consumption records documented consumer usage trends over time, providing significant insight into demand variations. By integrating these varied information sources, the study established a solid basis for thorough analysis and model development.
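As a minimal sketch of this amalgamation step, the snippet below joins hypothetical inline samples (standing in for the real outage-log and weather-station feeds, with assumed column names) on a shared timestamp using pandas:

```python
import pandas as pd

# Hypothetical samples standing in for the real outage log and weather feed
outages = pd.DataFrame({
    "timestamp": pd.to_datetime(["2014-06-01 14:00", "2014-06-02 09:30"]),
    "feeder_id": [268, 214],
    "duration_h": [46.5, 0.0167],
})
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2014-06-01 14:00", "2014-06-02 09:30"]),
    "temp_c": [31.2, 24.8],
    "wind_kmh": [42.0, 12.5],
})

# Align each outage with its weather observation (exact join here;
# pd.merge_asof would handle non-aligned timestamps)
dataset = outages.merge(weather, on="timestamp", how="left")
print(dataset.shape)  # (2, 5)
```

In practice the real feeds would arrive via requests (API access) or SQLAlchemy (database interfaces) rather than inline literals.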
Data Pre-Processing
The data was meticulously cleaned and transformed to guarantee its appropriateness for analysis. Missing values were handled by methods such as mean, median, and k-nearest neighbors (k-NN) imputation, while numerical data was normalized using min-max scaling to standardize variable contributions. Categorical variables, including outage causes, were transformed into binary matrices by one-hot encoding. Libraries such as pandas, numpy, and scikit-learn enabled these preprocessing operations. The dataset was subsequently divided into training, validation, and test sets via scikit-learn’s train_test_split function, hence facilitating supervised learning with k-NN, Random Forest, and Logistic Regression to accurately forecast grid performance.
Model Training and Evaluation
The simulation and learning process utilizes Random Forest, Logistic Regression, and k-Nearest Neighbors (k-NN), each employing a distinct methodology: Random Forest employs an ensemble of decision trees with bootstrap sampling and feature subset selection; Logistic Regression calculates class probabilities through logit and sigmoid functions; and k-NN classifies data based on proximity using metrics such as Euclidean distance. Preprocessing encompasses the management of absent values, normalization, and the encoding of categorical variables. Models undergo training and validation using metrics such as accuracy, precision, recall, and F1-score, employing cross-validation and thresholding methods to guarantee robustness and dependability.
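A compact sketch of this training-and-evaluation loop, using a synthetic scikit-learn dataset in place of the grid data, might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed outage dataset
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The three supervised algorithms named in the methodology
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record its held-out accuracy
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)
```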
Cross-Validation
Cross-validation was carried out by dividing the dataset into multiple folds, training the model on all but one fold, and validating on the held-out fold, rotating until every fold has served as the validation set. This helps in assessing the model's performance more reliably using Eq. (1):

$$E_{\mathrm{CV}} = \frac{1}{k}\sum_{i=1}^{k} E_i \tag{1}$$

where $k$ is the number of folds and $E_i$ is the error on the $i$-th fold.
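The mean fold error of Eq. (1) can be estimated with scikit-learn's cross_val_score; the synthetic data here is a stand-in for the outage dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the outage dataset
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# k = 5 folds; each fold is held out once while the model trains on the rest
fold_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)

# Mean of the per-fold errors, as in Eq. (1)
cv_error = 1.0 - fold_acc.mean()
print(round(cv_error, 3))
```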
The performance metrics used are Accuracy, Precision, Recall, and F1-score, given by Eqs. (2) to (5):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$
where TP is True Positives, TN is True Negatives, FP is False Positives and FN is False Negatives.
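As a worked example of Eqs. (2) to (5), the confusion-matrix counts below are hypothetical, chosen so the results reproduce the Random Forest row of Table II:

```python
# Hypothetical confusion-matrix counts (not from the study's raw output)
TP, TN, FP, FN = 88, 90, 10, 12

accuracy = (TP + TN) / (TP + TN + FP + FN)   # Eq. (2)
precision = TP / (TP + FP)                    # Eq. (3)
recall = TP / (TP + FN)                       # Eq. (4)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (5)

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
# 0.89 0.9 0.88 0.89
```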
Statistical Analysis
Statistical analysis was performed to identify trends and correlations in the data, which aided feature selection and model refinement; correlation coefficients of 0.85 between weather variables and outages and 0.78 between load and outages were observed.
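A small illustration of such a correlation check, using synthetic wind-speed and outage-duration series rather than the study's data, with pandas' Pearson correlation:

```python
import numpy as np
import pandas as pd

# Synthetic series: outage duration driven by wind speed plus noise
rng = np.random.default_rng(0)
wind = rng.normal(30, 10, 200)                 # hypothetical wind speed (km/h)
outage_h = 0.5 * wind + rng.normal(0, 3, 200)  # hypothetical outage hours

df = pd.DataFrame({"wind_kmh": wind, "outage_h": outage_h})
r = df["wind_kmh"].corr(df["outage_h"])        # Pearson correlation coefficient
print(round(r, 2))
```

A coefficient near 1 flags the weather variable as a strong candidate feature, which is the kind of evidence the study used for feature selection.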
System Architecture
The system architecture incorporates machine learning models with various data sources, such as outage records, meteorological data, and power consumption logs, to facilitate effective outage detection and management. It utilizes algorithms such as Random Forest, k-Nearest Neighbors (k-NN), and Logistic Regression to ensure precise forecasts and activates response mechanisms to proactively address grid interruptions. The workflow encompasses data collection, processing, model engineering, execution, and deployment, guaranteeing a robust and scalable solution.
Model Engineering and Execution
The model construction process employed Random Forest, k-Nearest Neighbors (k-NN), and Logistic Regression via scikit-learn, emphasizing feature selection, hyperparameter optimization, and performance validation to guarantee accuracy and mitigate overfitting. Methods such as cross-validation and GridSearchCV were utilized to enhance predictive performance, accompanied by thorough evaluation to assess reliability and robustness. Trained models were utilized on new data for predictions, and their efficacy was assessed using measures including accuracy, precision, recall, and F1-score via scikit-learn’s classification reports and confusion matrices.
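A hedged sketch of the hyperparameter search mentioned above; the grid values are assumptions, since the paper does not list the exact ranges searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed outage dataset
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Hypothetical grid; the actual ranges searched are not reported
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

# GridSearchCV combines the grid search with cross-validation (cv=3 folds)
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```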
Implementation of Algorithms in the Simulation and Learning Process
The simulation and learning process implements the three algorithms described under Model Training and Evaluation: Random Forest with its ensemble of bootstrap-sampled decision trees and feature subset selection, Logistic Regression with logit and sigmoid functions for class probabilities, and k-NN with Euclidean distance for proximity-based classification. The same preprocessing pipeline (imputation of absent values, normalization, and categorical encoding) feeds each model, and training and validation again rely on accuracy, precision, recall, and F1-score, with cross-validation and thresholding to guarantee robustness and dependability.
Results and Discussion
Simulation and Results
The dataset library was imported to initiate model testing, as illustrated in Fig. 1. The date and time fields were computed from the data, and the relevant datasheet is presented in Table I. This data was utilized to assess the efficacy of the different algorithms.
Fig. 1. Data imported from the library on VScode.
| Event description | Year | Respondent | Geographic areas | NERC region | Demand loss (MW) | Number of customers affected | Tags | Duration |
|---|---|---|---|---|---|---|---|---|
| 268 | 2014 | 207 | 372 | 16 | 248.346964 | 0.0 | 53 | 46.500000 |
| 268 | 2014 | 350 | 555 | 16 | 248.346964 | 0.0 | 53 | 17.666667 |
| 268 | 2014 | 506 | 780 | 10 | 424.000000 | 0.0 | 53 | 8.966667 |
| 214 | 2014 | 480 | 519 | 18 | 248.346964 | 0.0 | 78 | 0.016667 |
| 214 | 2014 | 480 | 519 | 18 | 248.346964 | 0.0 | 78 | 0.016667 |
Performance Metrics Overview of the Test Models
In evaluating the performance of the power flow outage detection system enhanced by machine learning algorithms, several key metrics are pivotal.
Bar Plot of Accuracy Scores
The code was executed to plot the accuracy scores of all test models, as illustrated in Fig. 2, which compares the accuracy scores of the three models. The Random Forest classifier attained the highest accuracy of 0.89, followed by Logistic Regression at 0.85 and K-Nearest Neighbours at 0.82. The figure clearly illustrates the superiority of the Random Forest in predictive accuracy.
Fig. 2. Accuracy scores for all test models: (a) Code simulated for accuracy scores on the bar plot, and (b) Bar plot of accuracy scores.
Fig. 2b illustrates the superior performance of the Random Forest classifier, which surpassed the other models in accuracy. Logistic Regression and K-Nearest Neighbors, though still reasonably accurate, trail behind, illustrating the Random Forest's robustness and stronger generalization on the test data.
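A minimal reconstruction of the bar plot in Fig. 2b; only the accuracy scores come from the paper, while the styling choices are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt

# Accuracy scores reported for the three test models
scores = {
    "Random Forest": 0.89,
    "Logistic Regression": 0.85,
    "K-Nearest Neighbours": 0.82,
}

fig, ax = plt.subplots()
ax.bar(scores.keys(), scores.values())
ax.set_ylabel("Accuracy")
ax.set_ylim(0, 1)
ax.set_title("Accuracy scores for all test models")
fig.savefig("accuracy_scores.png")
```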
Histogram of Actual vs. Predicted Values
Fig. 3 illustrates the histograms for Random Forest, Logistic Regression, and K-Nearest Neighbors, depicting the distribution of actual vs. predicted values. The Random Forest histogram shows the closest correspondence between actual and predicted values, signifying its superior predictive efficacy.
Fig. 3. Histograms for all test models: (a) code for the histogram of actual vs. predicted values, and (b) histogram of actual vs. predicted values.
Histograms juxtaposing actual and predicted values for Random Forest, Logistic Regression, and K-Nearest Neighbors (k-NN) in binary classification indicate that Random Forest is the most precise model, exhibiting low discrepancies and strong performance. Logistic Regression and k-NN exhibit significant disparities, especially at the distribution tails, signifying elevated error rates. The x-axis denotes predicted class labels (0 for non-outage, 1 for outage), while the y-axis illustrates instance counts. In summary, Random Forest demonstrates superior reliability, particularly in managing intricate patterns and imbalanced datasets.
Pie Chart of Actual vs. Predicted Values for Each Algorithm
Fig. 4 displays the code, executed in the IDE, that generated the actual and predicted values, and the pie charts in Fig. 5 illustrate the ratio of accurate to inaccurate predictions for each method. The Random Forest pie chart shows a greater share of accurate predictions, underscoring its superiority relative to the other models.
Fig. 4. Code simulated for the actual and predicted values.
Fig. 5. Pie chart of actual vs predicted values for each algorithm.
The pie charts offer a visual representation of the forecast accuracy for each model. The Random Forest chart indicates a greater proportion of accurate predictions relative to the other models. This visual representation corresponds with the numerical accuracy scores, substantiating the conclusion that Random Forest is the superior model among the three.
K-Nearest Neighbour Classifier
The K-Nearest Neighbour classifier was trained on the dataset as depicted in Fig. 6, yielding a simple, interpretable instance-based model. The values of the test set were subsequently predicted from the trained model.
Fig. 6. Algorithm to train K-neighbors classifier on the data set.
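The training step shown in Fig. 6 can be sketched as follows; the synthetic data and k = 5 are assumptions standing in for the actual dataset and tuning:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed outage dataset
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Euclidean distance is scikit-learn's default metric; k = 5 is assumed
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict the test-set labels from the trained model
y_pred = knn.predict(X_test)
print(knn.score(X_test, y_test))
```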
Random Forest Classifier
The Random Forest classifier, an ensemble technique, was utilized to harness the capabilities of several decision trees. The method illustrated in Fig. 7 aids in mitigating overfitting and enhancing the model’s generalizability.
Fig. 7. Algorithm to train random classifier model on data set.
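A sketch of the ensemble training shown in Fig. 7; n_estimators = 100 and the synthetic data are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed outage dataset
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Bootstrap sampling and per-split feature subsets (max_features="sqrt")
# are the mechanisms that curb overfitting and aid generalization
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=1)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```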
Logistic Regression Classifier
Logistic Regression classifiers are proficient for classification tasks, particularly when the decision boundary is approximately linear. The model was trained according to the datasheet illustrated in Fig. 8, and the test results are displayed in Table II.
Fig. 8. Algorithm to train logistic regression classifier on the data set.
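A sketch of the training and reporting step shown in Fig. 8, using scikit-learn's classification report as described in the methodology; the synthetic data stands in for the real datasheet:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed outage dataset
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Sigmoid of a linear combination of the features gives the class probability
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Per-class precision, recall, and F1 plus overall accuracy
report = classification_report(y_test, logreg.predict(X_test),
                               output_dict=True)
print(round(report["accuracy"], 2))
```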
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random forest | 0.89 | 0.90 | 0.88 | 0.89 |
| Logistic regression | 0.88 | 0.89 | 0.87 | 0.88 |
| K-Nearest neighbors | 0.85 | 0.87 | 0.82 | 0.84 |
The efficacy of the Random Forest classifier was assessed utilizing standard measures, with the findings delineated in Table II. Similarly, the Logistic Regression classifier was trained and evaluated, with its performance detailed in Table II.
Summary of Results Compared to Previous Works
Using the algorithms described, we train models on historical power system data to predict outages. The models’ performance metrics are compared with results from previous works.
Table III compares the performance of the machine learning models employed in this study for power outage detection with results from prior studies, highlighting substantial improvements. The Random Forest model attained an accuracy of 0.89, surpassing the accuracies of 0.85 and 0.87 reported by Flores et al. [14] and Rogers et al. [15], respectively. This improvement demonstrates the efficacy of optimizations such as feature selection and parameter tuning. The Logistic Regression model attained an accuracy of 0.85, exceeding the previously reported accuracies of 0.82 [14] and 0.84 [16] and underscoring the current study's handling and extraction of pertinent features. The K-Nearest Neighbors (k-NN) model achieved an accuracy of 0.82, above the 0.80 reported in [17] and close to the 0.84 reported in [16]. These comparisons illustrate the efficacy of the proposed machine learning framework, with the gains attributable to improved data preprocessing, feature engineering, and model optimization, hence improving power outage detection and optimization.
| Model | Accuracy (Current study) | Accuracy [14] | Accuracy [15] | Accuracy [16] | Accuracy [17] |
|---|---|---|---|---|---|
| Random forest | 0.89 | 0.85 | 0.87 | N/A | N/A |
| Logistic regression | 0.85 | 0.82 | N/A | 0.84 | N/A |
| K-Nearest neighbours | 0.82 | N/A | N/A | 0.84 | 0.80 |
Consequences of the Findings
This study’s results underscore notable progress in power outage detection, illustrating the efficacy of machine learning algorithms in attaining enhanced accuracy for prompt and dependable outage diagnosis. The suggested approach improves grid resilience and dependability by decreasing detection and reaction times, hence lowering interruptions to customers and critical infrastructure. The amalgamation of these models with smart grid technology facilitates real-time monitoring and predictive maintenance, hence enhancing power distribution efficiency. A comparative review with previous research highlights the superiority of these techniques, facilitating the advancement of resilient and efficient power systems that satisfy the needs of contemporary societies.
Conclusion
This study employed a pre-processed dataset before implementing machine learning models to forecast power outage occurrences. Sequencing and feature selection were essential steps in preparing the data for analysis. Three machine learning methodologies were assessed: K-Nearest Neighbors (k-NN), Random Forest, and Logistic Regression. The Random Forest model exceeded prior research, with an accuracy of 0.89, owing to its enhanced feature selection and parameter optimization. Logistic Regression and k-NN also demonstrated strong performance, with accuracies of 0.85 and 0.82, respectively. The results demonstrate that the methods and improvements applied substantially increased machine learning's ability to detect power outages.
Recommendation
This work illustrates the viability and advantages of utilizing machine learning for power outage detection, thereby promoting further research and development in these essential areas. Incorporating real-time data from sensors and smart grid technologies can enhance the forecasting capacities of the machine learning models used in this research and substantially improve their accuracy; such an extension must account for real-time load data and changes in grid topology. Employing feature engineering to identify subtle patterns in the data, including temporal patterns and variable interactions, can further enhance model performance. The efficiency of power distribution networks can be improved by fully utilizing machine learning for better outage management and detection.
References
[1] Amadi HN, Festus O, Ijeoma RC. Simulation and analysis of improved relay coordination in Tungbo 11 kV feeders in Sagbama substation, Bayelsa State, Nigeria. Eur J Adv Eng Technol. 2024;11(11):41–9.
[2] Ajeigbe OA, Chowdhury SP, Olwal TO, Abu-Mahfouz AM. Harmonic control strategies of utility-scale photovoltaic inverters. Int J Renew Ene Res. 2018;8(3):1354–68.
[3] Jamal TB, Hasan S. A generalized accelerated failure time model to predict restoration time from power outages. Int J Disaster Risk Sci. 2023;14(6):995–1010. doi: 10.1007/s13753-023-00529-3.
[4] Rizvi M. Leveraging deep learning algorithms for predicting power outages and detecting faults: a review. Adv Res. 2023;24(5):80–8. doi: 10.9734/AIR/2023/v24i5961.
[5] Ajeigbe OA, Munda JL, Hamam Y. Optimal allocation of renewable energy hybrid distributed generations for small-signal stability enhancement. Energies. 2019;12(24):1–31.
[6] Eto JH, LaCommare KH, Caswell HC, Till D. Distribution system versus bulk power system: identifying the source of electric service interruptions in the US. IET Generat, Transmiss Distribut. 2019;13(5):717–23.
[7] Ukato A, Sofoluwe OO, Jambol DD, Ochulor OJ. Optimizing maintenance logistics on offshore platforms with AI: current strategies and future innovations. World J Adv Res Rev. 2024;22(1):1920–9.
[8] Ayvaz S, Alpay K. Predictive maintenance system for production lines in manufacturing: a machine learning approach using IoT data in real-time. Expert Syst Appl. 2021;173:114598.
[9] Chen J. Research on power system automation communication technology for smart grid. IOP Conf Ser: Mat Sci Eng. 2019 Jul;569(4):042025.
[10] Lu H, Zhang Y, Liu Y. Research on key technologies of remote operation and maintenance system for substation automation equipment. Appl Mathem Nonl Sci. 2024;9(1):1–17.
[11] Ajeigbe OA, Munda JL, Hamam Y. Renewable distributed generations' uncertainty modelling: a survey. 7th IEEE PES & IAS Power Africa Conference (PAC 2020), pp. 1–5, IEEE, 2020 Aug 25–28.
[12] Zhang L, Fan Y, Cui R, Lorenz RD, Cheng M. Fault-tolerant direct torque control of five-phase FTFSCW-PM motor based on analogous three-phase SVPWM for electric vehicle applications. IEEE Trans Veh Technol. 2018;67(2):910–9.
[13] Huang W, Hua W, Chen F, Yin F, Qi J. Model predictive current control of open circuit fault-tolerant five-phase flux switching permanent magnet motor drives. IEEE J Emerg Sel Topic Power Electron. 2019;6(4):1840–9.
[14] Flores D, Sang Y, McGarry MP. Transmission line outage detection with limited information using machine learning. 2023 North American Power Symposium (NAPS), pp. 1–5, IEEE, 2023.
[15] Rogers J, Lin H, Sun YL. Prediction-based data augmentation for smart grid line outage detection. 2024 56th North American Power Symposium (NAPS), pp. 1–6, IEEE, 2024.
[16] He J, Cheng MX. Machine learning methods for power line outage identification. Electricity J. 2021;34(1):106885.
[17] Alam M, Kundu S, Thakur SS, Banerjee S. PMU based line outage identification using comparison of current phasor measurement technique. Int J Elect Power Energy Syst. 2020;115:105501.