Illuminating the Future: Predictive Modelling of PV Output Using Machine Learning Techniques
Harnessing solar energy marks a fundamental change in how power is produced and used, and the factors that govern PV output must be forecast to support steady production. This work applied predictive machine learning models to PV output with the aims of optimising power output, reducing uncertainty, improving resource planning, and closely aligning solar power production with peak energy demand. Linear Regression, Decision Tree Regression, and Random Forest Regression models were trained and evaluated on the data. The results show that direct in-plane irradiance has the strongest correlation (1.00) with PV output. Random Forest Regression achieved the highest R² (0.999567) of all the regression models, together with the lowest Mean Squared Error (MSE) and Mean Absolute Error (MAE), 17.130680 and 2.28139, respectively. By comparison, Linear Regression gave an MSE of 20.645277, an R² of 0.999478, and an MAE of 3.16270. Random Forest Regression is therefore the stronger forecasting model: its higher R² means that it explains more of the variation in PV power output.
Introduction
Several organisations have formed strategic partnerships with the Port of Newcastle to help develop an internationally recognised renewable energy ecosystem. These collaborations will underpin the growth of renewable energy generation and position the Port as a leader in clean energy. The planned renewable energy precinct will be supported by shared infrastructure for renewable energy storage, transport, and export, and will provide a platform for large-scale clean energy production [1].
The switch to solar energy signifies a fundamental change in the way we produce and use power. Photovoltaic (PV) cells and concentrating solar power (CSP) systems let us tap the abundant solar energy that reaches our planet without contributing to the ongoing cycle of environmental destruction [2]. Deployment options range from rooftop solar panels to large solar farms, offering people around the world a route towards resilience and energy independence, and several machine learning approaches have been developed to predict solar PV output [3]–[6].
Although solar energy has great transformative potential, obstacles to maximising its use remain. The variable nature of solar power makes economic efficiency difficult to achieve and worsens mismatches between peak energy demand and supply. This work therefore aimed to develop advanced predictive models of solar installation power output, using Newcastle upon Tyne as the study site, to improve economic efficiency and support daily and hourly grid production planning: that is, to ensure optimal use, reduce uncertainty, improve resource planning, and align energy production with peak demand.
In related work, Huang et al. [7] used a multivariate linear regression model to analyse how candidate factors affect stability prediction; their findings clarify which factors influence implant stability quotient (ISQ) values and provide a basis for mathematical models that predict implant stability in clinical practice. Sharma and Kakchapati [8] used a linear regression model to identify the factors associated with carbon stock in the Chure forest of Nepal, giving insight into carbon dynamics in the region and highlighting the importance of these factors for both environmental conservation and international programmes.
Somvanshi et al. [9] give a brief overview of the growing importance of decision tree algorithms in data analysis and classification, noting the increasing volume of available digital data and emphasising the need for efficient methods to extract insights from it. A decision tree makes a prediction by starting at the root and moving towards the leaves, comparing a sample's features with thresholds learned during training. At each level, the comparison determines which branch to follow, until a leaf node is reached, where the prediction is made from the majority class of the training samples in that leaf [10].
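The root-to-leaf traversal described above can be sketched in a few lines of Python. The `Node` structure, the `predict_one` helper, and the toy thresholds and leaf values are hypothetical, chosen only to illustrate the mechanism. Each leaf stores a single value: the majority class for a classification tree, as in [10], or the mean training target for a regression tree.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A binary decision-tree node (hypothetical structure for illustration)."""
    feature: Optional[int] = None      # index of the feature tested at this node
    threshold: Optional[float] = None  # threshold learned during training
    left: Optional["Node"] = None      # followed when x[feature] <= threshold
    right: Optional["Node"] = None     # followed when x[feature] > threshold
    value: Optional[float] = None      # prediction stored at a leaf


def predict_one(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf, testing one feature against its threshold per level."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Toy tree: split on irradiance (feature 0), then on air temperature (feature 1).
tree = Node(
    feature=0, threshold=200.0,
    left=Node(value=50.0),
    right=Node(feature=1, threshold=10.0,
               left=Node(value=400.0),
               right=Node(value=380.0)),
)

print(predict_one(tree, [150.0, 5.0]))   # 50.0
print(predict_one(tree, [600.0, 5.0]))   # 400.0
print(predict_one(tree, [600.0, 15.0]))  # 380.0
```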
Decision trees are prone to overfitting because they fit the training samples closely. Random forests, which consist of many decision trees, mitigate overfitting by averaging the predictions of independently trained trees [11]. A meta-analysis of research on random forests [12] found that they have been used effectively in a wide range of application areas since their inception and that, like other ensemble algorithms, they have outperformed individual classifiers.
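A minimal sketch of the averaging step, assuming scikit-learn and synthetic data (the three features and the target function are invented for illustration): the prediction of a `RandomForestRegressor` is the mean of the predictions of its fitted trees, which scikit-learn exposes as `estimators_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))                                # synthetic features
y = 10 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.5, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_new = rng.uniform(0, 1, size=(5, 3))
per_tree = np.stack([t.predict(X_new) for t in forest.estimators_])  # shape (100, 5)
averaged = per_tree.mean(axis=0)                                     # mean over trees

print(np.allclose(averaged, forest.predict(X_new)))  # True: forest = average of trees
print(per_tree.std(axis=0).round(3))                 # disagreement between trees per sample
```

The second line printed shows how much individual trees disagree on each new sample; the forest's output smooths these differences out.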
Methodology
Data Collection and Processing
In a world driven by the need for sustainable energy solutions, solar radiation is the key driver of solar energy generation. The Photovoltaic Geographical Information System (PVGIS) helps to harness the potential of solar energy by showing the availability and intensity of sunlight in different regions. The data were downloaded from PVGIS as a comma-separated values (CSV) file containing time (UTC), PV power output, direct in-plane irradiance, sun elevation, air temperature, and wind speed at 10 m for Newcastle [13]. The dataset covers 15 years, from 2006 to 2020, for the hours from 9 am to 12 noon on every day of January, at an optimised slope. Table I lists the attributes and specifications of the PV system.
Table I. Specifications of the PV system.
Attribute | Specification |
---|---|
PV technology | Crystalline silicon |
Mounting type | Inclined axis |
Region | Northeast England |
Site | Newcastle upon Tyne |
Coordinates | 54.9783° N, 1.6178° W |
Duplicate rows were removed and the time column was split into separate year, day, and hour columns using the Power Query Editor in Power BI; missing values were then filled by interpolation in Python.
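The same preprocessing steps can be sketched in pandas as a stand-in for the Power Query operations. Several details are assumptions rather than facts from this study: the `time` format `YYYYMMDD:HHMM`, the column names `P` and `T2m`, linear interpolation, and the reading of "9 am to 12 noon" as hours 9 to 12 inclusive. The `preprocess` helper is hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, split the timestamp, keep the study window, interpolate gaps."""
    out = df.drop_duplicates().copy()
    # Assumed PVGIS-style timestamp "YYYYMMDD:HHMM" (UTC); check against the real file.
    ts = pd.to_datetime(out["time"], format="%Y%m%d:%H%M")
    out["year"] = ts.dt.year
    out["day"] = ts.dt.day
    out["hour"] = ts.dt.hour
    # Study window: January, 9 am to 12 noon (assumed here to mean hours 9-12 inclusive).
    out = out[(ts.dt.month == 1) & ts.dt.hour.between(9, 12)]
    out = out.drop(columns="time").reset_index(drop=True)
    # Fill remaining gaps in numeric columns by linear interpolation between neighbours.
    numeric = out.select_dtypes("number").columns
    out[numeric] = out[numeric].interpolate(method="linear", limit_direction="both")
    return out


# Toy input: one duplicated row, one missing value, two rows outside the window.
raw = pd.DataFrame({
    "time": ["20060101:0910", "20060101:1010", "20060101:1110", "20060101:1110",
             "20060101:1210", "20060101:1310", "20060201:0910"],
    "P": [120.0, np.nan, 300.0, 300.0, 280.0, 250.0, 150.0],
    "T2m": [2.1, 2.5, 3.0, 3.0, 3.2, 3.3, 1.0],
})
clean = preprocess(raw)
print(clean)
```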
Algorithms for Machine Learning
To identify the relationships between the independent and dependent variables, a correlation heat map was generated; it gives a clearer visualisation of the data and reveals possible multicollinearity among the variables. Air temperature, direct in-plane irradiance, and sun elevation are the predictors, forming the input X, and PV power output is the predicted value, forming the output Y. The data were then split into training (80%) and testing (20%) sets.
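A sketch of the correlation and splitting steps, assuming pandas and scikit-learn. The column names are assumed PVGIS-style labels, the `correlate_and_split` helper and the `random_state` value are hypothetical, and the data are synthetic stand-ins in which output depends mainly on irradiance. The returned matrix is what a heat map would display.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names: air temperature, direct in-plane irradiance,
# sun elevation, and PV power output. Rename these to match the real file.
FEATURES = ["T2m", "Gb(i)", "H_sun"]
TARGET = "P"


def correlate_and_split(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Return the Pearson correlation matrix and an 80/20 train/test split."""
    corr = df[FEATURES + [TARGET]].corr()  # matrix behind the correlation heat map
    X, y = df[FEATURES], df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    return corr, X_train, X_test, y_train, y_test


# Synthetic stand-in data: output driven mainly by irradiance.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "T2m": rng.uniform(-2, 8, n),
    "Gb(i)": rng.uniform(0, 800, n),
    "H_sun": rng.uniform(5, 15, n),
})
df["P"] = 0.2 * df["Gb(i)"] - 0.5 * df["T2m"] + rng.normal(0, 2, n)

corr, X_train, X_test, y_train, y_test = correlate_and_split(df)
print(corr.round(2))
print(len(X_train), len(X_test))  # 160 40
```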
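Linear Regression (LR): Linear regression fits the data to the linear equation (1), whose coefficients represent the relationship between the independent variables (air temperature, direct in-plane irradiance, and sun elevation) and the dependent variable (power output). The fitted coefficients indicate the magnitude and direction of these relationships, which helps in predicting power output under changing environmental conditions.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon \qquad (1)$$

where $X_1$, $X_2$, and $X_3$ are air temperature, direct in-plane irradiance, and sun elevation, $\beta_0$ is the intercept, $\beta_1$, $\beta_2$, and $\beta_3$ are the regression coefficients, and $\varepsilon$ is the error term.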
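Equation (1) can be illustrated with scikit-learn's `LinearRegression` on synthetic data. The "true" coefficients below are invented so that the fit can be checked; they are not the study's estimates. The same estimates are recovered with ordinary least squares in NumPy, which is what fitting (1) amounts to.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
# Synthetic predictors X1..X3: air temperature, direct in-plane irradiance, sun elevation.
X = np.column_stack([
    rng.uniform(-2, 8, n),
    rng.uniform(0, 800, n),
    rng.uniform(5, 15, n),
])
beta0_true = 5.0
beta_true = np.array([-0.5, 0.2, 1.0])
y = beta0_true + X @ beta_true + rng.normal(0, 1.0, n)  # epsilon ~ N(0, 1)

model = LinearRegression().fit(X, y)
print("beta0 ~", round(model.intercept_, 2))
print("beta1..beta3 ~", model.coef_.round(3))

# Same estimates from ordinary least squares on the design matrix [1, X1, X2, X3].
design = np.column_stack([np.ones(n), X])
ols, *_ = np.linalg.lstsq(design, y, rcond=None)
```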
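Decision Tree Regression (DTR): Decision trees, with their inherent visual clarity and ability to capture non-linear relationships, are a versatile choice for predicting power output in solar PV systems and offer useful insights for system optimisation and performance analysis. Node impurity is computed using the Gini index, as shown in (2).

$$G = \sum_{k=1}^{K} \hat{p}_{mk}\left(1 - \hat{p}_{mk}\right) \qquad (2)$$

where $\hat{p}_{mk}$ is the proportion of training observations in node $m$ that belong to class $k$, and $K$ is the number of classes.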
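Equation (2) can be computed directly from the labels in a node. The helper names and the labels are hypothetical toy values. Because the Gini index is defined for class labels, a second helper shows, for comparison, the squared-error impurity that regression trees such as scikit-learn's `DecisionTreeRegressor` use by default (`criterion="squared_error"`). This is context only, not a description of the exact configuration used in this study.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini index of node m: G = sum_k p_mk * (1 - p_mk), as in (2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [count / n for count in Counter(labels).values()]  # p_mk per class k
    return sum(p * (1 - p) for p in proportions)


def squared_error(values: Sequence[float]) -> float:
    """Mean squared deviation from the node mean: the impurity of squared-error regression trees."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(gini(["high", "high", "high"]))        # 0.0: a pure node
print(gini(["low", "low", "high", "high"]))  # 0.5: an evenly mixed two-class node
print(squared_error([1.0, 3.0]))             # 1.0
```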
Random Forest Regression (RFR): Applied to solar PV power prediction, random forest regression can give accurate and reliable insight into how air temperature, direct in-plane irradiance, and sun elevation jointly influence energy generation. Its ability to model non-linear relationships makes it well suited to capturing the complex dynamics of solar energy production. Each tree is grown using the impurity measure in (2), but with m = p, where m is the number of predictors considered as split candidates at each node and p is the total number of predictors (three here), so every predictor is a candidate at every split. This contributes to more precise predictions and better-informed decisions for optimising PV system performance. Fig. 1 shows the workflow from data collection and processing to the generation of the machine learning models and their predictions.
Fig. 1. Chart showing the process from data collection to predictions.
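The m = p setting can be expressed in scikit-learn as `max_features=None`, which makes every split consider all p predictors. With m = p, the trees still differ because each is grown on its own bootstrap sample, a configuration also known as bagging. The sketch below uses synthetic stand-ins for the three predictors; the number of trees and `random_state` are assumptions, since the study does not report them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n, p = 400, 3
# Synthetic stand-ins (scaled to [0, 1]) for air temperature, direct in-plane
# irradiance, and sun elevation; column 1 drives most of the output.
X = rng.uniform(0, 1, size=(n, p))
y = -20 * X[:, 0] + 300 * X[:, 1] + 10 * np.sin(3 * X[:, 2]) + rng.normal(0, 2, n)

# max_features=None lets every split consider all p predictors, i.e. m = p.
rf = RandomForestRegressor(n_estimators=200, max_features=None, random_state=0)
rf.fit(X, y)

print(rf.estimators_[0].max_features_)   # 3, so m = p
print(rf.feature_importances_.round(3))  # the irradiance stand-in (column 1) should dominate
```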
Results and Discussion
Direct in-plane irradiance had a stronger correlation with PV power output than the other environmental factors, as shown in Fig. 2.
Fig. 2. Correlation heat map.
Forecasting Obtained by LR
With an R² of 0.999478, the LR model shows good predictive ability, accounting for about 99.9478% of the variance in PV power output. Its MSE of 20.645277 means that the squared difference between the actual and predicted PV power output is about 20.6 on average, equivalent to a root mean squared error of about 4.54 units. The MAE of 3.16270 indicates an average absolute deviation of about 3.16 units between the actual values and the LR predictions, as illustrated in Fig. 3.
Fig. 3. The graph of actual vs. predicted PV power output using LR.
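The three metrics used in this section follow standard definitions. The sketch below computes them from the residuals with NumPy and checks the results against scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score`. The four actual and predicted values are invented for illustration and are not from the study; the `regression_metrics` helper is hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred) -> dict:
    """MSE, MAE and R^2 computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))            # mean squared error (units squared)
    mae = float(np.mean(np.abs(residuals)))         # mean absolute error (units)
    ss_res = float(np.sum(residuals ** 2))          # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # share of variance explained
    return {"MSE": mse, "MAE": mae, "R2": r2}


y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [104.0, 197.0, 305.0, 398.0]
# Residuals are -4, 3, -5, 2, so MSE = 54/4, MAE = 14/4, R^2 = 1 - 54/50000.
m = regression_metrics(y_true, y_pred)
print(m)
```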
Forecasting Obtained by RFR
With an even higher R² of 0.999567, the RFR model outperforms the LR model and fits the data very closely. Its lower MAE and MSE both point to better forecasting accuracy: the MSE of 17.130680, lower than the LR model's 20.645277, indicates that the RFR predictions are, on average, closer to the true values.
The R² shows that the RFR model explains 99.9567% of the variance in PV power output, an excellent predictive capability. By capturing nearly all of the variability in the data, the RFR model represents the relationship between the predictors and the target variable well, as illustrated in Fig. 4.
Fig. 4. The graph of actual vs. predicted PV power output using RFR.
Forecasting Obtained by DTR
The DTR model performs marginally worse than the other two models, with the highest MSE and the lowest R² of the three. Its MSE of 27.426819 is greater than those of the LR and RFR models, as shown in Table II.
Table II. Performance metrics of the machine learning models.
Machine learning model | MSE | R² | MAE |
---|---|---|---|
Linear regression | 20.645277 | 0.999478 | 3.16270 |
Random forest regression | 17.130680 | 0.999567 | 2.28139 |
Decision tree regression | 27.426819 | 0.999307 | 2.870215 |
The DTR model's R² of 0.999307 indicates that its features account for 99.9307% of the variance in PV power output. Its MAE of 2.870215 is greater than that of the RFR model (2.28139) but lower than that of the LR model (3.16270). The DTR model therefore typically has larger absolute gaps between predicted and actual values than the RFR model, as illustrated in Fig. 5, and its higher MSE despite a lower MAE than LR suggests a few larger errors, which the squared-error metric penalises more heavily.
Fig. 5. The graph of actual vs. predicted PV power output using DTR.
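The comparison summarised in Table II can be outlined as a single harness that fits the three models on one 80/20 split and scores them on the test set. This is a sketch under stated assumptions: the `compare_models` helper is hypothetical, the data are synthetic, and `n_estimators` and `random_state` are assumed values because the study does not report its hyperparameters. The numbers it prints will not reproduce Table II.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, seed: int = 0) -> pd.DataFrame:
    """Fit LR, RFR and DTR on one 80/20 split and report test-set MSE, R2 and MAE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Linear regression": LinearRegression(),
        "Random forest regression": RandomForestRegressor(n_estimators=200, random_state=seed),
        "Decision tree regression": DecisionTreeRegressor(random_state=seed),
    }
    rows = []
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        rows.append({
            "Model": name,
            "MSE": mean_squared_error(y_test, y_pred),
            "R2": r2_score(y_test, y_pred),
            "MAE": mean_absolute_error(y_test, y_pred),
        })
    return pd.DataFrame(rows).set_index("Model")


# Synthetic data only; column 1 plays the role of direct in-plane irradiance.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(1000, 3))
y = -20 * X[:, 0] + 300 * X[:, 1] + 10 * np.sin(3 * X[:, 2]) + rng.normal(0, 2, 1000)
results = compare_models(X, y)
print(results.round(4))
```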
Conclusion
After a thorough examination of the performance metrics of the LR, RFR, and DTR models for predicting PV output, the RFR algorithm is clearly the most successful method. On all the key metrics (MSE, R², and MAE), the RFR model performs better than both the LR and DTR models. With an R² closest to 1 and the lowest MAE and MSE, the RFR model is the best of the three at making predictions and explaining changes in PV power output.
Furthermore, the scatter plot of actual versus predicted values gives additional evidence that the RFR model produces precise forecasts with little variation. It can therefore be concluded that the RFR algorithm is the most appropriate method for forecasting PV output at the study site.
Its strong performance, reflected in smaller prediction errors and greater explanatory power, makes it the best option for energy planning and management. By using sophisticated machine learning techniques such as RFR, stakeholders can facilitate the integration of renewable energy sources into the grid, optimise resource allocation, and make well-informed decisions, enhancing the sustainability and efficiency of the energy ecosystem.
References
[1] Kypriotaki A. Port of Newcastle forms strategic partnerships for clean energy. SAFETY4SEA. 2023. Available from: https://safety4sea.com/port-of-newcastle-forms-strategic-partnerships-for-clean-energy. [Accessed 10.03.2024].
[2] Cardoso PCN, Schettino S, Minette LJ, Soranso DR. Paradigms of environmental sustainability in photovoltaic energy generation. DELOS: Desarrollo Local Sostenible. 2024;17(52):e1256. doi: 10.55905/rdelosv17.n52-002.
[3] Scott C, Ahsan M, Albarbar A. Machine learning for forecasting a photovoltaic (PV) generation system. Energy. 2023;278:127807. doi: 10.1016/j.energy.2023.127807.
[4] Khandakar A, Chowdhury EH, Kazi MK, Benhmed K, Touati F, Al-Hitmi M, et al. Machine learning based photovoltaics (PV) power prediction using different environmental parameters of Qatar. Energies. 2019;12(14):2782. doi: 10.3390/en12142782.
[5] Lee D, Jeong JW, Choi G. Short term prediction of PV power output generation using hierarchical probabilistic model. Energies. 2021;14(10):2822. doi: 10.3390/en14102822.
[6] Kumar PM, Saravanakumar R, Karthick A, Mohanavel V. Artificial neural network-based output power prediction of grid-connected semitransparent photovoltaic system. Environ Sci Pollut Res. 2021;29(7):10173–82. doi: 10.1007/s11356-021-16398-6.
[7] Huang H, Xu Z, Shao X, Wismeijer D, Sun P, Wang J, et al. Multivariate linear regression analysis to identify general factors for quantitative predictions of implant stability quotient values. Williams JL (ed.). PLoS One. 2017;12(10):e0187010. doi: 10.1371/journal.pone.0187010.
[8] Sharma I, Kakchapati S. Linear regression model to identify the factors associated with carbon stock in Chure forest of Nepal. Scientifica. 2018;2018:1–8. doi: 10.1155/2018/1383482.
[9] Somvanshi M, Chavan P, Tambade S, Shinde SV. A review of machine learning techniques using decision tree and support vector machine. International Conference on Computing Communication Control and Automation (ICCUBEA). IEEE Xplore; 2016. p. 1–7. doi: 10.1109/ICCUBEA.2016.7860040.
[10] Suthaharan S. Decision tree learning. Mach Learn Models Algorithms Big Data Classif. 2016;36:237–69. doi: 10.1007/978-1-4899-7641-3_10.
[11] Ezenagu G. Engineering Education. 2022. Available from: http://www.webscale.com/engineering-education/.
[12] Pretorius A, Bierman S, Steel SJ. A meta-analysis of research in random forests for classification. 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech); 2016. doi: 10.1109/robomech.2016.7813171.
[13] European Commission. JRC Photovoltaic Geographical Information System (PVGIS). 2022. Available from: https://re.jrc.ec.europa.eu/pvg_tools/en/.