Due to the stochastic nature of photovoltaic (PV) power generation, there is high demand for forecasting PV output to better integrate PV generation into power grids. Systematic knowledge of the factors influencing forecast accuracy is crucially important, but still mostly lacking. In this paper, we review 180 papers on PV forecasts and extract a database of forecast errors for statistical analysis. We show that, among the forecast models, hybrid models consistently outperform the others and will most likely be the future of PV output forecasting. The use of data processing techniques is positively correlated with forecast quality, while the lengths of the forecast horizon and of the out-of-sample test set have negative effects on forecast accuracy. We also find that the inclusion of numerical weather prediction variables, data normalization, and data resampling are the most effective data processing techniques. Furthermore, we find some evidence of “cherry picking” in reporting errors and recommend that test sets span at least one year to better assess models’ performance. The paper also takes the first step towards establishing a benchmark for assessing PV output forecasts.
- Final_code is the source code used to analyse the database.
- The "data" Excel file contains the database and an explanation of the variables.
- data_limit_new contains the database in R data format for analysis.