-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data gaps sphinx documentation #135
Data gaps sphinx documentation #135
Conversation
…ed data, and check for daily data completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs a whatsnew entry -- I'd say merge #134 and then merge master
into this branch so that the 0.1.2 file is available.
@kanderso-nrel I've made the changes you suggested and added this entry to the whatsnew doc. Have a look and let me know if there's anything else you think looks wonky! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kperrynrel
# %% | ||
# Now, use :py:func:`pvanalytics.quality.gaps.completeness_score` to get the | ||
# percentage of daily data that isn't NaN. | ||
data_completeness_score = gaps.completeness_score(data['value_normalized']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a few comments here about what the data are, e.g., what is 'value_normalized'? Is it any column of numbers or are there constraints? I think one assumption that is not so obvious is that the completeness score is a fraction of a 24 hour day, thus nighttime values are expected.
# interpolated periods. The time series we download is a normalized AC power | ||
# time series from the PV Fleets Initiative, and is available via the DuraMAT | ||
# DataHub: | ||
# https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above - add comments describing data requirements, e.g., is it assumed (or required) that the input AC power is serially complete? In this example, it would help to point out that there are gaps in the data which will be filled in by interpolation, for the purpose of creating a dataset with linear features to illustrate their identification. Otherwise, it may be confusing why we are interpolating in order to detect interpolations.
# stale data periods. The time series we download is a normalized AC power time | ||
# series from the PV Fleets Initiative, and is available via the DuraMAT | ||
# DataHub: | ||
# https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
# %% | ||
# Now, we use :py:func:`pvanalytics.quality.gaps.interpolation_diff` to | ||
# identify linearly interpolated periods in the time series. We re-plot | ||
# the data with this mask. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a remark that nighttime periods are identified as interpolated data, because there is little to no change in AC power at night.
plt.xlabel("Date") | ||
plt.ylabel("Normalized AC Power") | ||
plt.tight_layout() | ||
plt.show() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about stale periods at night.
…he different issues, to avoid further confusion.
@cwhanse I talked to @kanderso-nrel this afternoon about adapting our approach by storing a specific data set for stale data as well as interpolated data in the /data/ folder. I implemented this and I think this cleans up a lot of the issues (we no longer have to generate issues and then check for said issues with our functions). I also made updates to the docstrings based on your suggestions. Let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM barring a few minor comments below
Description
This PR contains example documentation for the data_gaps module of PVAnalytics. It has example files for detecting interpolated data, missing data, and stale data. Related to #133.
Checklist
in
docs/whatsnew
for all changes. Includes link to the GitHub Issue with
:issue:`num`
or this Pull Request with
:pull:`num`
. Includes contributor nameand/or GitHub username (link with
:ghuser:`user`
).