Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data gaps sphinx documentation #135

Merged
merged 16 commits into from
Mar 3, 2022

Conversation

kperrynrel
Copy link
Member

@kperrynrel kperrynrel commented Feb 23, 2022

Description

This PR contains example documentation for the data_gaps module of PVAnalytics. It has example files for detecting interpolated data, missing data, and stale data. Related to #133.

Checklist

  • Clearly documented all new API functions with PEP257 and numpydoc compliant docstrings
  • Adds description and name entries in the appropriate "what's new" file
    in docs/whatsnew
    for all changes. Includes link to the GitHub Issue with :issue:`num`
    or this Pull Request with :pull:`num`. Includes contributor name
    and/or GitHub username (link with :ghuser:`user`).
  • Non-API functions clearly documented with docstrings or comments as necessary
  • Pull request is nearly complete and ready for detailed review

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs a whatsnew entry -- I'd say merge #134 and then merge master into this branch so that the 0.1.2 file is available.

@kandersolar kandersolar added this to the v0.1.2 milestone Feb 24, 2022
@kandersolar kandersolar added the documentation Improvements or additions to documentation label Feb 24, 2022
@kperrynrel
Copy link
Member Author

@kanderso-nrel I've made the changes you suggested and added this entry to the whatsnew doc. Have a look and let me know if there's anything else you think looks wonky!

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kperrynrel

# %%
# Now, use :py:func:`pvanalytics.quality.gaps.completeness_score` to get the
# percentage of daily data that isn't NaN.
data_completeness_score = gaps.completeness_score(data['value_normalized'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a few comments here about what the data are, e.g., what is 'value_normalized'? Is it any column of numbers or are there constraints? I think one assumption that is not so obvious is that the completeness score is a fraction of a 24 hour day, thus nighttime values are expected.

# interpolated periods. The time series we download is a normalized AC power
# time series from the PV Fleets Initiative, and is available via the DuraMAT
# DataHub:
# https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above - add comments describing data requirements, e.g., is it assumed (or required) that the input AC power is serially complete? In this example, it would help to point out that there are gaps in the data which will be filled in by interpolation, for the purpose of creating a dataset with linear features to illustrate their identification. Otherwise, it may be confusing why we are interpolating in order to detect interpolations.

# stale data periods. The time series we download is a normalized AC power time
# series from the PV Fleets Initiative, and is available via the DuraMAT
# DataHub:
# https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

# %%
# Now, we use :py:func:`pvanalytics.quality.gaps.interpolation_diff` to
# identify linearly interpolated periods in the time series. We re-plot
# the data with this mask.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a remark that nighttime periods are identified as interpolated data, because there is little to no change in AC power at night.

plt.xlabel("Date")
plt.ylabel("Normalized AC Power")
plt.tight_layout()
plt.show()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment about stale periods at night.

@kperrynrel
Copy link
Member Author

@cwhanse I talked to @kanderso-nrel this afternoon about adapting our approach by storing a specific data set for stale data as well as interpolated data in the /data/ folder. I implemented this and I think this cleans up a lot of the issues (we no longer have to generate issues and then check for said issues with our functions). I also made updates to the docstrings based on your suggestions. Let me know what you think!

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM barring a few minor comments below

@kperrynrel kperrynrel merged commit c12b7e9 into pvlib:master Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants