Before release: DQA of the released data! #56

Open
JamesOwers opened this issue Oct 9, 2019 · 4 comments

@JamesOwers
Owner

JamesOwers commented Oct 9, 2019

We should do some data quality analysis of the data we are going to release. I'm thinking a notebook (also doubles as an intro to what data are available for use) which reviews the data by:

  • Playing a selection of degraded and clean excerpts
    • Any issues with data? Choppy? Did flattening tracks work well?
    • Are degradations obvious? Are there better parameters for degradations to use?
  • Providing stats about the number of notes in those excerpts, the lengths of notes, the actual span of time the notes cover, etc.
    • This will inform the correct seq_len to use for models (may be worth excluding silly long excerpts)
  • Giving some background as to where these data are from and, if possible, some summary stats about genre, or tempo, or whatever we can glean
  • Summarising performance broken down over datasets (info available in metadata)

Essentially I want to check that the data are not rubbish, and we can hear where the degradations are!
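
As a rough starting point for the stats items above, here is a minimal sketch in plain Python. It assumes each excerpt is available as a list of notes with `onset`, `pitch`, and `dur` fields in milliseconds; that layout, the `Note` type, and both function names are hypothetical and not taken from the repo. `seq_len_for_coverage` is just one possible way to pick a seq_len that covers most excerpts while ignoring a few very long ones.

```python
import math
from statistics import median
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; times assumed to be in milliseconds."""
    onset: int
    pitch: int
    dur: int


def excerpt_stats(notes):
    """Return note count, duration stats, and total time span of one excerpt."""
    if not notes:
        return {"n_notes": 0, "span": 0}
    durs = [n.dur for n in notes]
    start = min(n.onset for n in notes)
    end = max(n.onset + n.dur for n in notes)
    return {
        "n_notes": len(notes),
        "min_dur": min(durs),
        "median_dur": median(durs),
        "max_dur": max(durs),
        "span": end - start,  # time from first onset to last offset
    }


def seq_len_for_coverage(note_counts, coverage=0.95):
    """Smallest note count that is >= the given fraction of excerpts."""
    if not note_counts:
        raise ValueError("note_counts must not be empty")
    ordered = sorted(note_counts)
    idx = math.ceil(coverage * len(ordered)) - 1
    idx = max(0, min(idx, len(ordered) - 1))
    return ordered[idx]


# Toy excerpt, not real data.
example = [Note(0, 60, 500), Note(500, 64, 250), Note(1000, 67, 1500)]
stats = excerpt_stats(example)
suggested = seq_len_for_coverage([10, 20, 30, 1000], coverage=0.75)
```

In a notebook, the same summary could be computed per excerpt and then grouped by the source dataset recorded in the metadata.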

@apmcleod
Collaborator

apmcleod commented Oct 9, 2019

Make sure to check for very short notes (that may have been introduced by overlap checks).
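
A check like this could be sketched as below, assuming notes are records with `onset`, `pitch`, and `dur` in milliseconds. The `Note` type, the function name, and the 50 ms default threshold are illustrative assumptions, not values from the project.

```python
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; times assumed to be in milliseconds."""
    onset: int
    pitch: int
    dur: int


def find_short_notes(notes, min_dur=50):
    """Return the notes whose duration is below min_dur (an assumed threshold)."""
    return [n for n in notes if n.dur < min_dur]


# Toy excerpt, not real data: the second note is suspiciously short.
example = [Note(0, 60, 500), Note(500, 60, 3), Note(503, 62, 400)]
short = find_short_notes(example)
```

Listing the flagged notes alongside their neighbours at the same pitch should make it easier to see whether they come from the overlap fix.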

@JamesOwers
Owner Author

I think it will be best to add this as a notebook to the ACME repo too. Keeping the issue here. Do this in conjunction with #129.

@apmcleod
Collaborator

Can be closed.

@JamesOwers
Owner Author

Actually, I'd like to keep this open. I've had a basic look at the data, but haven't addressed the specific things in the description.
