Heavily related to #16. For this item, we want a bot that can detect which datasets should be inspected for removal. A dataset should be slated for removal if it is clear that it was never intended to be shared publicly for use in ML experiments.
Quite a few datasets on OpenML were uploaded by users but should not be on the production server. This includes datasets uploaded to test the upload functionality, datasets with mistakes in the initial upload that were later superseded by newer versions, and so on.
Besides a bad title and description, other indications may be having no tasks or only tasks without runs, or a good title and description that duplicate those of an existing dataset. It may not always be obvious, and it's OK if the bot misses some of the poor-quality data. It is important that the bot has relatively high precision, as each flagged dataset will require a human to assess whether deactivation or deletion is warranted.
This is also true for studies.
Besides flagging the dataset, the bot should be able to generate a small report explaining why the dataset may be considered for removal.
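To make the idea concrete, here is a minimal sketch of how the indicators above could be combined into a flag plus a short report. Everything in it is an assumption for illustration: `DatasetInfo` is a hypothetical, simplified metadata record (not an OpenML API object), and the word list, the 20-character description threshold, and the `min_reasons` cutoff are placeholders that would need tuning. Requiring at least two independent signals is one possible way to favor precision over recall.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Hypothetical, simplified view of a dataset's metadata."""
    dataset_id: int
    name: str
    description: str
    n_tasks: int
    n_runs: int  # total runs across all tasks on this dataset


# Assumed placeholders; a real bot would need curated values.
TEST_WORDS = {"test", "tmp", "dummy", "asdf"}
MIN_DESCRIPTION_CHARS = 20


def _key(ds):
    """Normalized (title, description) pair used for duplicate detection."""
    return (ds.name.strip().lower(), ds.description.strip().lower())


def removal_reasons(ds, key_counts):
    """Return the list of indicators that apply to one dataset."""
    reasons = []
    tokens = set(re.findall(r"[a-z0-9]+", ds.name.lower()))
    if tokens & TEST_WORDS:
        reasons.append("name looks like a test upload")
    if len(ds.description.strip()) < MIN_DESCRIPTION_CHARS:
        reasons.append("description is missing or very short")
    if ds.n_tasks == 0:
        reasons.append("no tasks")
    elif ds.n_runs == 0:
        reasons.append("only tasks without runs")
    if key_counts[_key(ds)] > 1:
        reasons.append("title and description duplicate another dataset")
    return reasons


def flag_datasets(datasets, min_reasons=2):
    """Map dataset id -> reasons for datasets with enough signals."""
    key_counts = Counter(_key(ds) for ds in datasets)
    flagged = {}
    for ds in datasets:
        reasons = removal_reasons(ds, key_counts)
        if len(reasons) >= min_reasons:
            flagged[ds.dataset_id] = reasons
    return flagged


def build_report(ds, reasons):
    """Render a small plain-text report for a human reviewer."""
    lines = [f"Dataset {ds.dataset_id} ({ds.name!r}) may be considered for removal:"]
    lines += [f"  - {reason}" for reason in reasons]
    return "\n".join(lines)
```

With `min_reasons=2`, a dataset whose only signal is a duplicated title and description would not be flagged; lowering the cutoff or weighting that signal more heavily is a design choice that trades precision for recall.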