Initial Zotero cleaning

Olly Butters edited this page Aug 7, 2020 · 3 revisions

The data in Zotero is the source data for the pipeline, so it needs to be as clean as possible. Below are some sources of unclean data we have come across.

Duplicates

Merge these, keeping the most complete information you can. To find them, right-click the collection and select Show Duplicates. Duplicates arise either from literally having the same item twice, or from having both an initial and a final version of the same publication (e.g. a working paper/preprint and the refereed version).
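Zotero's own duplicates view is where the merging happens, but it can help to cross-check an exported copy of the collection. The following is a minimal sketch, not part of the pipeline, assuming the items have been exported and loaded as a list of dicts with the hypothetical keys `title` and `DOI`. It groups items that share a normalised DOI or a normalised title, so a preprint and its refereed version (same title, different DOI) still show up together.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lower-case a title and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def normalise_doi(doi):
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def find_duplicate_candidates(items):
    """Return {(kind, value): [items]} for every DOI or title shared by 2+ items.

    `items` is a list of dicts; the keys 'title' and 'DOI' are assumptions
    about the export format, so adjust them to match the actual data.
    """
    by_key = defaultdict(list)
    for item in items:
        doi = normalise_doi(item.get("DOI") or "")
        title = normalise_title(item.get("title") or "")
        if doi:
            by_key[("doi", doi)].append(item)
        if title:
            by_key[("title", title)].append(item)
    return {key: group for key, group in by_key.items() if len(group) > 1}
```

The groups are only candidates: two different papers can share a title, so each group still needs checking by eye before merging in Zotero.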

Attachments

Attachments don't need to be here, and one of the issues we had in the pipeline came from them (this may no longer be the case). Sort by the paperclip column and delete each attachment, but not the paper entry itself.

Child items

Sometimes the bulk import process leaves some entries with child stub entries. These contain limited information, e.g. just a title and a DOI. Find them by looking at the leftmost column in the list of items: any item with a child has an icon to expand it. Just delete the child stubs.
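If the entries are also available outside Zotero, stubs can be flagged programmatically before deleting them by hand. This is a minimal sketch under the assumption that each entry is a dict; the field names `title`, `DOI`, `id` and `type` are hypothetical and should be changed to match the actual data. An entry counts as a stub when its only filled-in fields are the stub fields.

```python
def is_stub(item, stub_fields=("title", "DOI"), ignored_fields=("id", "type")):
    """Return True if the only non-empty fields in `item` are stub fields.

    `stub_fields` and `ignored_fields` are assumptions about the data:
    bookkeeping fields such as an id or item type are ignored, and an
    entry with nothing filled in at all is not treated as a stub.
    """
    filled = {
        key
        for key, value in item.items()
        if key not in ignored_fields and value not in (None, "", [], {})
    }
    return bool(filled) and filled <= set(stub_fields)


def find_stubs(items):
    """Return the entries from `items` that look like stubs."""
    return [item for item in items if is_stub(item)]
```

For example, `is_stub({"title": "T", "DOI": "10.1/x"})` is true, while an entry that also has an author list or a publication date is not flagged.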