Tagging and NA's #4

Open
LuciaSegovia opened this issue Aug 19, 2020 · 4 comments

Comments

@LuciaSegovia
Collaborator

Hi @rbroth

The issue with tagging is that some FCTs include low-quality values. In the best case, those items are marked with special characters such as brackets, parentheses, or asterisks. In other cases, they are marked with italics, bold, or colours.

I would like a way to account for that, so we can choose whether or not to use those values. There are several problems:

  1. For the values marked with special characters, I can (I hope) fix the issue by creating a separate column, as you suggested before, to flag those values. If you have other suggestions, I'm happy to hear them.

  2. For the values marked with font-related formatting, I have no clue how to identify them, because when I open the dataset in R all formatting is stripped and the fonts and colours are lost. Do you have a solution for this?
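
For point 1, the idea of moving the markers into their own column could be sketched roughly as below. This is only an illustration in Python using the standard library: the function name `split_flag`, the set of markers (square brackets, parentheses, asterisk), and the choice to treat non-numeric entries as missing are all assumptions, not something taken from any particular FCT.

```python
import re

# Assumed marker convention (hypothetical): a number optionally wrapped in
# [] or (), or carrying a leading or trailing asterisk.
_PATTERN = re.compile(
    r"^\s*(?P<open>[\[\(\*]?)\s*"
    r"(?P<num>-?\d+(?:\.\d+)?)"
    r"\s*(?P<close>[\]\)\*]?)\s*$"
)


def split_flag(raw):
    """Split a raw cell into (value, flag).

    value is a float, or None when the cell holds no recognisable number.
    flag is the marker characters found around the number ('' if none).
    """
    if raw is None:
        return None, ""
    text = str(raw)
    match = _PATTERN.match(text)
    if match is None:
        # Assumption: anything that is not a number becomes a missing value,
        # and the original text is kept in the flag column for later review.
        return None, text.strip()
    return float(match.group("num")), match.group("open") + match.group("close")


# Build a value column and a flag column from one raw column.
raw_column = ["[12.5]", "3*", "(0.4)", "7", "tr", None]
values, flags = zip(*(split_flag(cell) for cell in raw_column))
```

With a flag column like this, the low-quality values can be kept or dropped later by filtering on the flag, instead of being lost when the column is converted to numbers.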

Thanks again!

@rbroth

rbroth commented Sep 28, 2020

For special characters, I can write some SQL code to extract them; I'm not worried about those. Or you can use

Font formatting is different, and a big problem (which is why you should avoid encoding information in Excel formatting). As far as I can see, we can:

  • Contact the original authors and see if they have the data in a different format. This would take time, with no certainty of success.

  • Re-code the data manually. This is time- and work-intensive, though we might be able to speed it up, e.g. by sorting the data on another column and doing multiple rows at a time.

  • Search for an R package that can read Excel formatting. I don't have experience in R, so you're a better judge of how feasible this is.

  • Convert the formatting into a new column inside Excel, using find/replace, VBA, and such. I think this may be the most feasible option; there are tutorials on how to do this inside Excel, though we may have to write some VBA code.

@LuciaSegovia
Collaborator Author

Hi Roman!

Thank you very much for your suggestions. I will try point 3 first, and use point 4 as plan B.

@rbroth

rbroth commented Sep 28, 2020

If you can't find a promising package by the end of today, let me know and we can have a pair programming session over Zoom tomorrow afternoon.

@LuciaSegovia
Collaborator Author

I found a potential package :) I hope it's useful!
