Tagging and NA's #4

LuciaSegovia · 2020-08-19T13:46:47Z

The issue with tagging is that some FCTs include low-quality values. Normally those items are marked with special characters (best-case scenario) such as brackets, parentheses, asterisks, etc. In other cases, they are marked with italics, bold or colours...

I would like a way to account for that, so we can choose whether or not to use those values. There are several problems:

For values marked with special characters, I can (I hope) fix the issue by creating a column that flags them, as you suggested before. If you have other suggestions, I'm happy to hear them.
For values marked with font formatting, I have no idea how to identify them, because when I open the dataset in R, all font and colour formatting is stripped. Do you have a solution for this?
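A minimal sketch of the "extra column" approach for special characters, written in Python: each raw cell is split into a numeric value and a marker flag, so the flag can live in its own column. The marker set (parentheses, square brackets, trailing asterisk) and the name `split_value` are assumptions for illustration; each FCT's own conventions would need checking.

```python
import re

# Assumed marker conventions (adjust per FCT): "(x)", "[x]" or "x*".
MARKERS = [
    (re.compile(r"^\((.*)\)$"), "parenthesis"),
    (re.compile(r"^\[(.*)\]$"), "bracket"),
    (re.compile(r"^(.*)\*$"), "asterisk"),
]


def split_value(raw):
    """Split one raw cell into (numeric value or None, marker name or None)."""
    text = raw.strip()
    flag = None
    for pattern, name in MARKERS:
        match = pattern.match(text)
        if match:
            # keep the inner text and remember which marker was found
            text, flag = match.group(1).strip(), name
            break
    try:
        value = float(text)
    except ValueError:
        value = None  # non-numeric entries such as "NA" or "tr" stay missing
    return value, flag
```

For example, `split_value("(12.5)")` gives `(12.5, "parenthesis")` and `split_value("7")` gives `(7.0, None)`, so low-quality values can be filtered on the flag later instead of being dropped at import.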

Thanks again!

rbroth · 2020-09-28T10:22:49Z

For special characters, I can write some SQL code to extract them; I'm not worried about those. Or you can use
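One way the SQL route could look, sketched with SQLite through Python's built-in `sqlite3` module: the query adds a `low_quality` flag and a cleaned numeric value next to the raw text. The table `fct`, its columns, and the marker set are hypothetical. The `LIKE` patterns and two-argument `TRIM` behave as shown in SQLite specifically; other databases differ (in SQL Server, for example, `[` is special inside `LIKE`).

```python
import sqlite3

# Hypothetical markers: "(x)", "[x]" or "x*" mean the value is low quality.
QUERY = """
SELECT food,
       raw_value,
       CASE WHEN raw_value LIKE '(%)' OR raw_value LIKE '[%]'
                 OR raw_value LIKE '%*'
            THEN 1 ELSE 0 END AS low_quality,
       -- strip marker characters; SQLite would CAST text such as NA to 0.0,
       -- so only cast when a digit is present, otherwise return NULL
       CASE WHEN TRIM(raw_value, '()[]* ') GLOB '*[0-9]*'
            THEN CAST(TRIM(raw_value, '()[]* ') AS REAL) END AS value
FROM fct
ORDER BY rowid
"""


def tag_with_sql(rows):
    """Load (food, raw_value) pairs into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fct (food TEXT, raw_value TEXT)")
    con.executemany("INSERT INTO fct VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Keeping the raw text alongside the flag and the cleaned number means nothing is lost if the marker conventions turn out to be different from what was assumed.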

Font formatting is different, and a big problem (which is why you should avoid using formatting in Excel). As far as I can see, we can:

1. Contact the original authors and see if they have the data in a different format. This would take time, with no certainty of success.
2. Re-code the data manually. Time- and work-intensive, though we might be able to speed this up, e.g. by sorting the data on another column and doing multiple rows at a time.
3. Search for an R package that can read Excel formatting. I don't have experience in R, so you're a better judge of how feasible this is.
4. Convert the formatting into a new column inside Excel, using find/replace, VBA, and such. I think this may be the most feasible option; there are tutorials on how to do this inside Excel, though we may have to write some VBA code.
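Options 3 and 4 both come down to turning formatting into an ordinary column. As a sketch of what a formatting-aware reader does under the hood (in Python with only the standard library, not the R or VBA route discussed here): an .xlsx file is a zip archive in which each cell's `s` attribute points into the `cellXfs` style list in `xl/styles.xml`, and each style points to a font carrying its bold, italic and colour settings. R packages that read cell formats, such as tidyxl, work from the same information. Assumptions: the sheet of interest is `xl/worksheets/sheet1.xml`; only font formatting is read (cell background fills live in a separate fills table); theme or indexed colours come back as `None`; text cells return a shared-string index rather than the text; older .xls files are not covered. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used inside .xlsx files
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _is_on(el):
    """<b/> and <i/> mean 'on' unless they carry val="0" or val="false"."""
    return el is not None and el.get("val", "1") not in ("0", "false")


def style_flags(styles_xml):
    """Map each cell-style index (cellXfs) to the font flags it uses."""
    root = ET.fromstring(styles_xml)
    fonts = []
    for font in root.find("m:fonts", NS):
        color = font.find("m:color", NS)
        fonts.append({
            "bold": _is_on(font.find("m:b", NS)),
            "italic": _is_on(font.find("m:i", NS)),
            # explicit ARGB colour such as "FFFF0000"; theme/indexed -> None
            "color": color.get("rgb") if color is not None else None,
        })
    return [fonts[int(xf.get("fontId", "0"))]
            for xf in root.find("m:cellXfs", NS)]


def read_cell_formats(xlsx, sheet="xl/worksheets/sheet1.xml"):
    """Return one dict per cell: reference, raw value and font flags."""
    with zipfile.ZipFile(xlsx) as z:
        styles = style_flags(z.read("xl/styles.xml"))
        sheet_root = ET.fromstring(z.read(sheet))
    cells = []
    for c in sheet_root.iter(f"{{{NS['m']}}}c"):
        v = c.find("m:v", NS)
        cells.append({
            "cell": c.get("r"),
            "value": v.text if v is not None else None,
            # cells without an "s" attribute use the default style 0
            **styles[int(c.get("s", "0"))],
        })
    return cells
```

The `bold`, `italic` and `color` fields can then be written out next to each value as the flag column described in option 4; for instance, filtering the result on `c["italic"]` lists every italicised cell.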

LuciaSegovia · 2020-09-28T12:49:19Z

Hi Roman!

Thank you very much for your suggestions. I will try point 3 and use point 4 as plan B.

rbroth · 2020-09-28T13:41:21Z

If you can't find a promising package by the end of today, let me know and we can have a pair programming session over zoom tomorrow afternoon.

LuciaSegovia · 2020-09-28T17:09:07Z

I found a potential package :) I hope it's useful!
