getDrugIngredientCodes and non UTF-8 characters #233
Comments
To be more exact, this seems to be an issue with the helper function tidyWords:
More specifically the following lines:
Hi @tleht I was looking into this, but the above worked fine on my machine so it could quite possibly be related to locale. I tweaked the code a little in the new release - can you please see if it is now working for you?
library(CodelistGenerator)
packageVersion("CodelistGenerator")
#> [1] '3.3.2'
CodelistGenerator:::tidyWords("[ ¹⁸ F]AlF-NOTA-FAPI-04")
#> [1] "falf nota fapi 04"
Created on 2025-01-28 with reprex v2.0.2
If you are still having problems could you please share the output from
I tried running this with the latest release, but it still keeps running into the same issue with the character ⁸:
Sounds likely that this might have to do with the locale or generally our system environment as apparently none of the other nodes running the script in our project ran into this specific issue. Here are the
Ah sorry that hasn't worked @tleht, would using iconv like below work for you?
concept_name <- "[ ¹⁸ F]AlF-NOTA-FAPI-04"
concept_name <- iconv(concept_name,
                      from = "UTF-8",
                      to = "UTF-8",
                      sub = "byte")
CodelistGenerator:::tidyWords(concept_name)
#> [1] "falf nota fapi 04"
Created on 2025-01-29 with reprex v2.1.0
Already tried that back in December without any success.
hmmm @tleht how about
library(stringi)
library(stringr)
concept_name <- "[ ¹⁸ F]AlF-NOTA-FAPI-04"
concept_name <- str_replace_all(concept_name, "[^\\x20-\\x7E]", "")
concept_name
#> [1] "[ F]AlF-NOTA-FAPI-04"
CodelistGenerator:::tidyWords(concept_name)
#> [1] "falf nota fapi 04"
Created on 2025-01-29 with reprex v2.1.0
Thanks @edward-burn, that did the trick.
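For anyone who would rather avoid the stringr dependency, the same character filter can be sketched in base R. This is only an illustration of the workaround discussed above, not the code that ended up in CodelistGenerator; `strip_non_ascii` is a hypothetical helper name.

```r
# Hypothetical helper, not part of CodelistGenerator: remove every character
# outside the printable ASCII range (0x20 to 0x7E) using base R only.
strip_non_ascii <- function(x) {
  gsub("[^\\x20-\\x7E]", "", x, perl = TRUE)
}

# The concept name from this issue, written with \u escapes so the script
# itself stays ASCII and does not depend on the file encoding.
concept_name <- "[ \u00b9\u2078 F]AlF-NOTA-FAPI-04"

cleaned <- strip_non_ascii(concept_name)

# No character outside the printable ASCII range should remain.
stopifnot(!grepl("[^\\x20-\\x7E]", cleaned, perl = TRUE))
```

As far as I can tell, the earlier iconv call converts from UTF-8 to UTF-8, so it leaves valid characters such as ⁸ in place, whereas this filter removes them outright. The trade-off is that any legitimate non-ASCII letters in a concept name are dropped as well.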
Fantastic, to go back to your original issue - will installing the branch below now get things working?
Yes, we now get the expected results when using the updated version of the function.
Wonderful, I will incorporate that in the next release (but leave the branch there until then so you can use it if needed in the meantime). I will close this issue when this is implemented in the next CRAN release.
Describe the bug
Calling the function getDrugIngredientCodes with the argument "name" specified returns the following error
The error is caused by the standard RxNorm Extension drug ingredient concept 1253507 "[ ¹⁸ F]AlF-NOTA-FAPI-04" present in our concept-table.
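To check whether other concept names could trigger the same problem, something like the following sketch might help. It assumes the concept table has been collected into a local data frame with `concept_id` and `concept_name` columns; the toy data frame and the second row's ID are made up for illustration.

```r
# Toy stand-in for a locally collected concept table (assumption: columns
# concept_id and concept_name). The second concept_id is a placeholder.
concept <- data.frame(
  concept_id = c(1253507L, 1L),
  concept_name = c("[ \u00b9\u2078 F]AlF-NOTA-FAPI-04", "Adalimumab"),
  stringsAsFactors = FALSE
)

# Flag names containing any character outside printable ASCII (0x20 to 0x7E).
has_non_ascii <- grepl("[^\\x20-\\x7E]", concept$concept_name, perl = TRUE)

# Rows that may need attention before the names are passed on.
flagged <- concept[has_non_ascii, ]
print(flagged$concept_id)
```

Running the same check against the real concept table would show whether concept 1253507 is the only affected ingredient or one of several.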
To Reproduce
getDrugIngredientCodes(cdm = cdm, name = "Adalimumab")