-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
back to regexes in checkbox_choices()
#504
Comments
Looks like this works for all the scenarios except pipes in the label. For that to work it seems like you'd need to first split on the pipe, then check if each entry looks like a value,label pair (without any trimming). If it doesn't look like a pair its part of the label for the preceding option so add it back on. Of course, throw in commas and pipes into a label and there's no hope. |
wibeasley
added a commit
that referenced
this issue
Jul 15, 2023
This was referenced Jul 15, 2023
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@BlairCooper, maybe a hybrid of the two approaches should replace #500. (These attempts are still in dev branches --nothing has been pulled into the main branch yet.)
Maybe we still use a two-stage approach, where
sptrsplit()
uses pipes to separate the choices/rows. Then use our own regex to separateid
fromlabel
.I gave up on
readr:read_csv()
for this purpose because I couldn't get it working if the label contained commas. I'd have to preprocess the entries to enclose the label in quotes forread_csv()
to operate as intended --but we'd need a regex to do that. So we might as well use a single regex to separate them into columns and avoidread_csv()
. Tell me if anyone disagrees or sees something I'm overlooking.@BlairCooper, here's the heart of the function's current version (still on a dev branch). I tried to avoid using
base::trimws()
and had a working regex (ie,^\\s*+(.+?),\\s*+(.*?)\\s*$
). But I couldn't figure out how to drop empty rows, like your scenario in #502.You mentioned that this approach is 0.5sec vs 0.3sec. I think I'm ok with that because moving data across the network always dwarfs the computation time on the client machine.
Are there any scenarios this approach can't handle correctly? The only outstanding case is if the label has pipes (#503). In theory the
strsplit()
could use a regex to distinguish pipes between choices versus pipes within labels. I haven't figured out how though, because theid
is so flexible (eg, letter(s) & number(s)).@BlairCooper or anyone else, any thoughts on performance, accuracy, or maintainability?
The text was updated successfully, but these errors were encountered: