Feature Request: Pattern functions #25

Open
wdkrnls opened this issue Sep 9, 2020 · 3 comments

Comments

@wdkrnls

wdkrnls commented Sep 9, 2020

Great package! This saves me from having to leave R for many tasks. Do you think it would be reasonable to support pattern functions similar to those provided by the TXR pattern munging language (https://www.nongnu.org/txr/)? This would be in addition to regular expressions. For example, I might want to ensure that a pattern only matches values from a vector of options.

Imagine I have the data:

The green lawn.
The red chair.
The blue box.
The grey cat.

I define the pattern function:

known_color = function(x) x %in% c("green", "red", "blue", "grey")

Then I can extract like:

unglue_data(input, "The {color=known_color} {object}.")

Thanks for your consideration and great work!

@moodymudskipper
Owner

moodymudskipper commented Sep 16, 2020

Thanks!

I didn't know txr. It would be nice to be able to use it as is but I didn't find any interface in R.

You say unglue allows you not to leave R. When you did have to leave R, was it to use TXR?

A link for future reference: https://www.nongnu.org/txr/txr-pattern-language.html

Your proposed syntax can't work as is, because whatever follows "=" is already treated as a regex, so {color=known_color} would match the exact string "known_color". Also, as I believe you allude to, pattern functions would work on top of regular expressions, so there needs to be a place to specify that regex.

Given that the function should return a boolean, we could use the / character to mean "if", as in conditional probability notation. So we'd have:

unglue_data(input, "The {color/known_color} {object}.")

Or with an explicit regex:

unglue_data(input, "The {color/known_color=.*?} {object}.")

Would it answer your needs? Do you think it's intuitive?

Note: I can't do:

unglue_data(input, "The {color=.*?/known_color} {object}.")

That form is ambiguous: nothing tells me that the regex isn't the full ".*?/known_color".
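
Until something like this exists, the intended meaning of the "/" syntax can be approximated in two steps: capture with a regex, then keep only the rows where the predicate returns TRUE. Below is a minimal base R sketch of that idea. It assumes the example sentences from the first comment are stored in a character vector named input; extract_if is a hypothetical helper, not part of unglue, and the fifth sentence is added here only to show a rejected match.

```r
# Example data from the first comment, plus one extra sentence (an addition
# for illustration) whose first word is not a known color.
input <- c("The green lawn.", "The red chair.", "The blue box.",
           "The grey cat.", "The shiny car.")

# Predicate from the first comment: TRUE for values in the allowed set.
known_color <- function(x) x %in% c("green", "red", "blue", "grey")

# Hypothetical helper: capture two fields with a plain regex, then keep only
# the rows whose first field satisfies the predicate.
extract_if <- function(x, predicate) {
  m <- regmatches(x, regexec("^The (\\S+) (.+)\\.$", x))
  matched <- lengths(m) == 3  # full match plus two capture groups
  out <- data.frame(
    color  = vapply(m[matched], `[`, character(1), 2),
    object = vapply(m[matched], `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
  out[predicate(out$color), , drop = FALSE]
}

result <- extract_if(input, known_color)
print(result)
```

This sketch drops rows that fail the regex or the predicate. A built-in feature could just as reasonably return NA for them; that choice is left open here.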

@moodymudskipper
Owner

Note that this example can be solved with:

unglue_data(input, "The {color=(green)|(red)|(blue)|(grey)} {object}.")

Or, if we want to define it separately:

known_color_pattern <- "(green)|(red)|(blue)|(grey)" 
unglue_data(input, sprintf("The {color=%s} {object}.", known_color_pattern)) 

Can you think of a use case where the above wouldn't be satisfying? I prefer not to add complexity to unglue if the added value is not clear.
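
If the options already live in an R vector, the alternation can be built instead of typed by hand. A small base R sketch follows; known_colors and template are illustrative names, and it assumes the options contain no regex metacharacters (they would need escaping first).

```r
# Options as a plain character vector (illustrative name).
known_colors <- c("green", "red", "blue", "grey")

# Wrap each option in parentheses and join with "|", reproducing the
# hand-written pattern shown above.
known_color_pattern <- paste0("(", known_colors, ")", collapse = "|")

# Splice the alternation into the template string.
template <- sprintf("The {color=%s} {object}.", known_color_pattern)
print(template)
```

The resulting template string is identical to the hand-written one, so it can be passed to unglue_data in the same way.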

@wdkrnls
Author

wdkrnls commented Sep 25, 2020

I gave a poor example. Enumerating known cases is pretty convenient to do in R as you have shown. However, the TXR pattern function approach is way more powerful when you cannot enumerate the options and they cannot be described by a regular expression. I really liked your conditional syntax for boolean functions with /. That would be getting far closer to the power of the TXR approach.
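
To illustrate the kind of check meant here, one option is a predicate that tests whether parentheses are balanced, a classic property that a regular expression in the strict sense cannot describe. This is only an illustrative sketch in base R: is_balanced is a made-up name, and this particular check is an assumption chosen for the example, not a use case taken from TXR or unglue.

```r
# Illustrative predicate: TRUE when every "(" has a matching ")" and no ")"
# appears before its "(". Vectorised over a character vector.
is_balanced <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    # Running nesting depth: +1 for "(", -1 for ")", 0 for anything else.
    depth <- cumsum((chars == "(") - (chars == ")"))
    length(chars) == 0 || (all(depth >= 0) && depth[length(depth)] == 0)
  }, logical(1), USE.NAMES = FALSE)
}

checks <- is_balanced(c("(a(b))", "(a))(", "abc"))
print(checks)
```

With the proposed syntax, such a function could sit after the / in a pattern, for example {expr/is_balanced}, where expr is again just a placeholder name.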


2 participants