Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-numeric seeds when using randomize in itemset nodeset expression #804

Merged
merged 4 commits into from
Oct 22, 2024

Conversation

lognaturel
Copy link
Member

@lognaturel lognaturel commented Oct 21, 2024

Closes #800

#801 did not have any effect fully address #800 because an exception was thrown when a non-numeric literal input was provided to randomize in an itemset nodeset expression (see tests in f31435c).

This brings the intended behavior from #801 to both the itemset nodeset context and any other expression in which randomize is called. It also switches to using the XPath parser rather than regex for identifying randomize calls and their seeds in itemset nodeset expressions.

What has been done to verify that this works as intended?

Added tests.

Why is this the best possible solution? Were any other approaches considered?

I went further than I initially expected so there's some risk here. I think it's the right balance of risk and getting the work to a consistent, clear state.

I started out by considering whether we should instead consistently throw when receiving input that gets coerced to NaN. I realized I’d still want to do the work to improve the ItemsetBinding case (use the parser, save the raw expression, etc). And I think it might add complexity in some cases like where the seed starts out blank and then gets populated. Again, the approach here feels like a good balance to me and I’m interested in what others think!

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

I think the changes to XFormParser are the riskiest. This PR changes the regex-based identification of randomize calls and seeds to using the XPath parser. It also allows any type of expression to be passed in as a seed whereas previously only numeric literal or path expressions were allowed. Any mistakes here could lead to changes in how existing seeds randomize choices.

Another area of risk is that the cached/serialized form definition has changed because now ItemsetBinding stores a single, more general representation of the seed. Now is a good time for this kind of change because we're working on a major release. Collect will now delete the form cache file and rebuild it on every app upgrade.

Do we need any specific form for testing your changes? If so, please attach one.

See issue

Does this change require updates to documentation? If so, please file an issue here and include the link below.

getodk/xforms-spec#318

Copy link
Contributor

@brontolosone brontolosone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#801 did not have any effect because an exception was thrown when a non-numeric input was provided to randomize in an itemset nodeset expression (see tests in f31435c).

I was under the impression that at least for the survey linked to in #801 this one there would be an effect.

Looks good (does what it should do); I'll be merging this so that I can pile some modifications to the tests you've added in a new PR.

item("a", "A"),
item("b", "B")
),
bind("/data/choice").type("string").calculate("selected-at(join(' ', randomize(instance('choices')/root/item/label, '1')), 0)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would award more Overtly Obvious points if

-randomize(instance('choices')/root/item/label, '1')
+randomize(instance('choices')/root/item/label, 'Any string here (thus also this one) results in a seed of 0')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, the following test would be more rigorous; using a larger choice list and testing its total sort order rather than just the 1st element of the randomized order makes it much less prone to serendipitous success.

diff --git a/src/test/java/org/javarosa/xpath/expr/RandomizeTest.java b/src/test/java/org/javarosa/xpath/expr/RandomizeTest.java
index 5c6fef782..9fbfe7184 100644
--- a/src/test/java/org/javarosa/xpath/expr/RandomizeTest.java
+++ b/src/test/java/org/javarosa/xpath/expr/RandomizeTest.java
@@ -217,21 +217,30 @@ public class RandomizeTest {
                 title("Randomize non-numeric seed"),
                 model(
                     mainInstance(t("data id=\"rand-non-numeric\"",
-                        t("choice")
+                        t("choices_numeric_seed"),
+                        t("choices_stringified_numeric_seed")
                     )),
                     instance("choices",
                         item("a", "A"),
-                        item("b", "B")
+                        item("b", "B"),
+                        item("c", "C"),
+                        item("d", "D"),
+                        item("e", "E"),
+                        item("f", "F"),
+                        item("g", "G"),
+                        item("h", "H")
                     ),
-                    bind("/data/choice").type("string").calculate("selected-at(join(' ', randomize(instance('choices')/root/item/label, '1')), 0)")
+                    bind("/data/choices_numeric_seed").type("string").calculate("join(' ', randomize(instance('choices')/root/item/label, 1234))"),
+                    bind("/data/choices_stringified_numeric_seed").type("string").calculate("join(' ', randomize(instance('choices')/root/item/label, '1234'))")
                 )
             ),
             body(
-                input("/data/choice")
+                input("/data/choices_numeric_seed"),
+                input("/data/choices_stringified_numeric_seed")
             )
         ));
 
-        assertThat(scenario.answerOf("/data/choice").getDisplayText(), is("B"));
+        assertThat(scenario.answerOf("/data/choices_numeric_seed").getDisplayText(), is(scenario.answerOf("/data/choices_stringified_numeric_seed").getDisplayText()));
     }
 
     private FormDef serializeAndDeserializeForm(FormDef formDef) throws IOException, DeserializationException {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following this idea I've changed the tests in #805

@brontolosone brontolosone merged commit 9545fa3 into getodk:master Oct 22, 2024
3 checks passed
@lognaturel lognaturel deleted the randomize branch October 22, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Seed for randomization derived from non-numeric-looking string is always 0
2 participants