
Improve parsing for problem answers #365

Closed
bmtcril opened this issue Nov 3, 2023 · 8 comments · Fixed by #366

bmtcril commented Nov 3, 2023

While digging through problems with parsing array answers for openedx/tutor-contrib-aspects#511, I found the Insights parsing code for dealing with problem_check answers. It highlighted a number of issues that we will likely encounter when replaying old tracking logs and handling a more diverse problem pool.

Things that I don't believe we are currently taking into account, but should:

  • Arrays currently arrive as repr strings of Python objects and need to be re-serialized as JSON
  • We may not be handling HTML in responses in a way that makes sense to display
  • We are not taking variant into account, I'm not sure if we should
  • We should investigate "hidden" answers

The task here is probably to poach as much of the Insights code as makes sense, update it to current Python 3 / coding standards, and update the transforms as necessary.
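To illustrate the first bullet, here is a minimal sketch of converting a repr'd Python list into JSON using only the standard library. The function name and fallback behavior are my own for illustration, not the actual transform code:

```python
# Minimal sketch, not the actual transform code: convert an answer that
# arrives as the repr of a Python list (e.g. "['choice_1', 'choice_2']")
# into a JSON string, passing anything else through unchanged.
import ast
import json

def answer_to_json(raw: str) -> str:
    """Re-serialize a repr'd Python list/tuple as JSON; pass other values through."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # not a Python literal, leave it untouched
    if isinstance(value, (list, tuple)):
        return json.dumps(list(value))
    return raw  # a scalar literal like "42"; nothing to convert

print(answer_to_json("['choice_1', 'choice_2']"))  # ["choice_1", "choice_2"]
print(answer_to_json("plain text answer"))         # plain text answer
```

Note that `json.loads` alone would fail here because repr'd Python lists use single quotes; `ast.literal_eval` parses Python-literal syntax safely without resorting to `eval`.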


Ian2012 commented Nov 3, 2023

cc @felipemontoya


bmtcril commented Nov 6, 2023

This is not as bad as I feared. Almost all of the issues are related to very old (before March 2014) events that didn't have the submission key:

Addressed in the associated draft PR:

  • Lists are currently serialized reprs of Python objects and need to be re-serialized as JSON

Only impacts those very old events:

  • We are not taking variant into account, I'm not sure if we should
  • We should investigate "hidden" answers
  • We may not be handling HTML in responses in a way that makes sense to display

I think it's reasonable to leave our basic handling of very old events as-is, since even the Insights code was only marginally successful at parsing them correctly, and supporting them adds a lot of code and complexity.

@Ian2012 @felipemontoya @pomegranited I'd be happy to hear any feedback you have on this.

@pomegranited

@bmtcril I think your PR catches the vast majority of cases 👍

  • We are not taking variant into account, I'm not sure if we should

Ya me neither.. maybe a question for @crathbun428 and the power users?

  • We should investigate "hidden" answers

Sure.. The Insights code shows that it skips anything it regards as "hidden" (ref 1, 2) -- should we do the same?

  • We may not be handling HTML in responses in a way that makes sense to display

Another power user question, I think, and one best left for a specific use case.


bmtcril commented Nov 7, 2023

@pomegranited Yeah, and again these points should only be applicable for pre-2014-03 events, I'm not even sure there is another instance besides edx.org that has events back that far.

@crathbun428

@pomegranited - Thanks for looping me in - I just spoke with @bmtcril about taking variant into account. From our conversation I learned that variant info was something we had in place only up until March 2014, and the earliest Open edX instance stood up outside of edX would date from January 2014 at the earliest. With this in mind, I think we'll be OK not taking variants into account here, as almost zero instances would have this data. Please let me know if you have any concerns about this.


bmtcril commented Nov 7, 2023

I just dug into this a little more, and while it's true that we're not parsing out the variant from the old event types, we're also not actually using it where we do have it, which is probably what @pomegranited meant. I'd believed it was part of the submission ID, but it looks like even that's only being passed on multi-problem submissions anyway. Still looking into it, but I think we'll need a new addition to the statement to handle that.

@pomegranited

@bmtcril I think it's ok to ignore variant. If somebody has a compelling use case, they can tell us how it should be used and we can fix it :)


bmtcril commented Nov 8, 2023

Yeah, I think where it will impact things is the answer distribution chart, where there will be a bunch of different correct answers depending on the given seed / variant. I've tested this locally and can confirm we're not getting the variant passed through. The few instructors who use this will probably want to filter by it, but I've made a separate ticket to track that and we can try to find out more through product interviews. It's worth noting that a quick scan indicates we'd need to create our own xAPI concept for this, which makes it slightly more difficult to implement.

I believe hidden answers are filtered out of the submissions and so would only impact that subset of very early events. I'm going to say that's OK for now.
