Article Viewer displays raw wikicode for "Health effects of electronic cigarettes" #5957
Comments
Hello @ragesoss, I would like to try working on this! |
@empty-codes go for it. This one may be a challenge, as I'm not sure which codebase is ultimately responsible for the error. There's almost certainly something about the wikicode for these example articles that is triggering the bug, so just knowing precisely what triggers it would be helpful. |
Hi @ragesoss and @empty-codes, I hope you're doing well! I wanted to share an observation I made while looking into this bug earlier today. It seems that for the articles where this issue occurs most frequently, a) Opening tag: For example in the actual HTML output, this appears as: Interestingly, in articles where this bug does not occur, Additionally, I've noticed that the bug tends to happen in articles where Please consider these as initial thoughts, as I’ve been trying to understand how the WikiWho algorithm works. I hope my explanation was clear. Good luck with solving the bug @empty-codes! |
@Abishekcs Thank you so much for your insights; they were really helpful for getting started. Firstly, I found that there are actually two different APIs involved:
For this issue, the WhoColorAPI is the one we're concerned with. Flow:
Here's the difference between the three props:
@Abishekcs identified that the problem seems to occur because WikiWho is incorrectly outputting Note that the MediaWiki parse action returns a The fact that some articles display correctly while others do not suggests that there may be inconsistencies in how the WhoColor API processes certain revisions of articles. So the questions are:
If it is the Additionally, @Abishekcs also noticed that the bug tends to happen in articles where contributors have also added citations. In this specific article: UCSF Foundations II, in the parse action response, there were parse warnings:
parsewarnings[
"Script warning: <span style=\"color:#3a3\">One or more <code style=\"color: inherit; background: inherit; border: none; padding: inherit;\">{{[[Template:cite journal|cite journal]]}}</code> templates have maintenance messages</span>; messages may be hidden ([[Help:CS1_errors#Controlling_error_message_display|help]])."
]
Since the bug appears more frequently in articles with numerous citations, it’s possible that these templates are not being parsed correctly. However, these parse warnings are inconsistent (and probably irrelevant) because they were present in another article that does not have this bug and also absent in another article that has this bug. Keeping all these in mind, I will continue further investigation. Hopefully I can pinpoint a cause soon 😅 |
@empty-codes, I believe it's the whoColorHtml. I'm mentioning this because 😅 I reviewed the raw HTML output for whoColorHtml, and here's a small screenshot below. However, it might be a good idea to cross-check just to be sure. |
@Abishekcs That answers the question, thank you! I'll also crosscheck from my end. |
I was stuck trying to find a lead for a while 😅 but I finally got somewhere (I think).
For the Health effects of electronic cigarettes article:
The NewPP limit report for the parsed version:
The NewPP limit report for the highlighted authorship version:
For the Hispanic and Latino Americans article:
The NewPP limit report for the parsed version:
The NewPP limit report for the highlighted authorship version:
The key issue seems to be that the templates are not being expanded, as indicated by the 0 bytes in the Post-expand include size and Template argument size fields, as well as the minimal expansion depth. This page provides more context about the meaning of the terms. At this point, I would like to ask for further guidance @Abishekcs @ragesoss. What steps should I take from here, please? Thank you in advance! |
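For comparing the two reports without reading them by eye, here is a minimal sketch that pulls the `used/limit` pairs out of a NewPP limit report and flags the "0 bytes" pattern described above. The function names and the sample report text are made up for illustration; they are not taken from either article or from any codebase in this thread.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "Template argument size: 0/2097152 bytes".
# Field names are kept verbatim, so either hyphen style in "Post-expand" works.
LIMIT_LINE = re.compile(
    r"^(?P<name>[^:\n]+):\s*(?P<used>\d+)/(?P<limit>\d+)", re.MULTILINE
)


def parse_limit_report(report: str) -> Dict[str, Tuple[int, int]]:
    """Return {field name: (used, limit)} for every 'used/limit' line."""
    return {
        m.group("name").strip(): (int(m.group("used")), int(m.group("limit")))
        for m in LIMIT_LINE.finditer(report)
    }


def templates_look_unexpanded(report: str) -> bool:
    """Heuristic: every include/argument size field reports 0 bytes used."""
    fields = parse_limit_report(report)
    sizes = [
        used
        for name, (used, _limit) in fields.items()
        if "include size" in name or "argument size" in name
    ]
    return bool(sizes) and all(used == 0 for used in sizes)


# Made-up sample text (an assumption, not a real report from this issue).
SAMPLE_REPORT = """NewPP limit report
Preprocessor visited node count: 12/1000000
Post-expand include size: 0/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 2/100
"""

print(templates_look_unexpanded(SAMPLE_REPORT))
```

Running the same check on the parsed and highlighted-authorship reports for one revision would show whether only the second one has the zero-size fields.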
That's interesting. The template expansion seems like a good clue, but it's not obvious to me whether an expansion limit is involved or whether it's being misparsed for some other reason. The unparsed I can't tell whether that whoCOLOR repository is indirectly used for this. The main repo for the wikiwho-api servers is https://github.com/wikimedia/wikiwho_api
Noted! I will update you on any new findings @ragesoss |
@ragesoss I successfully set up the Correct:
In extended_html:
I traced this issue to the parser logic in To fix this, I modified the While this change successfully eliminated the unwanted Both the
parser = WikiMarkupParser(wiki_text, whocolor_data['tokens'])
parser.generate_extended_wiki_markup()
extended_html = wp_rev_text_obj.convert_wiki_text_to_html(parser.extended_wiki_text)
I am currently investigating whether it is caused by the parser logic or token insertions or anything else. |
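One way to check a candidate fix without eyeballing the rendered page is to scan the converted HTML for wikicode that survived parsing. This is only a rough sketch: the marker list, the function name, and the assumption that leftover markup may arrive HTML-escaped are mine, not something taken from the WikiWho code.

```python
import re
from html import unescape

# Hypothetical markers for wikicode that should not survive parsing.
RAW_MARKERS = {
    "template braces": re.compile(r"\{\{|\}\}"),
    "ref tag": re.compile(r"<ref[\s>/]", re.IGNORECASE),
    "wikilink brackets": re.compile(r"\[\[|\]\]"),
}

# Real HTML tags are removed first so only the visible text is inspected.
TAG = re.compile(r"<[^>]+>")


def leftover_wikicode(html: str) -> dict:
    """Count raw-wikicode markers in the visible text of an HTML fragment."""
    # Leftover markup may be escaped (&lt;ref&gt;), so unescape after
    # stripping the genuine tags.
    visible = unescape(TAG.sub("", html))
    counts = {name: len(rx.findall(visible)) for name, rx in RAW_MARKERS.items()}
    return {name: n for name, n in counts.items() if n}
```

An empty result suggests the fragment parsed cleanly; a page that legitimately shows braces or brackets (for example in a code sample) would be a false positive, so this is a smoke test rather than a proof.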
Hello @ragesoss, Here is my current update. Initially, I attempted to resolve the issue of the parser prematurely closing templates upon encountering However, this approach resulted in another bug involving duplicate
{
    'type': 'single',
    'start_regex': re.compile(r'{{!}}'),
    'end_regex': None,
    'no_spans': True,
    'no_jump': False
},
Note: I am aware the changes here are not permanent because the parser and special_markups py files are actually site packages/dependencies in a path like so: This modification effectively eliminated the unwanted Both the
parser = WikiMarkupParser(wiki_text, whocolor_data['tokens'])
parser.generate_extended_wiki_markup()
extended_html = wp_rev_text_obj.convert_wiki_text_to_html(parser.extended_wiki_text)
By changing the argument of the The following are steps I have taken in investigating the bug:
In conclusion, I cannot pinpoint a cause because:
Would you recommend I continue working on this issue? I'm sure there's something I am missing but I cannot pinpoint exactly what it is. Thank you for your patience! |
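To make the `{{!}}` rule above easier to reason about, here is a small standalone sketch of how a scanner that looks for the first `}}` can be tripped by the `{{!}}` magic word, and how treating `{{!}}` as a single unit avoids that. Both scanner functions are hypothetical and written only for this illustration; this is not a claim about how the WikiWho parser actually consumes the rule.

```python
import re

# Same shape as the rule quoted above; the braces are escaped here, which
# matches the same text as r'{{!}}'.
PIPE_MAGIC_WORD_RULE = {
    "type": "single",
    "start_regex": re.compile(r"\{\{!\}\}"),
    "end_regex": None,
    "no_spans": True,
    "no_jump": False,
}


def naive_template_end(text: str, start: int) -> int:
    """Index just past the first '}}' after the opening braces (no nesting)."""
    return text.index("}}", start + 2) + 2


def template_end_skipping(text: str, start: int, rule: dict) -> int:
    """Like naive_template_end, but steps over matches of a 'single' rule."""
    pos = start + 2
    while pos < len(text):
        m = rule["start_regex"].match(text, pos)
        if m:
            pos = m.end()  # treat {{!}} as one opaque unit
        elif text.startswith("}}", pos):
            return pos + 2
        else:
            pos += 1
    raise ValueError("no closing braces found")


sample = "{{cite|title=a{{!}}b}} tail"
print(sample[: naive_template_end(sample, 0)])
print(sample[: template_end_skipping(sample, 0, PIPE_MAGIC_WORD_RULE)])
```

The naive version stops inside `{{!}}`, while the rule-aware version reaches the template's real closing braces.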
@empty-codes thanks! this is really useful documentation of your debugging work. I suggest leaving this one; hopefully we can find the next clue at a later time, but it's a relatively rare bug. I just checked the second example with the Who Wrote That? tool on Wikipedia, and it also displays this buggy behavior (which makes sense based on your debugging, as it's clearly a problem with the WikiWho processing). So we can be pretty confident now that it's not a bug in our codebase. One really useful way to wrap this up would be to open an issue on Phabricator against the Who-Wrote-That project, summarizing what you've learned about the likely source of the bug within the WikiWho parser. There are some other issues there already related to pages that don't work as expected, but I don't see any that are clearly the same issue here, and I didn't spot anything along the lines of what you've done here to narrow down the source. |
@ragesoss I've created the Phabricator issue here: https://phabricator.wikimedia.org/T377898 While I wasn't able to completely pinpoint the source of the bug, I learned a lot throughout the process and I'm glad this documentation will be useful. Thanks for your guidance throughout this process! 🙏 |
Another example: "Donald Trump 2024 presidential campaign" |
Visit here and wait for the authorship highlighting to load: https://dashboard.wikiedu.org/courses/UCSF/Foundations_II_(Summer_2024)/articles/edited?showArticle=52260526
Once it loads, the rendered article is replaced by highlighted wikicode:
Additional context
The ArticleViewer initially loads the parsed version of the current article and requests the authorship data from the wikiwho server. Once received, the wikiwho data (which is annotated wikicode) gets processed by Dashboard code to add CSS classes on a per-author basis, and is then sent to mediawiki to parse. No explicit errors are occurring in this example, either in the JS console or in network requests, but the call to the mediawiki API parse action is returning unparsed wikicode. One possible explanation is that the Dashboard code that operates on the wikiwho data is mishandling some particular aspect of this page's wikicode, resulting in a version that can't be parsed properly by mediawiki.
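For anyone trying to reproduce the last step outside the Dashboard, the request to the `parse` action can be sketched as below. The endpoint URL and the function names are assumptions for illustration (the Dashboard builds its request in its own code and may target other wikis); the form fields themselves are standard `parse` parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration only.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_request(wikitext: str) -> dict:
    """Form fields for a POST to the MediaWiki `parse` action."""
    return {
        "action": "parse",
        "format": "json",
        "contentmodel": "wikitext",
        # Ask for the rendered HTML plus any parse warnings.
        "prop": "text|parsewarnings",
        "text": wikitext,
    }


def encode_body(wikitext: str) -> bytes:
    """URL-encode the fields; annotated wikicode is large, so POST it."""
    return urlencode(build_parse_request(wikitext)).encode("utf-8")


# The encoded body can be sent to API_URL with any HTTP client.
print(encode_body("{{!}}").decode("utf-8"))
```

Posting the annotated wikicode and the plain wikicode for the same revision, then comparing the two responses, is one way to confirm that the difference comes from the annotation step rather than from the request itself.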