Do not translate markers #281
034e8bc
to
03adec2
Compare
All of the USFM parsing classes are based on the implementations used in Paratext. As a result, I don't want to make any major changes to the way that these classes work. This ensures that Machine has the same level of compatibility as Paratext and makes it easier to port fixes over from Paratext. Changing these classes would also introduce breaking changes for other codebases that use them. It would be good if you could refactor this so that it builds on top of the USFM parsing classes rather than changing them.
Reviewable status: 0 of 17 files reviewed, all discussions resolved
I will have to think about how to implement this, but from what I am hearing, the "SubComponent" method of thinking and working is fine, but it should be implemented in the handlers, not the parsers directly. If that is the case, which files directly mirror the Paratext files (the USFM Tokenizer and USFM parser)? Can I get access to those Paratext files? Should we put a comment in the files saying that they are intended to mirror that parsing?
Yes, that is correct. It should be implemented in a handler.
There are three sets of classes that are based off of Paratext:
* UsfmStylesheet, UsfmTag
* UsfmTokenizer, UsfmToken, UsfmAttribute
* UsfmParser, UsfmParserState, IUsfmParserHandler
Paratext 9 is closed-source, so it is hard to directly link them to the corresponding code in Paratext. We can put comments on these classes to indicate that they are implemented to mimic the behavior of Paratext, so change them with caution.
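To illustrate what "implemented in a handler" could look like, here is a minimal Python sketch (Python because the change is also being ported to Machine.py). Everything in it is hypothetical: `ParserHandler`, `EmbedTrackingHandler`, `replay`, and the `EMBED_MARKERS` set are simplified stand-ins, not the actual Machine or Paratext API. The point is only that embed tracking lives in the handler, so the parser does not need to change.

```python
# Hypothetical marker set; the real list of embed markers is not given in this thread.
EMBED_MARKERS = {"f", "x", "fig"}


class ParserHandler:
    """Simplified, hypothetical stand-in for the parser's handler interface."""

    def start_note(self, marker):
        pass

    def end_note(self, marker):
        pass

    def text(self, text):
        pass


class EmbedTrackingHandler(ParserHandler):
    """Tracks embed state inside the handler, leaving the parser untouched."""

    def __init__(self):
        self._embed_stack = []  # markers of the embeds that are currently open
        self.verse_text = []    # text seen outside of any embed
        self.embed_text = []    # text seen inside an embed

    @property
    def is_in_embed(self):
        return len(self._embed_stack) > 0

    @property
    def is_in_nested_embed(self):
        return len(self._embed_stack) > 1

    def start_note(self, marker):
        if marker in EMBED_MARKERS:
            self._embed_stack.append(marker)

    def end_note(self, marker):
        # Only close the embed if it matches the innermost open one.
        if self._embed_stack and self._embed_stack[-1] == marker:
            self._embed_stack.pop()

    def text(self, text):
        target = self.embed_text if self.is_in_embed else self.verse_text
        target.append(text)


def replay(events, handler):
    """Stand-in for the parser: dispatches (callback name, value) events to a handler."""
    for kind, value in events:
        getattr(handler, kind)(value)


events = [
    ("text", "In the beginning"),
    ("start_note", "f"),
    ("text", "A footnote"),
    ("end_note", "f"),
    ("text", " God created"),
]
handler = EmbedTrackingHandler()
replay(events, handler)
print(handler.verse_text, handler.embed_text)
```

Because the state lives in the handler subclass, a fix ported from Paratext into the parser classes would not conflict with this logic.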
Reviewable status: 0 of 17 files reviewed, all discussions resolved
That sounds good - I will work on the changes that you are requesting, will add the comments to the top of the files and try to fix the last few bugs. Other than that, I will assume that the general direction for the code is good. I may also try to add in sillsdev/serval#604 - making sure that tables are handled properly.
21864af
to
20d0768
Compare
Are you still working on this? I thought we decided to not change the behavior of the USFM parser classes.
Reviewed all commit messages.
Reviewable status: 0 of 17 files reviewed, all discussions resolved
Yes, I am refactoring it now. I just pushed some code to redo the naming first.
20d0768
to
1451ae2
Compare
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #281      +/-   ##
==========================================
+ Coverage   70.28%   70.33%   +0.05%
==========================================
  Files         385      385
  Lines       32019    32121     +102
  Branches     4504     4523      +19
==========================================
+ Hits        22503    22592      +89
- Misses       8471     8483      +12
- Partials     1045     1046       +1
@ddaspit - I made the updates. All tests pass. Please make sure that the "intent" is correct.
I haven't gone through everything yet, but it is looking good so far. How does this align with @isaac091's work on preserving USFM markers? Will we need to redo the work here when that comes in? It might make sense to wait for those changes to come in before we do this, so that we can reduce churn and ensure that we have the right design.
One thing to note is that the term is "embed" and not "embedded".
Reviewed 6 of 14 files at r3, 1 of 3 files at r4.
Reviewable status: 7 of 17 files reviewed, all discussions resolved
I talked with @isaac091 and the code looked fairly different - we will need to collaborate and haven't done so fully yet. After this goes through the first review, I will do the parallel Machine.py updates and then we can merge in both at the same time.
db5a843
to
0b82cf0
Compare
Only translate ft text
Configure preserving / stripping embedded and style markers
Embed
0b82cf0
to
6108e67
Compare
* Nested embeds
* If we strip embeds and there is an updated embed, strip it
All tests pass
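The preserve/strip behavior for embeds, including nested ones, can be sketched roughly as follows. This is an illustrative assumption, not the code in this PR: `MarkerBehavior`, its members, the `render` function, and the `(kind, text)` token shape are all hypothetical, and the sample token strings are placeholders rather than a claim about valid USFM nesting.

```python
from enum import Enum, auto


class MarkerBehavior(Enum):
    """Hypothetical stand-in for the marker behavior setting discussed in this PR."""

    PRESERVE = auto()
    STRIP = auto()


def render(tokens, embed_behavior):
    """Rebuild a verse from (kind, text) tokens.

    kind is "text", "embed_start", or "embed_end". Embeds may nest; when
    stripping, everything from the outermost embed_start through its matching
    embed_end is dropped.
    """
    out = []
    depth = 0  # number of embeds that are currently open
    for kind, text in tokens:
        if kind == "embed_start":
            depth += 1
        in_embed = depth > 0
        if not in_embed or embed_behavior is MarkerBehavior.PRESERVE:
            out.append(text)
        if kind == "embed_end":
            depth -= 1
    return "".join(out)


# Placeholder tokens: an outer embed containing a nested one.
tokens = [
    ("text", "Verse text"),
    ("embed_start", "\\f + "),
    ("text", "note "),
    ("embed_start", "\\x - "),
    ("text", "nested"),
    ("embed_end", "\\x*"),
    ("embed_end", "\\f*"),
    ("text", " continues."),
]
print(render(tokens, MarkerBehavior.STRIP))
print(render(tokens, MarkerBehavior.PRESERVE))
```

Tracking a depth counter rather than a boolean is what keeps the nested case correct: the inner end marker does not end the stripping early.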
d08423a
to
ed41aef
Compare
@ddaspit - All machine and machine.py updates have been made - they match each other and all tests pass.
Reviewed 1 of 17 files at r1, 1 of 14 files at r3, 14 of 14 files at r5, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 25 at r5 (raw file):
    private bool _inEmbed;
    public bool InNoteText { get; private set; }
This should be protected.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 176 at r5 (raw file):
    }
    public void StartEmbed(UsfmParserState state, string marker)
This method should be protected.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 191 at r5 (raw file):
    }
    public virtual void StartEmbed(UsfmParserState state, ScriptureRef scriptureRef) { }
This method should be protected.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 193 at r5 (raw file):
    public virtual void StartEmbed(UsfmParserState state, ScriptureRef scriptureRef) { }
    public virtual void EndEmbed(
This method should be protected.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 271 at r5 (raw file):
    protected virtual void EndNonVerseText(UsfmParserState state, ScriptureRef scriptureRef) { }
    public virtual void StartNoteTextWrapper(UsfmParserState state)
This method should be protected. Also, this method is named inconsistently with the corresponding EndNoteText method.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 280 at r5 (raw file):
    protected virtual void StartNoteText(UsfmParserState state) { }
    public virtual void EndNoteText(UsfmParserState state)
This method should be protected.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 384 at r5 (raw file):
    }
    public bool InEmbed(string marker)
This should be protected. Also, it should be named IsInEmbed.
src/SIL.Machine/Corpora/ScriptureRefUsfmParserHandlerBase.cs
line 389 at r5 (raw file):
    }
    public bool IsInNestedEmbed(string marker)
This should be protected.
src/SIL.Machine/Corpora/UsfmStylesheet.cs
line 114 at r3 (raw file):
    }
    private static IEnumerable<string> GetEmbeddedStylesheet(string fileName)
It looks like you accidentally renamed this method.
src/SIL.Machine/Corpora/UpdateUsfmParserHandler.cs
line 14 at r5 (raw file):
    }
    public enum UpdateUsfmIntraVerseMarkerBehavior
This should be named UpdateUsfmMarkerBehavior.
Previously, ddaspit (Damien Daspit) wrote…
Done.
Previously, ddaspit (Damien Daspit) wrote…
Done.
Previously, ddaspit (Damien Daspit) wrote…
Done.
Previously, ddaspit (Damien Daspit) wrote…
Done.
This attempts to address the following issues:
It also reworks the Note handling, making it so that it preserves the reference fields.
It also combines Notes, Figures and Cross References as "SubComponents".
There are still a few tests that need to be fixed.
There are significant parsing updates - and the tests were updated to match the updated behavior. @ddaspit - can you check to see if they are going in the right direction?