-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate alignment deuterocanonical books #284
base: master
Are you sure you want to change the base?
Investigate alignment deuterocanonical books #284
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #284 +/- ##
==========================================
+ Coverage 70.28% 70.32% +0.04%
==========================================
Files 385 386 +1
Lines 32019 32036 +17
Branches 4504 4507 +3
==========================================
+ Hits 22503 22529 +26
+ Misses 8471 8463 -8
+ Partials 1045 1044 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick scan of the PR looking more for coding issues, but I hope to look at the Google Doc and look over the issue and see if I have any big picture feedback. Thanks, Mudi!
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1273 at r1 (raw file):
}; Versification.Table.Implementation.RemoveAllUnknownVersifications();
Why is this line needed?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
if (!string.IsNullOrEmpty(mismatchReport)) { TestContext.WriteLine("Mismatches found:");
Shouldn't this test fail if there are mismatches? Could we pass the mismatchReport
variable as the message in the exception that's being thrown?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1411 at r1 (raw file):
public void ValidateSourceAndTargetReferencesWithBackup(string bookId, string verseRef) { string sourceCorpusFile = CorporaTestHelpers.TestDataPath + "/Source - LAT.zip";
It's in general not a good idea to commit the binary zip files. We've been moving toward checking in the unzipped Paratext projects. You can just use ParatextTextCorpus
instead of ParatextBackUpTextCorpus
.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1427 at r1 (raw file):
); ParallelTextCorpus parallelCorpus = new ParallelTextCorpus(sourceCorpus, targetCorpus, alignments)
Why not use the AlignRows
function?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1512 at r1 (raw file):
public void ValidateReferencesWithAllVersifications(ScrVers versification, string bookId, string verseRef) { string sourceCorpusFile = CorporaTestHelpers.TestDataPath + "/Source - LAT.zip";
Maybe some of this could be moved to a SetUp
function if it's consistent across all the tests?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1596 at r1 (raw file):
AllTargetRows = true }; ParallelTextRow[] rows = parallelCorpus.ToArray();
You should be able to just do parallelCorpus.GetRows(new string[] {bookId})
instead of using the Where
statement.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1710 at r1 (raw file):
ScrVers versification = versificationType switch { ScrVersType.Original => ScrVers.Original,
I think you have a helper function below that does this, right?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1800 at r1 (raw file):
Dictionary<string, string> expectedMappings = new Dictionary<string, string> { // { "S3Y 1:1-29", "DAG 3:24-52" },
Is there a reason to leave these commented out rows in?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1897 at r1 (raw file):
// Clean and normalize content for comparison string[] unwanted = { "÷" };
Why remove these before comparison? Aren't the source and target the same?
tests/SIL.Machine.Tests/Corpora/TestData/Source - LAT.zip
line 0 at r1 (raw file):
❗ If this is the source text I sent you (and not the one Peter offered to construct), it's not public domain and all record of it needs to be removed from the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1411 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
It's in general not a good idea to commit the binary zip files. We've been moving toward checking in the unzipped Paratext projects. You can just use
ParatextTextCorpus
instead ofParatextBackUpTextCorpus
.
Noted. Will need a little help on that.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1512 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Maybe some of this could be moved to a
SetUp
function if it's consistent across all the tests?
Noted. Will refactor
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1596 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You should be able to just do
parallelCorpus.GetRows(new string[] {bookId})
instead of using theWhere
statement.
Noted. Will refactor
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1710 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I think you have a helper function below that does this, right?
Yes. Will refactor
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1800 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Is there a reason to leave these commented out rows in?
Yes. They are also mappings found in the vrs file. Well, I will take them out. One mapping is sufficient for the test.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1897 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why remove these before comparison? Aren't the source and target the same?
No they are not the same due to the presence of that character in the target files
tests/SIL.Machine.Tests/Corpora/TestData/Source - LAT.zip
line at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
❗ If this is the source text I sent you (and not the one Peter offered to construct), it's not public domain and all record of it needs to be removed from the repo.
Noted. Will do that
Previously, mudiagaobrikisil wrote…
You should be able to just extract the binary like a normal .zip file and put it in the repo. See here for an example of the unzipped project being used. If that doesn't clear it up, let me know & we can chat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1411 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You should be able to just extract the binary like a normal .zip file and put it in the repo. See here for an example of the unzipped project being used. If that doesn't clear it up, let me know & we can chat.
I understand that part. However, since you said the files are not for public domain yet, that is why I need to help and some explanations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 11 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1916 at r2 (raw file):
Versification = versification }; ParatextBackupTextCorpus targetCorpus = new ParatextBackupTextCorpus(sourceCorpusFile)
You use sourceCorpusFile for both sourceCorpus and targetCorpus, so shouldn't they be the same below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 11 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Shouldn't this test fail if there are mismatches? Could we pass the
mismatchReport
variable as the message in the exception that's being thrown?
@johnml1135 suggested I log the issues out and let the tests pass gracefully, since we are not immediately addressing the issues and it would affect the CI/CD pipeline if the tests fail.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1427 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why not use the
AlignRows
function?
Done
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1916 at r2 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You use sourceCorpusFile for both sourceCorpus and targetCorpus, so shouldn't they be the same below?
Refactored to use one Corpora
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 30 of 30 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1273 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why is this line needed?
So why include this line?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, mudiagaobrikisil wrote…
@johnml1135 suggested I log the issues out and let the tests pass gracefully, since we are not immediately addressing the issues and it would affect the CI/CD pipeline if the tests fail.
OK, sounds good
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 81 at r3 (raw file):
/// </summary> /// <returns>A tuple containing the source corpus (first) and target corpus (second).</returns> public static (ParatextTextCorpus sourceCorpus, ParatextTextCorpus targetCorpus) GetDeuterocanonicalCorpora()
Is this used/necessary?
Previously, Enkidu93 (Eli C. Lowry) wrote…
I may have misspoken - if there is a mismatch, we should fail and address the underlying issue. If it becomes too difficult, we can create a new issue to document the issue, but no one will ever look at the text output here. Please make the tests fail and if there are failures, we should understand the issue fully (and document in new issues) before knowledgeably removing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 29 of 35 files reviewed, 3 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 81 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Is this used/necessary?
It is not necessary. It has been removed
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1273 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
So why include this line?
I think it's not needed. I have removed it.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I may have misspoken - if there is a mismatch, we should fail and address the underlying issue. If it becomes too difficult, we can create a new issue to document the issue, but no one will ever look at the text output here. Please make the tests fail and if there are failures, we should understand the issue fully (and document in new issues) before knowledgeably removing tests.
Actually the expected behavior of the test is that it should fail. The mismatch is deliberate and failing is the right response if there is a mismatch. I don't think there is an underlying issue here. There would be if the test does not catch the mismatch. However, if I let the test fail, since it is a test that should fail, would it not affect the CI/CD?
Previously, mudiagaobrikisil wrote…
If a mismatch is the expected behavior, the test should only pass if there is a mismatch. Ideally, the main boundary matches and mismatches would be asserted in these tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r4, all commit messages.
Reviewable status: 34 of 35 files reviewed, 1 unresolved discussion (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
If a mismatch is the expected behavior, the test should only pass if there is a mismatch. Ideally, the main boundary matches and mismatches would be asserted in these tests.
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 34 of 35 files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
^
@johnml1135 . Thank you. I have edited the test to ensure that it passes only if it catches the matches and the mismatches correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 34 of 35 files reviewed, 1 unresolved discussion (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1280 at r4 (raw file):
"vers.txt", ScrVers.English, "customVersification"
Why change the name?
Previously, mudiagaobrikisil wrote…
Great - I look forward to seeing the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 34 of 35 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @mudiagaobrikisil)
investigate possible alignment and versification issues with deuterocanonical books
83fc7a3
to
e193846
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 35 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Great - I look forward to seeing the changes.
Changes pushed.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1280 at r4 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why change the name?
This was an old change just to debug what the error was. I have refactored it to handle the error properly. The error was that it was looping through the books I specified and then trying to load a versification with the same key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 30 of 35 files reviewed, 2 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1280 at r4 (raw file):
Previously, mudiagaobrikisil wrote…
This was an old change just to debug what the error was. I have refactored it to handle the error properly. The error was that it was looping through the books I specified and then trying to load a versification with the same key.
OK, thank you 👍
Fixes sillsdev/serval#478 |
Previously, mudiagaobrikisil wrote…
Thanks for the updates Mudi! As I look at them, I notice that you are just counting the number of mismatches, not verifying each proper match or mismatch. Can you verify each one in a way that makes it clear what each match or mismatch is intending to verify (either self-documented or with comments as needed)? |
Eli - just for my sake, can you place a comment next to each test that verifies each "issue" from the word doc - that is, the ones that need to be verified: https://docs.google.com/document/d/1lV3Sa0IDxFkx8wcQ73xeBg8hs9A60OstzvqvOCOPw9o/edit?tab=t.0? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. @johnml1135! Issue 1, I think is covered at a book level indirectly many places but more directly in GetVersesInVersification_ButNotInSourceOrTarget
and Issue 2 is covered by GetVersesInSourceOrTarget_ButNotInVersification
. Is that right, Mudi? Without digging into the specific USFM files, it's hard to tell. Could you verify that that test covers both with specific examples.
Issues 3 and 4 don't require any action in the code. Issue 5, I believe, is already addressed in existing tests outside this PR indirectly. Would you like a test specific to the deuterocanon related to this, @johnml1135?
Issue 6: I'm not sure that this is covered; I'm also not 100% sure which cases this is meant to cover. Are you concerned about multiple verses mapping to the same verse?
Sorry about all the new items to address - I seemed to have missed some of the changes the first time through / forgot to get back to them. Thanks, Mudi!
Reviewable status: 30 of 35 files reviewed, 11 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1179 at r5 (raw file):
new MemoryText("Baruch", new[] { TextRow("Baruch", 5, "target segment 5 .") }), new MemoryText("1Maccabees", new[] { TextRow("1Maccabees", 6, "target segment 6 .") }), new MemoryText("2Maccabees", new[] { TextRow("2Maccabees", 7, "target segment 7 .") })
I don't believe that this is testing what you think it's testing since you're just passing a number as the row ref, not the actual refs that would exist for these books' rows.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1193 at r5 (raw file):
[Test] public void GetRows_AllDeuterocanonicalBooks_WithAlignments()
If you have this test, I don't think you really need the previous one at all.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1332 at r5 (raw file):
AllTargetRows = true }; ParallelTextRow[] rows = parallelCorpus.ToArray();
You can just iterate through the parallelCorpus - no need to call ToArray
here, I don't think.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1414 at r5 (raw file):
[TestCase("SUS", "SUS 1:1", Description = "Validate SUS Source and Target References")] [TestCase("BEL", "BEL 1:1", Description = "Validate BEL Source and Target References")] public void ValidateSourceAndTargetReferencesForDeuterocanonicals(string bookId, string verseRef)
It's conceivable that this could get messed up during alignment but more so in the case when either the target or source is missing. I think you could remove this test and put more specific asserts in the previous test to better cover this.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1499 at r5 (raw file):
} if (issues.Any())
Like above, this test should fail if there's an issue. You should have specific asserts.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1741 at r5 (raw file):
} Assert.Pass("Test completed");
Same as above. Have specific asserts, not just a report. I think there was some miscommunication about this, so no worries :). It should make things a lot simpler too!
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 141 at r5 (raw file):
} public static (string Book, int Chapter, int StartVerse, int EndVerse, bool IsSingleVerse) ParseRange(string range)
This can be made private
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 162 at r5 (raw file):
/// Removes unwanted characters in a corpus string. /// </summary> public static string CleanString(string input, string[] unwanted)
Remind me why this is needed? You only use this once to remove the division symbol, but since you're removing it from both source and target, couldn't you just as well leave it in and the comparison would still be the same?
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
/// Replace multiple spaces with a single space. /// </summary> public static string NormalizeSpaces(string input)
Same as above - not sure this is really needed although I like the idea of cleaning up the data. Also, even if you did want to clean the data, why not just normalize the data in the files instead of renormalizing it each time you test it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 35 files reviewed, 11 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 141 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
This can be made private
Done
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 162 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Remind me why this is needed? You only use this once to remove the division symbol, but since you're removing it from both source and target, couldn't you just as well leave it in and the comparison would still be the same?
I removed it in the source in case it appeared in the source (just playing safe), but in actual sense, it appeared only in the target. That's why I couldn't leave it in, as that would mean the comparison would not be the same
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Same as above - not sure this is really needed although I like the idea of cleaning up the data. Also, even if you did want to clean the data, why not just normalize the data in the files instead of renormalizing it each time you test it?
I didn't know if it was right for me to do that in the SFM files, which was why I just handled it in the code. What I think would be better is making the NormalizeSpaces part of the CleanString since the normalization occurs the same places I needed to clean the string. What do you think @Enkidu93 ?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Thanks for the updates Mudi! As I look at them, I notice that you are just counting the number of mismatches, not verifying each proper match or mismatch. Can you verify each one in a way that makes it clear what each match or mismatch is intending to verify (either self-documented or with comments as needed)?
Made some changes
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1179 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I don't believe that this is testing what you think it's testing since you're just passing a number as the row ref, not the actual refs that would exist for these books' rows.
This test has been removed
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1193 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
If you have this test, I don't think you really need the previous one at all.
I agree.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1332 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You can just iterate through the parallelCorpus - no need to call
ToArray
here, I don't think.
You are correct. Thanks
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1414 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
It's conceivable that this could get messed up during alignment but more so in the case when either the target or source is missing. I think you could remove this test and put more specific asserts in the previous test to better cover this.
I have removed this and I am refactoring the above test to better cover this
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1499 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Like above, this test should fail if there's an issue. You should have specific asserts.
Done
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1741 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Same as above. Have specific asserts, not just a report. I think there was some miscommunication about this, so no worries :). It should make things a lot simpler too!
Done
Thanks Eli for doing the review. Issues 1-5 I am satisfied with your response on. For Issue 6 (mapping onto existing verses), I would like at least one test to verify the actual behavior (whatever it is) and to make sure that it does not crash. An example test is in the doc for Option B - Gen 1:3-4 mapping onto Gen 1:1-2. |
Previously, mudiagaobrikisil wrote…
Thank you for the updates, but I would really like the mismatches spelled out that are expected, not just counted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi John. Thanks. I will add a test for that.
Reviewable status: 28 of 35 files reviewed, 11 unresolved discussions (waiting on @Enkidu93)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 162 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
I removed it in the source in case it appeared in the source (just playing safe), but in actual sense, it appeared only in the target. That's why I couldn't leave it in, as that would mean the comparison would not be the same
See comment below.
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
I didn't know if it was right for me to do that in the SFM files, which was why I just handled it in the code. What I think would be better is making the NormalizeSpaces part of the CleanString since the normalization occurs the same places I needed to clean the string. What do you think @Enkidu93 ?
I think it's reasonable to combine those two functions. I'm still not convinced that removing the division signs needs to happen at all. If the source and target in the test are the same corpus, then they will both have the division symbols, so there's no reason to remove them. If they're different, then even after you remove the division symbols, they'll still be different. What am I missing?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1179 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
This test has been removed
Thank you! Great.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1414 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
I have removed this and I am refactoring the above test to better cover this
Sounds good.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1741 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
Done
Like John was saying above, specific asserts about the mappings is what we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I think it's reasonable to combine those two functions. I'm still not convinced that removing the division signs needs to happen at all. If the source and target in the test are the same corpus, then they will both have the division symbols, so there's no reason to remove them. If they're different, then even after you remove the division symbols, they'll still be different. What am I missing?
Yes you are correct. The issue was from the initial source and target files Peter sent. Removing it made the test fail as only the target had the division sign. For the new files he sent, just tested and there was no need to remove the signs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
Yes you are correct. The issue was from the initial source and target files Peter sent. Removing it made the test fail as only the target had the division sign. For the new files he sent, just tested and there was no need to remove the signs.
Great! Thank you - sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Thank you for the updates, but I would really like the mismatches spelled out that are expected, not just counted.
When you said spelled out, what do you mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, mudiagaobrikisil wrote…
When you said spelled out, what do you mean
Add asserts that describe the specific 'mismatches' - i.e., as you iterate or just via indexing, add asserts that describe specific expectations about the source and target in that row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Add asserts that describe the specific 'mismatches' - i.e., as you iterate or just via indexing, add asserts that describe specific expectations about the source and target in that row.
Okay. Understood.
Contains new tests to check for possible issues with alignment, mapping and versification of deuterocanonical books.
The tests are as follows:
It also contains some helper methods.
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)