Failing Test: Lucene.Net.Search.Join.TestJoinUtil::TestMultiValueRandomJoin() #549

Closed
NightOwl888 opened this issue Nov 19, 2021 · 2 comments · Fixed by #566

Comments

@NightOwl888
Contributor

NightOwl888 commented Nov 19, 2021

For information about how to help us debug tests, see #269.

This test only started failing after fixing the test framework to randomize the codecs, culture and time zone (NUnit's Randomizer was always initialized with a seed of 0, so none of these were actually random).

It only fails on .NET Framework under x86 on Windows, and only when optimizations are enabled (Release mode). I have confirmed that the test is 100% repeatable with the below seed and culture.

So far, all similar failures have been due to one of two things:

  1. Floating point numbers are compared directly using ==, >=, <=, >, or <. The fix that has worked is to convert both sides of the comparison from float to int using Lucene.Net.Util.NumericUtils.SingleToSortableInt32() and compare the ints.
  2. A float is assigned to a double, which changes its precision. One place to watch is the Math functions, which accept double parameters but are often passed a float. For score values, which can only be between 0 and 1, an intermediate cast to decimal works (double x = (double)(decimal)theFloat). However, we haven't found a solution for the general case, because large values would overflow the decimal.
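
The two workarounds above can be sketched roughly as follows. This is only an illustration: `FloatCompareSketch`, `ToSortableInt32`, `SortableEquals`, and `ToDoubleViaDecimal` are hypothetical names, and `ToSortableInt32` is a stand-in written for this sketch, not the actual body of `NumericUtils.SingleToSortableInt32()`.

```csharp
using System;

public static class FloatCompareSketch
{
    // Hypothetical stand-in for NumericUtils.SingleToSortableInt32():
    // maps a float to an int whose signed ordering matches the float ordering.
    // Note: NaN bit patterns are not canonicalized in this sketch.
    public static int ToSortableInt32(float value)
    {
        // Reinterpret the 32-bit pattern of the float as an int.
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        // For negative floats, flip the 31 non-sign bits so that a larger
        // magnitude sorts lower, matching numeric order.
        if (bits < 0) bits ^= 0x7fffffff;
        return bits;
    }

    // Workaround 1: compare the 32-bit patterns instead of using == on floats.
    public static bool SortableEquals(float a, float b)
    {
        return ToSortableInt32(a) == ToSortableInt32(b);
    }

    // Workaround 2: widen through decimal. Assumes the value fits in a decimal
    // (for example a score between 0 and 1); large values throw OverflowException.
    public static double ToDoubleViaDecimal(float value)
    {
        return (double)(decimal)value;
    }
}
```

The same `ToSortableInt32` conversion also works for ordering comparisons, for example `ToSortableInt32(a) < ToSortableInt32(b)` in place of `a < b`.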

Failure 1

Lucene.Net.Tests.Join - net461 - x86 - Windows | Tests failed: 1, passed: 68, ignored: 0
TestMultiValueRandomJoin
Failed 1h ago on WIN-P13CC4RF62R
Duration: 0:00:01.950
Owner: not available
Date started: 11/23/2021, 5:00:25 PM
Date completed: 11/23/2021, 5:00:27 PM
Failing since: 10m ago
Failing since build: 4.8.0-ci0000002686

Expected: 6.7838048934936523d +/- 0.0d
But was:  6.7838053703308105d
Off by:   -4.76837158203125E-07d

To reproduce this test result:

Option 1:

Apply the following assembly-level attributes:

[assembly: Lucene.Net.Util.RandomSeed(0x989675dd22f03da2L)]
[assembly: NUnit.Framework.SetCulture("tn")]

Option 2:

Use the following .runsettings file:

<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0x989675dd22f03da2" />
    <Parameter name="tests:culture" value="tn" />
  </TestRunParameters>
</RunSettings>

See the .runsettings documentation at: https://docs.microsoft.com/en-us/visualstudio/test/configure-unit-tests-by-using-a-dot-runsettings-file.
at Lucene.Net.Util.LuceneTestCase.assertEquals(Single d1, Single d2, Single delta) in D:\a\1\s\src\Lucene.Net.TestFramework\Support\JavaCompatibility\LuceneTestCase.cs:line 174
at Lucene.Net.Search.Join.TestJoinUtil.ExecuteRandomJoin(Boolean multipleValuesPerDocument, Int32 maxIndexIter, Int32 maxSearchIter, Int32 numberOfDocumentsToIndex) in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 501
at Lucene.Net.Search.Join.TestJoinUtil.TestMultiValueRandomJoin() in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 396

See: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=1793&view=ms.vss-test-web.build-test-results-tab&runId=727036&resultId=100024&paneView=debug to download the test artifacts, but do note that the build will be cleaned up after 30 days.

Failure 2

Lucene.Net.Tests.Join - net48 - x86 - Windows | Tests failed: 1, passed: 69, ignored: 0
TestMultiValueRandomJoin
Failed 8h ago on fv-az292-983
Duration: 0:00:00.763
Owner: not available
Date started: 11/19/2021, 3:10:28 PM
Date completed: 11/19/2021, 3:10:29 PM
Failing since: 8h ago
Failing since build: Current build

Expected: 5.5041933059692383d +/- 0.0d
But was:  5.5041937828063965d
Off by:   -4.76837158203125E-07d

To reproduce this test result:

Option 1:

Apply the following assembly-level attributes:

[assembly: Lucene.Net.Util.RandomSeed(0x4ce16e9f591fde54L)]
[assembly: NUnit.Framework.SetCulture("sr-Cyrl-ME")]

Option 2:

Use the following .runsettings file:

<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0x4ce16e9f591fde54" />
    <Parameter name="tests:culture" value="sr-Cyrl-ME" />
  </TestRunParameters>
</RunSettings>

See the .runsettings documentation at: https://docs.microsoft.com/en-us/visualstudio/test/configure-unit-tests-by-using-a-dot-runsettings-file.
at Lucene.Net.Util.LuceneTestCase.assertEquals(Single d1, Single d2, Single delta) in D:\a\1\s\src\Lucene.Net.TestFramework\Support\JavaCompatibility\LuceneTestCase.cs:line 174
at Lucene.Net.Tests.Join.TestJoinUtil.ExecuteRandomJoin(Boolean multipleValuesPerDocument, Int32 maxIndexIter, Int32 maxSearchIter, Int32 numberOfDocumentsToIndex) in D:\a\1\s\src\Lucene.Net.Tests.Join\Support\TestJoinUtil.cs:line 504
at Lucene.Net.Tests.Join.TestJoinUtil.TestMultiValueRandomJoin() in D:\a\1\s\src\Lucene.Net.Tests.Join\Support\TestJoinUtil.cs:line 399

See: https://dev.azure.com/lucene-net-temp4/Main/_build/results?buildId=77&view=ms.vss-test-web.build-test-results-tab&runId=60166&resultId=100022&paneView=debug to download the test artifacts, but do note that the build will be cleaned up after 30 days.

Failure 3

Lucene.Net.Tests.Join - net48 - x86 - Windows | Tests failed: 1, passed: 69, ignored: 0
TestMultiValueRandomJoin
Failed 11h ago on WIN-VF2R71D3ST9
Duration: 0:00:02.167
Owner: not available
Date started: 11/19/2021, 12:18:29 PM
Date completed: 11/19/2021, 12:18:31 PM
Failing since: 11h ago
Failing since build: 4.8.0-ci0000001842

Expected: 3.7507178783416748d +/- 0.0d
But was:  3.7507176399230957d
Off by:   2.384185791015625E-07d

To reproduce this test result:

Option 1:

Apply the following assembly-level attributes:

[assembly: Lucene.Net.Util.RandomSeed(0xc9754874a855ec2aL)]
[assembly: NUnit.Framework.SetCulture("it-VA")]

Option 2:

Use the following .runsettings file:

<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0xc9754874a855ec2a" />
    <Parameter name="tests:culture" value="it-VA" />
  </TestRunParameters>
</RunSettings>

See the .runsettings documentation at: https://docs.microsoft.com/en-us/visualstudio/test/configure-unit-tests-by-using-a-dot-runsettings-file.
at Lucene.Net.Util.LuceneTestCase.assertEquals(Single d1, Single d2, Single delta) in D:\a\1\s\src\Lucene.Net.TestFramework\Support\JavaCompatibility\LuceneTestCase.cs:line 174
at Lucene.Net.Search.Join.TestJoinUtil.ExecuteRandomJoin(Boolean multipleValuesPerDocument, Int32 maxIndexIter, Int32 maxSearchIter, Int32 numberOfDocumentsToIndex) in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 501
at Lucene.Net.Search.Join.TestJoinUtil.TestMultiValueRandomJoin() in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 396

See: https://dev.azure.com/lucene-net-temp3/Lucene.NET/_build/results?buildId=593&view=ms.vss-test-web.build-test-results-tab&runId=432744&resultId=100024&paneView=debug to download the test artifacts, but do note that the build will be cleaned up after 30 days.
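
A side note on the "Off by" values in the three failures: each one appears to be exactly one unit in the last place (ULP) of a 32-bit float at that magnitude, namely 2^-21 (4.76837158203125E-07) for values between 4 and 8, and 2^-22 (2.384185791015625E-07) for values between 2 and 4. That would be consistent with a single rounding difference rather than a logic error. The sketch below is one way to check this; `UlpSketch` and `UlpOf` are hypothetical names used only for illustration.

```csharp
using System;

public static class UlpSketch
{
    // Distance from a positive, finite float to the next larger representable float.
    public static double UlpOf(float value)
    {
        // Reinterpret the float as its 32-bit pattern.
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        // For positive finite floats, adding 1 to the pattern gives the next float up.
        float next = BitConverter.ToSingle(BitConverter.GetBytes(bits + 1), 0);
        // Adjacent floats are exactly representable as doubles, so this subtraction is exact.
        return (double)next - (double)value;
    }
}
```

Calling `UlpSketch.UlpOf(6.7838048934936523f)` or `UlpSketch.UlpOf(5.5041933059692383f)` should give 2^-21, and `UlpSketch.UlpOf(3.7507178783416748f)` should give 2^-22, matching the magnitudes reported above.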

NightOwl888 added the up-for-grabs, help-wanted, is:bug, test-failure, and pri:normal labels Nov 19, 2021
NightOwl888 self-assigned this Dec 7, 2021
NightOwl888 removed the up-for-grabs and help-wanted labels Dec 7, 2021
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Dec 7, 2021
…NET Framework x86 with optimizations enabled (fixes apache#549 / Lucene.Net.Search.TestJoinUtil::TestMultiValueRandomJoin())
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Dec 7, 2021
…NET Framework x86 with optimizations enabled (fixes apache#549 / Lucene.Net.Search.TestJoinUtil::TestMultiValueRandomJoin())
NightOwl888 added a commit that referenced this issue Dec 7, 2021
…NET Framework x86 with optimizations enabled (fixes #549 / Lucene.Net.Search.TestJoinUtil::TestMultiValueRandomJoin())
@rclabo
Contributor

rclabo commented Dec 7, 2021

Wow! You found one of the really hard bugs -- one that only fails in release mode with optimizations enabled. Wow -- Well done!

@NightOwl888
Contributor Author

Actually, I was able to rule a lot of stuff out right away: it only failed under rare circumstances, I was able to isolate it to the Lucene.Net assembly by disabling optimizations, and the number of calls into Lucene.Net was limited for this test. Where it was failing was a bit confusing, though - the method doing the actual float math was returning the correct value, but this intermediate method was truncating it when it returned.

That said, I haven't come up with a good explanation for why casting fixes it while using MethodImplOptions.NoOptimization on this method and all methods that it calls does not. But casting seems to make the jitter happy.
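
For readers unfamiliar with the casting workaround, here is a minimal sketch of the general pattern. It assumes the usual explanation that the x86 JIT on .NET Framework may keep float intermediates at a wider precision than 32 bits. `ScoreCastSketch` and `Combine` are hypothetical names; this is not the method that was changed in the actual fix.

```csharp
public static class ScoreCastSketch
{
    // Hypothetical scoring helper, for illustration only.
    public static float Combine(float a, float b)
    {
        // The explicit (float) cast looks redundant, but as far as I understand
        // it asks the runtime to narrow the intermediate result to 32-bit
        // precision before the value is returned to the caller.
        return (float)(a * b);
    }
}
```

On runtimes that already compute in 32-bit precision the cast should have no observable effect, so the pattern is meant to be safe to leave in place across targets.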
