Failing Test: Lucene.Net.Search.Join.TestJoinUtil::TestMultiValueRandomJoin() #549

NightOwl888 · 2021-11-19T16:17:33Z

For information about how to help us debug tests, see #269.

This test only started failing after fixing the test framework to randomize the codecs, culture and time zone (NUnit's Randomizer was always initialized with a seed of 0, so none of these were actually random).

It only fails on .NET Framework under x86 on Windows, and only when optimizations are enabled (Release mode). I have confirmed that the test is 100% repeatable with the seeds and cultures below.

So far, all similar failures have been due to one of two things:

Floating point numbers are compared directly using ==, >=, <=, >, or <. To fix these, converting both sides of the comparison from float to int using Lucene.Net.Util.NumericUtils.SingleToSortableInt32() has worked.
A float is being assigned to a double, which changes its precision. Places to watch out for are the Math functions that accept a double parameter but are passed a float. For score values, which can only be between 0 and 1, an intermediate cast to decimal works (double x = (double)(decimal)theFloat). However, we haven't found a solution that works in more general cases, because large numbers could overflow the decimal.
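Both workarounds can be sketched in a few lines. This is a minimal illustration, not the Lucene.NET source: ToSortableInt32 is a hypothetical stand-in that is assumed to mirror what NumericUtils.SingleToSortableInt32() does, and the class and method names are invented for the example.

```csharp
using System;

public static class FloatComparisonSketch
{
    // Hypothetical stand-in for NumericUtils.SingleToSortableInt32():
    // reinterpret the 32 bits of the float as an int, then flip the
    // magnitude bits of negative values so that int order matches float order.
    public static int ToSortableInt32(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        return bits < 0 ? bits ^ 0x7fffffff : bits;
    }

    // Compare two floats through their 32-bit patterns instead of <, > or ==.
    public static int Compare(float a, float b)
    {
        return ToSortableInt32(a).CompareTo(ToSortableInt32(b));
    }

    // Widen a small float (such as a 0..1 score) to double via decimal.
    // Throws OverflowException for values outside the decimal range.
    public static double WidenViaDecimal(float value)
    {
        return (double)(decimal)value;
    }
}
```

Note that comparing bit patterns treats 0.0f and -0.0f as different values, which a direct == comparison does not.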

Failure 1

Lucene.Net.Tests.Join - net461 - x86 - Windows | Tests failed: 1, passed: 68, ignored: 0
TestMultiValueRandomJoin
Failed 1h ago on WIN-P13CC4RF62R
Duration: 0:00:01.950
Owner: not available
Date started: 11/23/2021, 5:00:25 PM
Date completed: 11/23/2021, 5:00:27 PM
Failing since: 10m ago
Failing since build: 4.8.0-ci0000002686

Expected: 6.7838048934936523d +/- 0.0d
But was:  6.7838053703308105d
Off by:   -4.76837158203125E-07d

To reproduce this test result:

Option 1:

Apply the following assembly-level attributes:

[assembly: Lucene.Net.Util.RandomSeed(0x989675dd22f03da2L)]
[assembly: NUnit.Framework.SetCulture("tn")]

Option 2:

Use the following .runsettings file:

<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0x989675dd22f03da2" />
    <Parameter name="tests:culture" value="tn" />
  </TestRunParameters>
</RunSettings>

See the .runsettings documentation at: https://docs.microsoft.com/en-us/visualstudio/test/configure-unit-tests-by-using-a-dot-runsettings-file.
at Lucene.Net.Util.LuceneTestCase.assertEquals(Single d1, Single d2, Single delta) in D:\a\1\s\src\Lucene.Net.TestFramework\Support\JavaCompatibility\LuceneTestCase.cs:line 174
at Lucene.Net.Search.Join.TestJoinUtil.ExecuteRandomJoin(Boolean multipleValuesPerDocument, Int32 maxIndexIter, Int32 maxSearchIter, Int32 numberOfDocumentsToIndex) in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 501
at Lucene.Net.Search.Join.TestJoinUtil.TestMultiValueRandomJoin() in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 396

See: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=1793&view=ms.vss-test-web.build-test-results-tab&runId=727036&resultId=100024&paneView=debug to download the test artifacts, but do note that the build will be cleaned up after 30 days.

Failure 2

Lucene.Net.Tests.Join - net48 - x86 - Windows | Tests failed: 1, passed: 69, ignored: 0
TestMultiValueRandomJoin
Failed 8h ago on fv-az292-983
Duration: 0:00:00.763
Owner: not available
Date started: 11/19/2021, 3:10:28 PM
Date completed: 11/19/2021, 3:10:29 PM
Failing since: 8h ago
Failing since build: Current build

Expected: 5.5041933059692383d +/- 0.0d
But was:  5.5041937828063965d
Off by:   -4.76837158203125E-07d

To reproduce this test result:

Option 1:

Apply the following assembly-level attributes:

[assembly: Lucene.Net.Util.RandomSeed(0x4ce16e9f591fde54L)]
[assembly: NUnit.Framework.SetCulture("sr-Cyrl-ME")]

Option 2:

Use the following .runsettings file:

<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0x4ce16e9f591fde54" />
    <Parameter name="tests:culture" value="sr-Cyrl-ME" />
  </TestRunParameters>
</RunSettings>

See the .runsettings documentation at: https://docs.microsoft.com/en-us/visualstudio/test/configure-unit-tests-by-using-a-dot-runsettings-file.
at Lucene.Net.Util.LuceneTestCase.assertEquals(Single d1, Single d2, Single delta) in D:\a\1\s\src\Lucene.Net.TestFramework\Support\JavaCompatibility\LuceneTestCase.cs:line 174
at Lucene.Net.Tests.Join.TestJoinUtil.ExecuteRandomJoin(Boolean multipleValuesPerDocument, Int32 maxIndexIter, Int32 maxSearchIter, Int32 numberOfDocumentsToIndex) in D:\a\1\s\src\Lucene.Net.Tests.Join\Support\TestJoinUtil.cs:line 504
at Lucene.Net.Tests.Join.TestJoinUtil.TestMultiValueRandomJoin() in D:\a\1\s\src\Lucene.Net.Tests.Join\Support\TestJoinUtil.cs:line 399

See: https://dev.azure.com/lucene-net-temp4/Main/_build/results?buildId=77&view=ms.vss-test-web.build-test-results-tab&runId=60166&resultId=100022&paneView=debug to download the test artifacts, but do note that the build will be cleaned up after 30 days.

Failure 3

Lucene.Net.Tests.Join - net48 - x86 - Windows | Tests failed: 1, passed: 69, ignored: 0
TestMultiValueRandomJoin
Failed 11h ago on WIN-VF2R71D3ST9
Duration: 0:00:02.167
Owner: not available
Date started: 11/19/2021, 12:18:29 PM
Date completed: 11/19/2021, 12:18:31 PM
Failing since: 11h ago
Failing since build: 4.8.0-ci0000001842

Expected: 3.7507178783416748d +/- 0.0d
But was:  3.7507176399230957d
Off by:   2.384185791015625E-07d

To reproduce this test result:

Option 1:

Apply the following assembly-level attributes:

[assembly: Lucene.Net.Util.RandomSeed(0xc9754874a855ec2aL)]
[assembly: NUnit.Framework.SetCulture("it-VA")]

Option 2:

Use the following .runsettings file:

<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0xc9754874a855ec2a" />
    <Parameter name="tests:culture" value="it-VA" />
  </TestRunParameters>
</RunSettings>

See the .runsettings documentation at: https://docs.microsoft.com/en-us/visualstudio/test/configure-unit-tests-by-using-a-dot-runsettings-file.
at Lucene.Net.Util.LuceneTestCase.assertEquals(Single d1, Single d2, Single delta) in D:\a\1\s\src\Lucene.Net.TestFramework\Support\JavaCompatibility\LuceneTestCase.cs:line 174
at Lucene.Net.Search.Join.TestJoinUtil.ExecuteRandomJoin(Boolean multipleValuesPerDocument, Int32 maxIndexIter, Int32 maxSearchIter, Int32 numberOfDocumentsToIndex) in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 501
at Lucene.Net.Search.Join.TestJoinUtil.TestMultiValueRandomJoin() in D:\a\1\s\src\Lucene.Net.Tests.Join\TestJoinUtil.cs:line 396

See: https://dev.azure.com/lucene-net-temp3/Lucene.NET/_build/results?buildId=593&view=ms.vss-test-web.build-test-results-tab&runId=432744&resultId=100024&paneView=debug to download the test artifacts, but do note that the build will be cleaned up after 30 days.
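One observation about the three reports above: each "Off by" value equals exactly one unit in the last place (ULP) of a 32-bit float at that magnitude, 2^-21 for values in [4, 8) and 2^-22 for values in [2, 4). That fits a result that was rounded to float at a different point in the calculation. The sketch below is one way to check this; UlpOf is a hypothetical helper, not part of Lucene.NET or the test framework.

```csharp
using System;

public static class UlpSketch
{
    // Distance from |value| to the next larger 32-bit float.
    // Assumes value is finite and below float.MaxValue in magnitude.
    public static double UlpOf(float value)
    {
        float magnitude = Math.Abs(value);
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(magnitude), 0);
        float next = BitConverter.ToSingle(BitConverter.GetBytes(bits + 1), 0);
        return (double)next - (double)magnitude;
    }
}
```

UlpSketch.UlpOf(6.7838049f) and UlpSketch.UlpOf(5.5041933f) both evaluate to 4.76837158203125E-07, and UlpSketch.UlpOf(3.7507179f) to 2.384185791015625E-07, matching the magnitudes reported in Failures 1 to 3.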


rclabo · 2021-12-07T13:26:30Z

Wow! You found one of the really hard bugs -- one that only fails in release mode with optimizations enabled. Wow -- Well done!

NightOwl888 · 2021-12-07T13:47:04Z

Actually, I was able to rule a lot of stuff out right away because it only failed under rare circumstances. I was able to isolate it to the Lucene.Net assembly by disabling optimizations, and the number of calls into Lucene.Net was limited for this test. Where it was failing was a bit confusing, though: the method doing the actual float math was returning the correct value, but this intermediate method was truncating it when it returned.

That said, I haven't come up with a good explanation for why casting fixes it while using MethodImplOptions.NoOptimization on this method and all methods that it calls does not. But casting seems to make the jitter happy.
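For readers unfamiliar with the pattern, here is a rough sketch of the kind of change being described. It is not the actual TermScorer code; the class and method names are hypothetical. The C# specification allows floating-point intermediates to be evaluated at higher precision than their declared type, and an explicit cast to float requests rounding to 32-bit precision, which may be why the cast helps here.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class CastSketch
{
    // Without a cast, the JIT may keep the product at higher precision
    // than float before handing it back to the caller.
    public static float Multiply(float a, float b)
    {
        return a * b;
    }

    // The explicit cast asks for the result to be rounded to 32-bit
    // precision before it is returned.
    public static float MultiplyWithCast(float a, float b)
    {
        return (float)(a * b);
    }

    // The attribute mentioned above, shown for reference only; per the
    // comment, it did not resolve this failure.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public static float MultiplyNoOptimization(float a, float b)
    {
        return a * b;
    }
}
```

On most runtimes all three methods return the same value; a difference would only be expected under conditions like those in this issue (.NET Framework, x86, optimizations enabled).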

NightOwl888 added the up-for-grabs, help-wanted, is:bug, test-failure, and pri:normal labels Nov 19, 2021

NightOwl888 self-assigned this Dec 7, 2021

NightOwl888 removed the up-for-grabs and help-wanted labels Dec 7, 2021

NightOwl888 mentioned this issue Dec 7, 2021

BUG: Lucene.Net.Search.TestJoinUtil::TestMultiValueRandomJoin() (fixes #549) #566

Merged

NightOwl888 closed this as completed in #566 Dec 7, 2021

NightOwl888 added a commit that referenced this issue Dec 7, 2021

BUG: Lucene.Net.Search.TermScorer: Added cast to fix calculation in .NET Framework x86 with optimizations enabled (fixes #549 / Lucene.Net.Search.TestJoinUtil::TestMultiValueRandomJoin())

1458fe2
