GH-42030: [Java] Update Unit Tests for Adapter Module #42038

Merged (21 commits) on Jun 14, 2024

@llama90 (Contributor) commented Jun 8, 2024

Rationale for this change

Migrate the adapter module's unit tests from JUnit 4 (org.junit) to JUnit 5 (org.junit.jupiter).

What changes are included in this PR?

  • avro and jdbc modules
    • Replacing org.junit with org.junit.jupiter.api.
    • Updating Assertions.assertXXX to assertXXX using static imports.
    • Updating annotations such as @Before and @After:
      • @Before -> @BeforeEach
      • @After -> @AfterEach
      • @Test -> @Test from org.junit.jupiter
      • @ClassRule -> @TempDir and @BeforeAll
    • Updating parameterized tests (JUnit 4 Parameterized runner -> @ParameterizedTest)
    • Self-review for avro
    • Dealing with java.io.IOException: Failed to delete temp directory on Windows with JDK 11
    • Exploring a more effective structure for @ParameterizedTest in the JDBC tests
    • Self-review for jdbc
  • orc module
    • Reviewing the build method
    • Updating annotations and rules such as @BeforeAll, @Rule, and TemporaryFolder
    • Self-review

Are these changes tested?

Yes, existing tests have passed.

Are there any user-facing changes?

No.

@llama90 requested a review from lidavidm as a code owner June 8, 2024 16:25

github-actions bot commented Jun 8, 2024

⚠️ GitHub issue #42030 has been automatically assigned in GitHub to PR creator.

@llama90 (Contributor Author) commented Jun 8, 2024

I currently use explicit calls to initializeDatabase in each test method. However, I believe there is a more efficient structure for this process.

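Reason

  • In JUnit 4 (with the Parameterized runner), the call sequence is: constructor (receives the parameters) -> @Before -> @Test.
  • In JUnit 5, the parameters are passed to the @ParameterizedTest method itself, not to the constructor, so a @BeforeEach method has no built-in way to receive them.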
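To make the difference in call order concrete, here is a plain-Java model of the two lifecycles. It uses no JUnit classes at all; Junit4StyleTest, Junit5StyleTest, runBoth and the ".yml" strings are hypothetical stand-ins, and only the name initializeDatabase mirrors this PR (where it takes a Table rather than a String). It is a sketch of the ordering problem, not of how either framework is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the two call orders; no JUnit classes are involved. */
public class ParameterizedLifecycleModel {
  // Records the order in which the "framework" calls into each test class.
  static final List<String> calls = new ArrayList<>();

  // JUnit 4 Parameterized style: one instance per parameter set, built via the constructor.
  static final class Junit4StyleTest {
    private final String config;

    Junit4StyleTest(String config) {
      this.config = config; // the parameter is stored before any setup runs
      calls.add("constructor(" + config + ")");
    }

    void setUp() { // plays the role of @Before: it can read the field set above
      calls.add("setUp(" + config + ")");
    }

    void test() {
      calls.add("test(" + config + ")");
    }
  }

  // JUnit 5 @ParameterizedTest style: the no-arg constructor never sees the parameter.
  static final class Junit5StyleTest {
    void initializeDatabase(String config) {
      calls.add("initializeDatabase(" + config + ")");
    }

    void test(String config) { // the parameter arrives here, so setup is called explicitly
      initializeDatabase(config);
      calls.add("test(" + config + ")");
    }
  }

  static void runBoth(List<String> configs) {
    for (String config : configs) {
      Junit4StyleTest t = new Junit4StyleTest(config);
      t.setUp();
      t.test();
    }
    for (String config : configs) {
      new Junit5StyleTest().test(config); // a fresh instance per invocation
    }
  }

  public static void main(String[] args) {
    runBoth(Arrays.asList("a.yml", "b.yml"));
    calls.forEach(System.out::println);
  }
}
```

In the JUnit 5 half, the per-parameter setup has to move into the test body, which is why each test method calls initializeDatabase explicitly.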

  /**
   * This method creates Connection object and DB table and also populate data into table for test.
   *
   * @throws SQLException on error
   * @throws ClassNotFoundException on error
   */
- @Before
- public void setUp() throws SQLException, ClassNotFoundException {
+ protected void initializeDatabase(Table table) throws SQLException, ClassNotFoundException {
@llama90 (Contributor Author):

@vibhatha I'll look into it more, but if you have any suggestions for a better structure, I would really appreciate your advice.

cc @lidavidm

classDiagram
    direction LR
    class AbstractJdbcToArrowTest {
    }

    class JdbcToArrowCharSetTest {
    }

    class JdbcToArrowDataTypesTest {
    }

    class JdbcToArrowMapDataTypeTest {
    }

    class JdbcToArrowNullTest {
    }

    class JdbcToArrowOptionalColumnsTest {
    }

    class JdbcToArrowTest {
    }

    class JdbcToArrowTimeZoneTest {
    }

    class JdbcToArrowVectorIteratorTest {
    }

    AbstractJdbcToArrowTest <|-- JdbcToArrowCharSetTest
    AbstractJdbcToArrowTest <|-- JdbcToArrowDataTypesTest
    AbstractJdbcToArrowTest <|-- JdbcToArrowMapDataTypeTest
    AbstractJdbcToArrowTest <|-- JdbcToArrowNullTest
    AbstractJdbcToArrowTest <|-- JdbcToArrowOptionalColumnsTest
    AbstractJdbcToArrowTest <|-- JdbcToArrowTest
    AbstractJdbcToArrowTest <|-- JdbcToArrowTimeZoneTest
    JdbcToArrowTest <|-- JdbcToArrowVectorIteratorTest

@vibhatha (Collaborator):

@llama90 I will review this tomorrow. Thanks.

@vibhatha (Collaborator):

I guess your concern is creating the database connection for each test case in a @BeforeEach setup method?

@llama90 (Contributor Author):

Yes, each test needs a database connection created from a different .yml file, but a @BeforeEach method cannot receive the arguments of a @ParameterizedTest.

I described something similar here.

@lidavidm (Member):

I think this is fine

@github-actions bot added the awaiting committer review label and removed the awaiting review label Jun 8, 2024
@llama90 force-pushed the ARROW-42030 branch 4 times, most recently from a5a328a to 64f8b60 June 9, 2024 15:02
@llama90 (Contributor Author) commented Jun 9, 2024

Ummm....

There is a JUnit 5 bug on Windows (Server 2022) with JDK 11, specifically in the CI used by Arrow.

Should we consider a rollback because of this, or is there another solution we could explore?

Error message:
Error:  Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.553 s <<< FAILURE! -- in org.apache.arrow.adapter.avro.AvroToArrowIteratorTest
[INFO] --- maven-surefire-plugin:3.2.5:test (default-test) @ arrow-jdbc ---
Error:  org.apache.arrow.adapter.avro.AvroToArrowIteratorTest.testArrayType -- Time elapsed: 0.125 s <<< ERROR!
java.io.IOException: Failed to delete temp directory D:\a\arrow\arrow\java\adapter\avro\target\junit4444716940079148427. The following paths could not be deleted (see suppressed exceptions for details): <root>, test.avro
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
	at java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:395)
	at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
	at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
	Suppressed: java.nio.file.DirectoryNotEmptyException: D:\a\arrow\arrow\java\adapter\avro\target\junit4444716940079148427
		at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:271)
		at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
		at java.base/java.nio.file.Files.delete(Files.java:1142)
		at java.base/java.nio.file.Files.walkFileTree(Files.java:2743)
		at java.base/java.nio.file.Files.walkFileTree(Files.java:2797)
		... 13 more
	Suppressed: java.nio.file.FileSystemException: D:\a\arrow\arrow\java\adapter\avro\target\junit4444716940079148427\test.avro: The process cannot access the file because it is being used by another process.

		at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
		at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
		at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108)
		at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:274)
		at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
		at java.base/java.nio.file.Files.delete(Files.java:1142)
		at java.base/java.nio.file.Files.walkFileTree(Files.java:2725)
		at java.base/java.nio.file.Files.walkFileTree(Files.java:2797)
		... 13 more
		Suppressed: java.nio.file.FileSystemException: D:\a\arrow\arrow\java\adapter\avro\target\junit4444716940079148427\test.avro: The process cannot access the file because it is being used by another process.

			... 21 more
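The trace suggests how the cleanup proceeds: paths are sorted so that children are deleted before their parents, and each failure is attached as a suppressed exception. The sketch below models that with the standard library only; deleteRecursively is a hypothetical helper, not JUnit's actual implementation. It shows why a single locked test.avro also makes <root> fail: the directory is still non-empty when its turn comes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TempDirCleanupSketch {

  /**
   * Deletes root and everything under it, deepest paths first. Every failure is attached
   * as a suppressed exception to one IOException, so a file that cannot be deleted also
   * leaves its parent directories non-empty (DirectoryNotEmptyException).
   */
  static void deleteRecursively(Path root) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // A parent path sorts before its children, so reverse order visits children first.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    IOException failure = null;
    for (Path path : paths) {
      try {
        Files.delete(path);
      } catch (IOException e) {
        if (failure == null) {
          failure = new IOException("Failed to delete temp directory " + root);
        }
        failure.addSuppressed(e);
      }
    }
    if (failure != null) {
      throw failure;
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("junit");
    Files.write(root.resolve("test.avro"), new byte[] {1, 2, 3});
    deleteRecursively(root); // succeeds here because no stream is left open on test.avro
    System.out.println(Files.exists(root));
  }
}
```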

UPDATE:

I tried two approaches:

Approach 1: opening FileOutputStream and FileInputStream in a single try-with-resources statement

// Fragment: dataFile, schema, data and config come from the enclosing test helper.
// Both streams are opened up front and stay open until the whole block exits.
try (FileOutputStream outStream = new FileOutputStream(dataFile);
     FileInputStream inStream = new FileInputStream(dataFile)) {
    BinaryEncoder encoder = new EncoderFactory().directBinaryEncoder(outStream, null);
    DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
    for (Object value : data) {
        writer.write(value, encoder);
    }
    outStream.flush();

    // Read back through the second handle while the first one is still open.
    BinaryDecoder decoder = new DecoderFactory().directBinaryDecoder(inStream, null);
    return AvroToArrow.avroToArrow(schema, decoder, config);
}

Approach 2: separating FileOutputStream and FileInputStream usage to avoid concurrent access

  protected VectorSchemaRoot writeAndRead(Schema schema, List<?> data) throws Exception {
    File dataFile = new File(TMP, "test.avro");

    // Write all records, then close the output stream before the file is reopened.
    try (FileOutputStream out = new FileOutputStream(dataFile)) {
      BinaryEncoder encoder = new EncoderFactory().directBinaryEncoder(out, null);
      DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      for (Object value : data) {
        writer.write(value, encoder);
      }
      out.flush();
    }

    // avroToArrow reads the data into a VectorSchemaRoot before returning,
    // so the input stream can be closed as soon as this block exits.
    try (FileInputStream in = new FileInputStream(dataFile)) {
      BinaryDecoder decoder = new DecoderFactory().directBinaryDecoder(in, null);
      return AvroToArrow.avroToArrow(schema, decoder, config);
    }
  }
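The same two-phase pattern can be exercised without Avro. In this sketch, DataOutputStream and DataInputStream stand in for the Avro encoder and decoder, and writeAndRead is a simplified, hypothetical counterpart of the helper above; the point is only that each stream is closed before the next phase, so nothing in this process still holds the file when the temporary directory is cleaned up.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WriteThenReadSketch {

  /** Writes all values, closes the file, then reopens it for reading. */
  static List<Integer> writeAndRead(File dir, List<Integer> data) throws IOException {
    File dataFile = new File(dir, "test.bin");

    // Phase 1: the writer is flushed and closed when this block exits.
    try (DataOutputStream out = new DataOutputStream(new FileOutputStream(dataFile))) {
      out.writeInt(data.size());
      for (int value : data) {
        out.writeInt(value);
      }
    }

    // Phase 2: the reader is closed before the method returns, so no handle outlives the call.
    try (DataInputStream in = new DataInputStream(new FileInputStream(dataFile))) {
      int count = in.readInt();
      List<Integer> result = new ArrayList<>(count);
      for (int i = 0; i < count; i++) {
        result.add(in.readInt());
      }
      return result;
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("junit").toFile();
    System.out.println(writeAndRead(dir, Arrays.asList(1, 2, 3)));
    // With both handles closed, this process no longer blocks deleting the file or directory.
    System.out.println(new File(dir, "test.bin").delete() && dir.delete());
  }
}
```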

@vibhatha (Collaborator) commented Jun 9, 2024

* [@TempDir directory cannot be deleted on Windows with Java 11 junit-team/junit5#2811](https://github.com/junit-team/junit5/issues/2811)

* https://github.com/apache/arrow/actions/runs/9437309980/job/25993017637?pr=42038#step:5:2647

@llama90 thanks for such a thorough assessment. I think the issue still persists. Looking at the JUnit 5 bug report, there doesn't seem to be a solid fix yet, so my suggestion would be to fix it once that is resolved.

Alternatively, we could create two sub-issues for this ticket, one for the functional conversion (orc) and another for the non-functional bits, so that we can track the failing one later. But I have doubts about that approach.

cc @lidavidm Appreciate your feedback.

- @ClassRule
- public static final TemporaryFolder TMP = new TemporaryFolder();
+ @TempDir
+ public File TMP;
@vibhatha (Collaborator):

Any specific reason for this change?

@lidavidm (Member):

The new JUnit API is different.

@llama90 (Contributor Author):

As I understand it, the JUnit 5 replacement for TemporaryFolder is @TempDir, so I modified the test accordingly.

@TempDir can be used to annotate a field in a test class or a parameter in a lifecycle method or test method of type Path or File that should be resolved into a temporary directory.

@vibhatha (Collaborator):

Oh, that seems related. Could we re-check by running the CI?

@lidavidm (Member) left a comment

It seems a merge might have gone wrong? There are changes from other PRs.

java/bom/pom.xml Outdated
Comment on lines 188 to 191
<licenseHeader>
<file>${maven.multiModuleProjectDirectory}/dev/license/asf-xml.license</file>
<delimiter>(&lt;configuration|&lt;project)</delimiter>
</licenseHeader>
@lidavidm (Member):

Where did this come from?

@lidavidm (Member):

Oh, maybe this is a merge artifact... it seems to already be on main?

@llama90 (Contributor Author):

I did a rebase and saw that changes from another PR were left out.

What do you think about the @TempDir issue with JDK 11 on Windows?

If possible, could I fix that problem in a new PR?

@vibhatha (Collaborator):

I think we need to fix it here.

@github-actions bot added the awaiting merge label and removed the awaiting committer review label Jun 12, 2024
@llama90 force-pushed the ARROW-42030 branch 2 times, most recently from 35e7777 to dd84928 June 13, 2024 10:04
@llama90 force-pushed the ARROW-42030 branch 3 times, most recently from bc536e6 to fb22cad June 13, 2024 11:04
@vibhatha (Collaborator) commented Jun 13, 2024

@llama90 the CI failures are due to a spotless issue. Could you please run mvn spotless:apply before building?
We will integrate pre-commit soon, so that this won't be an issue.

@llama90 (Contributor Author) commented Jun 13, 2024

Yes, I tried to run mvn spotless:apply before the PR, but sometimes I miss it 🥲

I can't test on Windows locally, so I am repeatedly testing it through CI.

@llama90 force-pushed the ARROW-42030 branch 2 times, most recently from 8fde75e to b496322 June 13, 2024 13:44
@vibhatha (Collaborator):

Yeah, that happens. Once pre-commit is in place, it will warn you before committing.

I understand. Thanks.

@llama90 (Contributor Author), Jun 13, 2024:

On Windows, I was getting a "failed to delete temp directory" error because "The process cannot access the file because it is being used by another process."

To fix this, I found that closing resources properly with try-with-resources helps.

However, in the convert method, the FileInputStream (fis) has to stay open after convert returns because the iterator still reads from it, so the fis was never closed and the error occurred. So I created a writeDataToFile method for the FileOutputStream, and used try-with-resources for the fis in the same method where the iterator is used.

Now, the tests run fine on Windows too.
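The final structure can be sketched with standard-library types. Here a BufferedReader line iterator stands in for the Avro-to-Arrow iterator, and writeDataToFile and countRows are hypothetical simplifications of the helpers described above, not the PR's code. The writer is closed inside its own helper, and the reader's try-with-resources encloses every use of the iterator.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorOwnershipSketch {

  // Hypothetical helper: the output stream is always closed before this method returns.
  static File writeDataToFile(File dir, List<String> rows) throws IOException {
    File dataFile = new File(dir, "test.txt");
    try (Writer out =
        new OutputStreamWriter(new FileOutputStream(dataFile), StandardCharsets.UTF_8)) {
      for (String row : rows) {
        out.write(row);
        out.write('\n');
      }
    }
    return dataFile;
  }

  // Plays the role of the test method: it owns the input stream for as long as the
  // lazy iterator is in use, and closes it only after the iterator is exhausted.
  static int countRows(File dir, List<String> rows) throws IOException {
    File dataFile = writeDataToFile(dir, rows);
    try (BufferedReader in =
        new BufferedReader(
            new InputStreamReader(new FileInputStream(dataFile), StandardCharsets.UTF_8))) {
      Iterator<String> it = in.lines().iterator(); // reads lazily, like a streaming iterator
      int count = 0;
      while (it.hasNext()) {
        it.next();
        count++;
      }
      return count;
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("junit").toFile();
    System.out.println(countRows(dir, Arrays.asList("a", "b", "c")));
    // No handle is left open, so the file and directory can be removed.
    System.out.println(new File(dir, "test.txt").delete() && dir.delete());
  }
}
```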

Here are the different approaches I tried:

Attempt 1:

convert method:

  • FileOutputStream with try-with-resources
  • FileInputStream with try-with-resources

Error: Stream Closed - because the fis was closed while the iterator still needed it.
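Attempt 1 fails because the iterator is lazy: nothing is read until after the stream has already been closed. A minimal standard-library reproduction, assuming a hypothetical convert method that returns a BufferedReader line iterator from inside try-with-resources (a StringReader stands in for the file, so the message is BufferedReader's "Stream closed" rather than FileInputStream's "Stream Closed"):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;

public class ClosedStreamSketch {

  // Hypothetical convert: the reader is closed as soon as this method returns,
  // but the lazy iterator that depends on it escapes to the caller.
  static Iterator<String> convert(String text) throws IOException {
    try (BufferedReader in = new BufferedReader(new StringReader(text))) {
      return in.lines().iterator(); // no line has been read yet
    }
  }

  public static void main(String[] args) throws IOException {
    Iterator<String> rows = convert("a\nb\n");
    try {
      rows.hasNext(); // the first read happens here, after close()
      System.out.println("unexpected: read succeeded");
    } catch (UncheckedIOException e) {
      // BufferedReader.lines() wraps the IOException thrown by the closed reader.
      System.out.println("failed: " + e.getCause().getMessage());
    }
  }
}
```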

Attempt 2:

convert method:

  • FileOutputStream with try-with-resources
  • FileInputStream

Error: failed to delete temp directory - because the fis wasn't closed.

Attempt 3:

convert method:

  • FileOutputStream with try-with-resources
  • Added FileInputStream as a parameter to the convert method and explicitly closed it in the test method.

Didn't work as expected.

@vibhatha (Collaborator):

Thanks for pushing this one and evaluating diverse approaches.

@llama90 (Contributor Author) commented Jun 13, 2024

@vibhatha @lidavidm lol. Finally, it looks like I've resolved the problem!

I would appreciate it if you could review this when you have free time.

@vibhatha (Collaborator) left a comment

I think the changes look okay. Using try-with-resources is encouraged anyway; maybe this upgrade exposed an issue in the code?

@lidavidm (Member) left a comment

Thanks for figuring it out! So basically, we weren't properly closing the file in all cases and that stopped Windows from deleting the temp files?

@lidavidm merged commit 870b315 into apache:main Jun 14, 2024
20 checks passed
@lidavidm removed the awaiting merge label Jun 14, 2024
@llama90 (Contributor Author) commented Jun 15, 2024

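File deletion semantics differ across operating systems. On Windows, a file that is still open typically cannot be deleted ("being used by another process"), or is only marked for deletion until its last handle is closed, whereas POSIX systems allow unlinking an open file. Therefore, it's important to use explicit resource management, such as try-with-resources, to ensure resources are released.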


After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 870b315.

There were 12 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 59 possible false positives for unstable benchmarks that are known to sometimes produce them.

3 participants