
[Kernel][Java to Scala test conversion] Convert and refactor TestParquetBatchReader #2714

Merged
merged 5 commits into from
Mar 4, 2024

Conversation

tlm365
Contributor

@tlm365 tlm365 commented Mar 3, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Resolves #2638

How was this patch tested?

Unit tests added.

Does this PR introduce any user-facing changes?

No.

Collaborator

@vkorukanti vkorukanti left a comment


Looks great! A few comments.

.add("ac", new StructType().add("aca", IntegerType.INTEGER)))
.add("array_of_prims", new ArrayType(IntegerType.INTEGER, true))

val actResult = readParquetFilesUsingSpark(ALL_TYPES_FILE, readSchema)
Collaborator

@vkorukanti vkorukanti Mar 4, 2024


actual results should be read by Kernel reader. Use readParquetFilesUsingKernel.

For the expected results, use readParquetFilesUsingSpark, so that we can remove the methods that currently hardcode the expected results: `generateRowsFromAllTypesFile`.

Collaborator


Same comment for few other tests.

Contributor Author


actual results should be read by Kernel reader. Use readParquetFilesUsingKernel.

@vkorukanti Got it, but could you please explain the reason? I just copied this part over from the old TestParquetBatchReader code.

For the expected results use readParquetFilesUsingSpark, so that we can remove the methods that currently hardcode the expected results: `generateRowsFromAllTypesFile`.

I'm sorry, I don't understand this one: what should readParquetFilesUsingSpark read from?

Collaborator


In the current tests (before this PR):

  1. Generate the actual results using the Kernel Parquet reader.
  2. Generate the expected results using generateRowsFromAllTypesFile, which basically hardcodes the expected results.
  3. Compare the actual and expected results.

We want to change this to:

  1. Generate the actual results using the Kernel Parquet reader. This step is the same as before, but it uses the new utility method readParquetFilesUsingKernel, which directly returns a Seq[TestRow].
  2. Generate the expected results using readParquetFilesUsingSpark(parquet file directory). This step is different: instead of using hardcoded expected results, we rely on Spark to read the Parquet files and get their contents.
  3. Compare the actual and expected results. This step is the same as before.

val inputLocation = goldenTablePath("parquet-all-types") gives the directory of the Parquet file contents. Pass this as the input to readParquetFilesUsingSpark.
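The three steps above can be sketched as follows. This is a minimal, self-contained sketch, not the suite's actual code: the readParquetFilesUsingKernel and readParquetFilesUsingSpark helpers (and the checkAnswer comparison) are stubbed here with fixed data, standing in for the real test utilities and golden-table paths.

```scala
object RefactoredTestSketch {
  type TestRow = Seq[Any]

  // Stub: in the real suite this reads the Parquet files with the Kernel reader.
  def readParquetFilesUsingKernel(path: String): Seq[TestRow] =
    Seq(Seq(1, "a"), Seq(2, "b"))

  // Stub: in the real suite this reads the same files with Spark.
  def readParquetFilesUsingSpark(path: String): Seq[TestRow] =
    Seq(Seq(1, "a"), Seq(2, "b"))

  // Order-insensitive comparison of actual vs. expected rows.
  def checkAnswer(actual: Seq[TestRow], expected: Seq[TestRow]): Unit =
    assert(actual.sortBy(_.mkString(",")) == expected.sortBy(_.mkString(",")),
      s"actual=$actual expected=$expected")

  def main(args: Array[String]): Unit = {
    val inputLocation = "golden/parquet-all-types" // hypothetical path
    val actResult = readParquetFilesUsingKernel(inputLocation) // Kernel = actual
    val expResult = readParquetFilesUsingSpark(inputLocation)  // Spark = expected
    checkAnswer(actResult, expResult)
    println("ok")
  }
}
```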

Contributor Author


Oh, got it. Thank you so much!

//////////////////////////////////////////////////////////////////////////////////
// Timestamp type tests
//////////////////////////////////////////////////////////////////////////////////
// TODO move over from DeltaTableReadsSuite once there is better testing infra
Collaborator


Remove this comment.

Comment on lines 95 to 99
val readSchema = new StructType()
.add("id", IntegerType.INTEGER)
.add("col1", new DecimalType(5, 1)) // INT32: 1 <= precision <= 9
.add("col2", new DecimalType(10, 5)) // INT64: 10 <= precision <= 18
.add("col3", new DecimalType(20, 5)) // FIXED_LEN_BYTE_ARRAY
Collaborator


You can use the tableSchema API to fetch the schema. Keep/update the comment mentioning the three different storage formats for decimal columns.
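For reference, the precision thresholds in the schema comments above map decimal columns to Parquet physical formats like this. This is an illustrative sketch of that mapping only, not code from the suite:

```scala
object DecimalPhysicalFormat {
  // Parquet stores a decimal in a physical format chosen by its precision,
  // per the comments in the readSchema above.
  def physicalFormat(precision: Int): String =
    if (precision <= 9) "INT32"        // 1 <= precision <= 9
    else if (precision <= 18) "INT64"  // 10 <= precision <= 18
    else "FIXED_LEN_BYTE_ARRAY"        // larger precisions

  def main(args: Array[String]): Unit = {
    // The precisions used by the columns in the readSchema above: 5, 10, 20.
    assert(physicalFormat(5) == "INT32")
    assert(physicalFormat(10) == "INT64")
    assert(physicalFormat(20) == "FIXED_LEN_BYTE_ARRAY")
    println("ok")
  }
}
```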

Comment on lines 53 to 57
val readSchema = new StructType()
.add("id", IntegerType.INTEGER)
.add("col1", new DecimalType(9, 0)) // INT32: 1 <= precision <= 9
.add("col2", new DecimalType(12, 0)) // INT64: 10 <= precision <= 18
.add("col3", new DecimalType(25, 0)) // FIXED_LEN_BYTE_ARRAY
Collaborator


Same comment: use tableSchema.

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.delta.kernel.defaults.internal.parquet
Collaborator


Nice! You moved this file to the correct package.

Collaborator

@vkorukanti vkorukanti left a comment


LGTM, just a couple of minor comments; once they are fixed, this PR is ready to go.

test("decimals encoded using dictionary encoding ") {
val DECIMAL_DICT_FILE_V1 = goldenTableFile("parquet-decimal-dictionaries-v1").getAbsolutePath
Collaborator


nit: can you convert the variable name to camelCase: decimalDictFileV1? Same for the other ones.

test("decimals encoded using dictionary encoding ") {
val DECIMAL_DICT_FILE_V1 = goldenTableFile("parquet-decimal-dictionaries-v1").getAbsolutePath
Collaborator


Add a comment: the golden tables below contain three decimal columns, each stored in a different physical format: int32, int64, and fixed-length binary.

Contributor Author


@vkorukanti TYSM, I have updated it.

@vkorukanti vkorukanti merged commit fda41dd into delta-io:master Mar 4, 2024
6 of 7 checks passed
@tlm365 tlm365 deleted the 2638 branch March 7, 2024 08:38