[SPARK-48177][BUILD] Upgrade Apache Parquet to 1.14.1 #46447

Closed
wants to merge 1 commit into from

Fokko
Contributor

Fokko commented May 7, 2024

What changes were proposed in this pull request?

Why are the changes needed?

Fixes quite a few bugs on the Parquet side: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140

Does this PR introduce any user-facing change?

No

How was this patch tested?

Using the existing unit tests

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the BUILD label May 7, 2024
Member

dongjoon-hyun left a comment

Yay, finally.

Please run the following and attach the updated dependency file, @Fokko.

dev/test-dependencies.sh --replace-manifest

dongjoon-hyun changed the title from "[SPARK-48177][BUILD]: Bump Apache Parquet to 1.14.0" to "[SPARK-48177][BUILD] Upgrade Apache Parquet to 1.14.0" May 7, 2024
dongjoon-hyun
Member

cc @cloud-fan, @HyukjinKwon, @mridulm, @sunchao, @yaooqinn, @LuciferYang, @steveloughran, @viirya, @huaxin, @parthchandra, too.

Member

dongjoon-hyun left a comment

Oh, it seems that there are many unit test failures.

[info] *** 189 TESTS FAILED ***
[error] Failed: Total 1526, Failed 189, Errors 0, Passed 1337, Ignored 597
[error] Failed tests:
[error] 	org.apache.spark.sql.hive.execution.SQLQuerySuite
[error] 	org.apache.spark.sql.hive.execution.HiveResolutionSuite
[error] 	org.apache.spark.sql.hive.execution.HiveDDLSuite
[error] 	org.apache.spark.sql.hive.execution.HiveQuerySuite
[error] 	org.apache.spark.sql.hive.execution.SQLQuerySuiteAE
[error] 	org.apache.spark.sql.hive.execution.HiveSQLViewSuite
[error] 	org.apache.spark.sql.hive.execution.HashUDAQuerySuite
[error] 	org.apache.spark.sql.hive.execution.PruneHiveTablePartitionsSuite
[error] 	org.apache.spark.sql.hive.execution.HiveUDAFSuite
[error] 	org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite
[error] 	org.apache.spark.sql.hive.execution.HiveTableScanSuite
[error] 	org.apache.spark.sql.hive.execution.HashAggregationQueryWithControlledFallbackSuite
[error] 	org.apache.spark.sql.hive.execution.HiveCommandSuite
[error] 	org.apache.spark.sql.hive.execution.HashUDAQueryWithControlledFallbackSuite
[error] 	org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
[error] 	org.apache.spark.sql.hive.execution.HiveUDFSuite
[error] 	org.apache.spark.sql.hive.HiveSparkSubmitSuite
[error] 	org.apache.spark.sql.hive.execution.HashAggregationQuerySuite
[error] (hive / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1448 s (24:08), completed May 7, 2024, 9:07:49 PM

For example:

- SPARK-6851: Self-joined converted parquet tables *** FAILED *** (4 seconds, 473 milliseconds)
[info]   java.util.concurrent.ExecutionException: org.apache.spark.SparkException: [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///home/runner/work/spark/spark/target/tmp/warehouse-75fc0262-e914-40da-98bf-ad2460270fb5/orders/state=CA/month=20151/part-00000-d46019ae-951c-4974-96da-2b38ade7b49e.c000.snappy.parquet.  SQLSTATE: KD001

Fokko force-pushed the fd-bump-parquet branch from 31cb518 to a309990 May 7, 2024 21:29
github-actions bot added the CORE label May 7, 2024
dongjoon-hyun
Member

dongjoon-hyun commented May 7, 2024

Oh, it seems that files from the target folder were added by mistake.

FYI, this PR should only touch two files: pom.xml and dev/deps/spark-deps-hadoop-3-hive-2.3.

Fokko force-pushed the fd-bump-parquet branch from a309990 to 587a808 May 7, 2024 21:36
github-actions bot removed the CORE label May 7, 2024
Fokko
Contributor Author

Fokko commented May 7, 2024

Thanks for pointing that out, @dongjoon-hyun. I've fixed it right away 👍

Fokko
Contributor Author

Fokko commented May 7, 2024

I have to look into the tests 👀

rshkv
Contributor

rshkv commented May 21, 2024

I think the toPrettyJSON errors seen here are the ones reported in PARQUET-2468, which are being addressed in apache/parquet-java#1349. We might have to wait for 1.14.1.

Cause: java.lang.RuntimeException: shaded.parquet.com.fasterxml.jackson.databind.exc.InvalidDefinitionException: No serializer found for class org.apache.parquet.schema.LogicalTypeAnnotation$StringLogicalTypeAnnotation and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["fileMetaData"]->org.apache.parquet.hadoop.metadata.FileMetaData["schema"]->org.apache.parquet.schema.MessageType["fields"]->java.util.ArrayList[1]->org.apache.parquet.schema.PrimitiveType["logicalTypeAnnotation"])
	at org.apache.parquet.hadoop.metadata.ParquetMetadata.toJSON(ParquetMetadata.java:68)
	at org.apache.parquet.hadoop.metadata.ParquetMetadata.toPrettyJSON(ParquetMetadata.java:48)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1592)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:629)
Caused by: shaded.parquet.com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Java 8 optional type `java.util.Optional<java.lang.Long>` not supported by default: add Module "shaded.parquet.com.fasterxml.jackson.datatype:jackson-datatype-jdk8" to enable handling (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["blocks"]->java.util.ArrayList[0]->org.apache.parquet.hadoop.metadata.BlockMetaData["columns"]->java.util.Collections$UnmodifiableRandomAccessList[0]->org.apache.parquet.hadoop.metadata.IntColumnChunkMetaData["sizeStatistics"]->org.apache.parquet.column.statistics.SizeStatistics["unencodedByteArrayDataBytes"])
	at shaded.parquet.com.fasterxml.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:77)
	...
	at shaded.parquet.com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:1114)
	at org.apache.parquet.hadoop.metadata.ParquetMetadata.toJSON(ParquetMetadata.java:62)
	at org.apache.parquet.hadoop.metadata.ParquetMetadata.toPrettyJSON(ParquetMetadata.java:48)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1592)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:629)

Fokko
Contributor Author

Fokko commented May 21, 2024

Thanks for digging into this, @rshkv. Let's follow up on the Parquet side.

dongjoon-hyun
Member

Thank you, @rshkv and @Fokko.

Fokko force-pushed the fd-bump-parquet branch from 587a808 to 47965c7 June 17, 2024 16:21
Fokko changed the title from "[SPARK-48177][BUILD] Upgrade Apache Parquet to 1.14.0" to "[SPARK-48177][BUILD] Upgrade Apache Parquet to 1.14.1" Jun 17, 2024
Fokko
Contributor Author

Fokko commented Jun 17, 2024

Apache Parquet 1.14.1 has been released, thanks @wgtmac 🙌

dongjoon-hyun
Member

Thank you, @Fokko and @wgtmac.

dongjoon-hyun
Member

Could you make CI happy, @Fokko?

[info] - SPARK-30269 failed to update partition stats if it's equal to table's old stats *** FAILED *** (414 milliseconds)
[info]   690 did not equal 657 (StatisticsSuite.scala:1610)
[info] - Runtime bloom filter join: BF rewrite triggering threshold test *** FAILED *** (1 second, 469 milliseconds)
[info]   2 did not equal 0 (InjectRuntimeFilterSuite.scala:248)

LuciferYang
Contributor

[info] - primitive type - no column index *** FAILED *** (12 milliseconds)
[info]   java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
[info]   at org.apache.parquet.column.statistics.SizeStatistics$Builder.add(SizeStatistics.java:83)
[info]   at org.apache.parquet.column.statistics.SizeStatistics$Builder.add(SizeStatistics.java:95)
[info]   at org.apache.parquet.column.impl.ColumnValueCollector.write(ColumnValueCollector.java:92)
[info]   at org.apache.parquet.column.impl.ColumnWriterBase.write(ColumnWriterBase.java:197)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$writeDataPage$1(ParquetVectorizedSuite.scala:607)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$writeDataPage$1$adapted(ParquetVectorizedSuite.scala:591)
[info]   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
[info]   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
[info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.writeDataPage(ParquetVectorizedSuite.scala:591)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$testPrimitiveString$4(ParquetVectorizedSuite.scala:515)
[info]   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.testPrimitiveString(ParquetVectorizedSuite.scala:511)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$4(ParquetVectorizedSuite.scala:62)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$4$adapted(ParquetVectorizedSuite.scala:60)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$3(ParquetVectorizedSuite.scala:60)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$3$adapted(ParquetVectorizedSuite.scala:59)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$2(ParquetVectorizedSuite.scala:59)
[info]   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$1(ParquetVectorizedSuite.scala:58)

1.14.1 still seems to have this error when writing statistics. Does this indicate an incompatibility?

wgtmac
Member

wgtmac commented Jun 18, 2024

@LuciferYang Could you please check the test case? It seems to be writing def_level=1 to a column with max_def_level=0.

LuciferYang
Contributor

> @LuciferYang Could you please check the test case? It seems to be writing def_level=1 to a column with max_def_level=0.

This is an existing test case in Spark, and this error does not occur with version 1.13.x.

wgtmac
Member

wgtmac commented Jun 18, 2024

Yes, I know. The exception is thrown while building size statistics, which is a new feature and has already caught similar issues in the test cases of parquet-mr. So I'd suggest checking whether the existing test violates the rule 0 <= def_level <= max_def_level.
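To illustrate why the new size statistics reject such input, here is a simplified model, assuming (as the Parquet format's description of `SizeStatistics` states) that the definition-level histogram has `maxDef + 1` buckets indexed by level. `LevelHistogram` is a hypothetical class written for this sketch, not the parquet-java `SizeStatistics.Builder` API.

```scala
// Hypothetical, simplified model of a definition-level histogram: one bucket
// per definition level in 0..maxDef, so the array has maxDef + 1 slots.
// This is NOT parquet-java's SizeStatistics.Builder, only an illustration.
final class LevelHistogram(val maxDef: Int) {
  require(maxDef >= 0, "maxDef must be non-negative")

  private val counts = new Array[Long](maxDef + 1)

  // Records one value at the given definition level. A level outside
  // 0..maxDef indexes past the array and throws ArrayIndexOutOfBoundsException.
  def add(defLevel: Int): Unit = counts(defLevel) += 1

  // Copy of the per-level counts, bucket i holding the count for level i.
  def snapshot: Seq[Long] = counts.toSeq
}
```

With `maxDef = 0` there is a single bucket, so `new LevelHistogram(0).add(1)` fails with an `ArrayIndexOutOfBoundsException`, the same exception type as in the CI log above.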

wgtmac
Member

wgtmac commented Jun 18, 2024

These lines are suspicious:

val maxDef = if (inputValues.contains(null)) 1 else 0
val ty = parquetSchema.asGroupType().getType("a").asPrimitiveType()
val cd = new ColumnDescriptor(Seq("a").toArray, ty, 0, maxDef)
val repetitionLevels = Array.fill[Int](inputValues.length)(0)
val definitionLevels = inputValues.map(v => if (v == null) 0 else 1)

If inputValues contains no nulls, maxDef is set to 0. However, definitionLevels is set to 1 for every non-null value, which violates exactly the rule I mentioned.

LuciferYang
Contributor

LuciferYang commented Jun 18, 2024

@wgtmac Thank you for your explanation; it seems you are correct. Should Line 505 be changed from

val definitionLevels = inputValues.map(v => if (v == null) 0 else 1)

to the following?

val definitionLevels = inputValues.map(v => if (v == null) 0 else maxDef)

I tested it manually, and with this change ParquetVectorizedSuite passes.
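A standalone sketch comparing the original and the proposed mapping, assuming a flat, non-repeated column as in the test snippet quoted above (so every non-null value sits at the maximum definition level). `TestLevels` and its methods are hypothetical names for illustration, not Spark test code.

```scala
// Hypothetical helpers mirroring the level computation in the test snippet.
object TestLevels {
  // maxDef is 1 only when the column actually holds nulls, as in the test.
  def maxDefFor(inputValues: Seq[Any]): Int =
    if (inputValues.contains(null)) 1 else 0

  // Original mapping: every non-null value gets level 1, even when maxDef == 0.
  def originalLevels(inputValues: Seq[Any]): Seq[Int] =
    inputValues.map(v => if (v == null) 0 else 1)

  // Proposed mapping: every non-null value gets the column's maximum level.
  def fixedLevels(inputValues: Seq[Any], maxDef: Int): Seq[Int] =
    inputValues.map(v => if (v == null) 0 else maxDef)

  // The rule from the discussion: 0 <= level <= maxDef for every level.
  def valid(levels: Seq[Int], maxDef: Int): Boolean =
    levels.forall(l => l >= 0 && l <= maxDef)
}
```

For input without nulls, `originalLevels` yields level 1 against `maxDef = 0` and breaks the rule, while `fixedLevels` yields level 0; when nulls are present both mappings agree because `maxDef` is 1.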

wgtmac
Member

wgtmac commented Jun 18, 2024

Yes, that change looks reasonable. Thanks for verifying, @LuciferYang!

wgtmac
Member

wgtmac commented Jun 18, 2024

(I have to admit that enabling a new feature by default on the Parquet side was a little bit aggressive, sigh.)

Fokko force-pushed the fd-bump-parquet branch from 47965c7 to 7f52ae6 June 18, 2024 16:38
github-actions bot added the SQL label Jun 18, 2024
Fokko
Contributor Author

Fokko commented Jun 18, 2024

@LuciferYang Thanks for the pointer, I've updated the PR 👍

LuciferYang
Contributor

LuciferYang commented Jun 19, 2024

> Could you make CI happy, @Fokko?
>
> [info] - SPARK-30269 failed to update partition stats if it's equal to table's old stats *** FAILED *** (414 milliseconds)
> [info]   690 did not equal 657 (StatisticsSuite.scala:1610)
> [info] - Runtime bloom filter join: BF rewrite triggering threshold test *** FAILED *** (1 second, 469 milliseconds)
> [info]   2 did not equal 0 (InjectRuntimeFilterSuite.scala:248)

@Fokko It seems that the data written by 1.14.1 is larger than the data written by 1.13.1.

The expectedSize needs to be changed to 690.

withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "32",
  SQLConf.RUNTIME_BLOOM_FILTER_CREATION_SIDE_THRESHOLD.key -> "4000") {
  // Test that the max scan size rather than an individual scan size on the filter
  // application side matters. `bf5filtered` has 14168 bytes and `bf2` has 3409 bytes.
  withSQLConf(
    SQLConf.RUNTIME_BLOOM_FILTER_APPLICATION_SIDE_SCAN_SIZE_THRESHOLD.key -> "5000") {
    assertRewroteWithBloomFilter("select * from " +
      "(select * from bf5filtered union all select * from bf2) t " +
      "join bf3 on t.c5 = bf3.c3 where bf3.a3 = 5", 2)
  }
  withSQLConf(
    SQLConf.RUNTIME_BLOOM_FILTER_APPLICATION_SIDE_SCAN_SIZE_THRESHOLD.key -> "15000") {
    assertDidNotRewriteWithBloomFilter("select * from " +
      "(select * from bf5filtered union all select * from bf2) t " +
      "join bf3 on t.c5 = bf3.c3 where bf3.a3 = 5")
  }
}

The comment on line 489 needs to be updated, since the statement "bf5filtered has 14168 bytes and bf2 has 3409 bytes" is likely no longer accurate. The threshold on line 498 can be changed to 16000; the exact value is 15049.
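A simplified model of the check this test exercises, assuming a bloom filter rewrite is only considered when the largest scan on the filter application side exceeds `RUNTIME_BLOOM_FILTER_APPLICATION_SIDE_SCAN_SIZE_THRESHOLD` (the strict `>` comparison is an assumption; `>=` gives the same result for these numbers). `BloomFilterRewriteModel.shouldRewrite` is a hypothetical helper, not Spark's `InjectRuntimeFilter` logic.

```scala
// Hypothetical, simplified model (not Spark code): the application side of the
// union qualifies when its LARGEST scan, not any individual one, exceeds the
// threshold. All sizes are in bytes; the strict ">" is an assumption.
object BloomFilterRewriteModel {
  def shouldRewrite(applicationSideScanSizesBytes: Seq[Long], thresholdBytes: Long): Boolean =
    applicationSideScanSizesBytes.nonEmpty &&
      applicationSideScanSizesBytes.max > thresholdBytes
}
```

With 15049 bytes as the largest scan, `shouldRewrite(Seq(15049L, 3000L), 15000L)` is still true, which would explain the `2 did not equal 0` failure, whereas a threshold of 16000 makes it false. The 3000-byte entry is an arbitrary stand-in for the smaller `bf2` scan.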

Fokko force-pushed the fd-bump-parquet branch from 7f52ae6 to 268dea7 June 19, 2024 08:10
Fokko
Contributor Author

Fokko commented Jun 19, 2024

@LuciferYang Thanks again for the detailed pointers. I just switched jobs and got a new laptop, so I have to reconfigure everything :) I'll keep an eye on the CI.

sunchao
Member

sunchao commented Jun 19, 2024

Looks very promising! Thanks, all, for the work! The failed tests do not seem related; I just re-triggered the CI jobs to be sure.

One nice-to-have would be a quick performance comparison using benchmarks like DataSourceReadBenchmark and DataSourceWriteBenchmark, just to make sure there is no regression.
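One way to make such a before/after comparison mechanical is to diff the best time of each benchmark case between a run on the main branch and a run on this branch, and flag anything that got slower than some tolerance. The sketch below is hypothetical: `BenchmarkDiff`, the map-of-best-times input, and the tolerance value are assumptions, not part of Spark's benchmark tooling, which prints its results as formatted tables.

```scala
// Hypothetical helper (not part of Spark): compares the "Best Time(ms)" of each
// case between a baseline run and a candidate run and reports the cases that got
// slower by more than the given tolerance.
object BenchmarkDiff {
  /** Relative change of candidate vs. baseline; 0.10 means 10% slower. */
  def relativeChange(baselineMs: Double, candidateMs: Double): Double = {
    require(baselineMs > 0, "baseline time must be positive")
    candidateMs / baselineMs - 1.0
  }

  /**
   * Names of cases present in both runs whose best time grew by more than
   * `tolerance` (a fraction), sorted by name. Cases missing from either run
   * are ignored.
   */
  def regressions(
      baseline: Map[String, Double],
      candidate: Map[String, Double],
      tolerance: Double): Seq[String] =
    baseline.toSeq
      .collect {
        case (name, baseMs) if candidate.contains(name) =>
          name -> relativeChange(baseMs, candidate(name))
      }
      .filter { case (_, change) => change > tolerance }
      .map(_._1)
      .sorted
}
```

Because single runs on a laptop are noisy (see the Stdev column in benchmark output), any fixed tolerance such as 10% is only an assumed starting point; flagged cases are worth re-running before drawing conclusions.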

Fokko force-pushed the fd-bump-parquet branch from 268dea7 to 5fecb05 June 21, 2024 14:37
Member

dongjoon-hyun left a comment

Ya, it looks good to me too if CI passes.

BTW, as @sunchao mentioned, do you think you could run the benchmarks to make sure, @Fokko?

steveloughran
Contributor

Has anyone set up a nightly Jenkins job that runs stable Spark and its tests against a nightly build of Parquet? That would seem a good way to catch regressions early, provided the test failures get attention. That's always a problem with cross-project builds.

LuciferYang
Contributor

@Fokko Do you have time to move this PR forward?

Fokko force-pushed the fd-bump-parquet branch from 5fecb05 to 03ab2ce July 1, 2024 07:04
Fokko
Contributor Author

Fokko commented Jul 1, 2024

@LuciferYang Yes, let me get right to it!

Fokko
Contributor Author

Fokko commented Jul 1, 2024

Sorry for the long wait; that's quite a comprehensive test suite. I've run the benchmarks on both the main branch and this branch:

This branch

DataSourceReadBenchmark

[info] running (fork) org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark 
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 10:54:40.855 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: SQL Single BOOLEAN Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 11370 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 7099 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 42 iterations, 2032 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 39 iterations, 2033 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2695 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2472 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 37 iterations, 2048 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2723 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BOOLEAN Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            5677           5685          12          2.8         360.9       1.0X
[info] SQL Json                                           3517           3550          46          4.5         223.6       1.6X
[info] SQL Parquet Vectorized: DataPageV1                   42             48           5        372.1           2.7     134.3X
[info] SQL Parquet Vectorized: DataPageV2                   45             52           6        350.1           2.9     126.4X
[info] SQL Parquet MR: DataPageV1                         1347           1348           0         11.7          85.7       4.2X
[info] SQL Parquet MR: DataPageV2                         1220           1236          23         12.9          77.6       4.7X
[info] SQL ORC Vectorized                                   52             55           3        300.6           3.3     108.5X
[info] SQL ORC MR                                         1306           1362          78         12.0          83.0       4.3X
[info] Running benchmark: Parquet Reader Single BOOLEAN Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 45 iterations, 2010 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 34 iterations, 2063 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 70 iterations, 2000 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 60 iterations, 2007 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BOOLEAN Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    41             45           3        382.7           2.6       1.0X
[info] ParquetReader Vectorized: DataPageV2                    51             61           9        308.2           3.2       0.8X
[info] ParquetReader Vectorized -> Row: DataPageV1             22             29           4        717.9           1.4       1.9X
[info] ParquetReader Vectorized -> Row: DataPageV2             30             33           3        533.1           1.9       1.4X
[info] Running benchmark: SQL Single TINYINT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 12625 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8105 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 32 iterations, 2040 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 33 iterations, 2036 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3119 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2517 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 31 iterations, 2060 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3078 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6208           6313         148          2.5         394.7       1.0X
[info] SQL Json                                           4022           4053          43          3.9         255.7       1.5X
[info] SQL Parquet Vectorized: DataPageV1                   56             64           8        282.6           3.5     111.5X
[info] SQL Parquet Vectorized: DataPageV2                   58             62           6        273.2           3.7     107.8X
[info] SQL Parquet MR: DataPageV1                         1538           1560          31         10.2          97.8       4.0X
[info] SQL Parquet MR: DataPageV2                         1255           1259           6         12.5          79.8       4.9X
[info] SQL ORC Vectorized                                   58             66           7        272.8           3.7     107.7X
[info] SQL ORC MR                                         1519           1539          29         10.4          96.6       4.1X
[info] Running benchmark: Parquet Reader Single TINYINT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 30 iterations, 2052 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 30 iterations, 2009 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 66 iterations, 2017 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 70 iterations, 2023 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single TINYINT Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    66             68           1        239.4           4.2       1.0X
[info] ParquetReader Vectorized: DataPageV2                    65             67           2        242.9           4.1       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1             25             31           1        624.0           1.6       2.6X
[info] ParquetReader Vectorized -> Row: DataPageV2             25             29           2        628.7           1.6       2.6X
[info] Running benchmark: SQL Single SMALLINT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 12916 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8438 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 27 iterations, 2061 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 26 iterations, 2070 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3174 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2953 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 23 iterations, 2056 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2881 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6439           6458          28          2.4         409.4       1.0X
[info] SQL Json                                           4186           4219          48          3.8         266.1       1.5X
[info] SQL Parquet Vectorized: DataPageV1                   66             76          13        239.7           4.2      98.1X
[info] SQL Parquet Vectorized: DataPageV2                   72             80           7        218.8           4.6      89.6X
[info] SQL Parquet MR: DataPageV1                         1550           1587          52         10.1          98.6       4.2X
[info] SQL Parquet MR: DataPageV2                         1453           1477          34         10.8          92.4       4.4X
[info] SQL ORC Vectorized                                   84             89           5        186.7           5.4      76.4X
[info] SQL ORC MR                                         1439           1441           3         10.9          91.5       4.5X
[info] Running benchmark: Parquet Reader Single SMALLINT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 24 iterations, 2016 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 21 iterations, 2001 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 25 iterations, 2049 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 22 iterations, 2052 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single SMALLINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    80             84           4        195.6           5.1       1.0X
[info] ParquetReader Vectorized: DataPageV2                    92             95           3        170.4           5.9       0.9X
[info] ParquetReader Vectorized -> Row: DataPageV1             81             82           1        194.6           5.1       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV2             92             93           1        171.0           5.8       0.9X
[info] Running benchmark: SQL Single INT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 13074 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8973 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 29 iterations, 2003 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 18 iterations, 2017 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3067 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2820 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 18 iterations, 2054 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2859 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6436           6537         143          2.4         409.2       1.0X
[info] SQL Json                                           4486           4487           2          3.5         285.2       1.4X
[info] SQL Parquet Vectorized: DataPageV1                   59             69          18        268.3           3.7     109.8X
[info] SQL Parquet Vectorized: DataPageV2                  106            112           5        148.4           6.7      60.7X
[info] SQL Parquet MR: DataPageV1                         1528           1534           9         10.3          97.1       4.2X
[info] SQL Parquet MR: DataPageV2                         1402           1410          11         11.2          89.1       4.6X
[info] SQL ORC Vectorized                                  110            114           4        143.5           7.0      58.7X
[info] SQL ORC MR                                         1411           1430          26         11.1          89.7       4.6X
[info] Running benchmark: Parquet Reader Single INT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 22 iterations, 2054 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 14 iterations, 2048 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 26 iterations, 2019 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 17 iterations, 2115 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    92             93           1        171.6           5.8       1.0X
[info] ParquetReader Vectorized: DataPageV2                   142            146           6        111.0           9.0       0.6X
[info] ParquetReader Vectorized -> Row: DataPageV1             76             78           2        206.0           4.9       1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2            123            124           1        128.0           7.8       0.7X
[info] Running benchmark: SQL Single BIGINT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 13054 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8794 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 13 iterations, 2079 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 16 iterations, 2027 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3519 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3042 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 18 iterations, 2076 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3202 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6451           6527         109          2.4         410.1       1.0X
[info] SQL Json                                           4394           4397           4          3.6         279.4       1.5X
[info] SQL Parquet Vectorized: DataPageV1                  142            160          15        110.7           9.0      45.4X
[info] SQL Parquet Vectorized: DataPageV2                  119            127           6        132.2           7.6      54.2X
[info] SQL Parquet MR: DataPageV1                         1746           1760          19          9.0         111.0       3.7X
[info] SQL Parquet MR: DataPageV2                         1499           1521          32         10.5          95.3       4.3X
[info] SQL ORC Vectorized                                  108            115           8        145.2           6.9      59.5X
[info] SQL ORC MR                                         1580           1601          29         10.0         100.5       4.1X
[info] Running benchmark: Parquet Reader Single BIGINT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 12 iterations, 2136 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 13 iterations, 2071 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 14 iterations, 2151 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 15 iterations, 2075 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BIGINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                   172            178          10         91.6          10.9       1.0X
[info] ParquetReader Vectorized: DataPageV2                   158            159           1         99.6          10.0       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV1            152            154           2        103.4           9.7       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2            137            138           1        114.9           8.7       1.3X
[info] Running benchmark: SQL Single FLOAT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 13474 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 9857 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 26 iterations, 2006 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 30 iterations, 2048 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3290 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3089 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 17 iterations, 2025 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3319 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6650           6737         124          2.4         422.8       1.0X
[info] SQL Json                                           4923           4929           8          3.2         313.0       1.4X
[info] SQL Parquet Vectorized: DataPageV1                   60             77          22        264.0           3.8     111.6X
[info] SQL Parquet Vectorized: DataPageV2                   58             68           9        270.8           3.7     114.5X
[info] SQL Parquet MR: DataPageV1                         1633           1645          18          9.6         103.8       4.1X
[info] SQL Parquet MR: DataPageV2                         1543           1545           3         10.2          98.1       4.3X
[info] SQL ORC Vectorized                                  113            119           5        139.1           7.2      58.8X
[info] SQL ORC MR                                         1659           1660           1          9.5         105.5       4.0X
[info] Running benchmark: Parquet Reader Single FLOAT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 22 iterations, 2040 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 22 iterations, 2010 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 27 iterations, 2048 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 27 iterations, 2048 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single FLOAT Column Scan:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    90             93           1        175.1           5.7       1.0X
[info] ParquetReader Vectorized: DataPageV2                    88             91           2        178.1           5.6       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1             72             76           3        217.2           4.6       1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2             72             76           3        217.6           4.6       1.2X
[info] Running benchmark: SQL Single DOUBLE Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 14018 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 10027 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 14 iterations, 2079 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 14 iterations, 2021 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3516 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3553 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 7 iterations, 2210 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3581 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6969           7009          57          2.3         443.1       1.0X
[info] SQL Json                                           5001           5014          18          3.1         318.0       1.4X
[info] SQL Parquet Vectorized: DataPageV1                  143            149           6        109.7           9.1      48.6X
[info] SQL Parquet Vectorized: DataPageV2                  140            144           2        112.0           8.9      49.6X
[info] SQL Parquet MR: DataPageV1                         1709           1758          69          9.2         108.7       4.1X
[info] SQL Parquet MR: DataPageV2                         1710           1777          95          9.2         108.7       4.1X
[info] SQL ORC Vectorized                                  311            316           4         50.5          19.8      22.4X
[info] SQL ORC MR                                         1779           1791          16          8.8         113.1       3.9X
[info] Running benchmark: Parquet Reader Single DOUBLE Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 12 iterations, 2041 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 12 iterations, 2049 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 13 iterations, 2010 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 13 iterations, 2004 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single DOUBLE Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                   168            170           2         93.6          10.7       1.0X
[info] ParquetReader Vectorized: DataPageV2                   168            171           2         93.6          10.7       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1            153            155           1        102.7           9.7       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2            152            154           1        103.2           9.7       1.1X
[info] Running benchmark: SQL Single TINYINT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3814 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3902 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2028 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3372 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3688 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2007 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3113 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3562 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 31 iterations, 2021 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan in Struct:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1793           1907         161          8.8         114.0       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1781           1951         241          8.8         113.2       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             166            184          20         94.7          10.6      10.8X
[info] SQL Parquet MR: DataPageV1                                            1658           1686          40          9.5         105.4       1.1X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1838           1844           9          8.6         116.8       1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              58             69          12        269.6           3.7      30.7X
[info] SQL Parquet MR: DataPageV2                                            1533           1557          34         10.3          97.5       1.2X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1775           1781           9          8.9         112.8       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)              58             65           7        272.1           3.7      31.0X
[info] Running benchmark: SQL Single SMALLINT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3021 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 2975 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2063 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3388 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3653 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2069 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3165 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3454 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 18 iterations, 2053 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan in Struct:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1468           1511          60         10.7          93.4       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1484           1488           5         10.6          94.4       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             184            188           5         85.7          11.7       8.0X
[info] SQL Parquet MR: DataPageV1                                            1690           1694           6          9.3         107.4       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1819           1827          11          8.6         115.6       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              61             71           8        255.9           3.9      23.9X
[info] SQL Parquet MR: DataPageV2                                            1581           1583           3          9.9         100.5       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1724           1727           4          9.1         109.6       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             107            114           8        147.4           6.8      13.8X
[info] Running benchmark: SQL Single INT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3417 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3437 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 10 iterations, 2103 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3545 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3797 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 28 iterations, 2035 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3402 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3709 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 22 iterations, 2040 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan in Struct:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1703           1709           9          9.2         108.3       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1709           1719          14          9.2         108.7       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             194            210          34         80.9          12.4       8.8X
[info] SQL Parquet MR: DataPageV1                                            1754           1773          27          9.0         111.5       1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1897           1899           2          8.3         120.6       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              61             73           9        258.6           3.9      28.0X
[info] SQL Parquet MR: DataPageV2                                            1692           1701          12          9.3         107.6       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1854           1855           1          8.5         117.9       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)              85             93           4        184.2           5.4      19.9X
[info] Running benchmark: SQL Single BIGINT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3061 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3032 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2150 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3677 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3977 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 14 iterations, 2056 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3255 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3508 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 15 iterations, 2005 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan in Struct:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1526           1531           6         10.3          97.0       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1515           1516           1         10.4          96.4       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             190            195           5         82.8          12.1       8.0X
[info] SQL Parquet MR: DataPageV1                                            1821           1839          25          8.6         115.8       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1969           1989          28          8.0         125.2       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)             143            147           4        110.1           9.1      10.7X
[info] SQL Parquet MR: DataPageV2                                            1617           1628          15          9.7         102.8       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1754           1754           1          9.0         111.5       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             120            134          10        130.6           7.7      12.7X
[info] Running benchmark: SQL Single FLOAT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2902 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 2906 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2140 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3582 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3938 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 28 iterations, 2025 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3295 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3646 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 28 iterations, 2001 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan in Struct:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1443           1451          11         10.9          91.8       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1440           1453          19         10.9          91.5       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             184            195           9         85.4          11.7       7.8X
[info] SQL Parquet MR: DataPageV1                                            1778           1791          19          8.8         113.0       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1947           1969          31          8.1         123.8       0.7X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              63             72           5        248.4           4.0      22.8X
[info] SQL Parquet MR: DataPageV2                                            1648           1648           0          9.5         104.8       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1820           1823           5          8.6         115.7       0.8X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)              65             71           9        242.4           4.1      22.2X
[info] Running benchmark: SQL Single DOUBLE Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3537 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3609 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 6 iterations, 2320 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3647 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3956 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 14 iterations, 2071 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3463 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3819 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 14 iterations, 2092 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan in Struct:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1763           1769           8          8.9         112.1       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1791           1805          20          8.8         113.8       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             377            387          10         41.7          24.0       4.7X
[info] SQL Parquet MR: DataPageV1                                            1809           1824          20          8.7         115.0       1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1977           1978           1          8.0         125.7       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)             146            148           2        107.9           9.3      12.1X
[info] SQL Parquet MR: DataPageV2                                            1718           1732          19          9.2         109.2       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1905           1910           7          8.3         121.1       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             146            149           3        107.5           9.3      12.1X
[info] Running benchmark: SQL Nested Column Scan
[info]   Running case: SQL ORC MR
[info]   Stopped after 10 iterations, 62690 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 10 iterations, 61859 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 10 iterations, 23893 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 10 iterations, 41404 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 10 iterations, 43456 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 10 iterations, 22753 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 10 iterations, 46835 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 10 iterations, 49182 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 10 iterations, 18211 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Nested Column Scan:                                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            6146           6269          78          0.2        5861.1       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           6024           6186         116          0.2        5745.0       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                            2363           2389          25          0.4        2253.7       2.6X
[info] SQL Parquet MR: DataPageV1                                            4106           4140          20          0.3        3916.2       1.5X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           4288           4346          41          0.2        4089.6       1.4X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)            2131           2275         101          0.5        2032.0       2.9X
[info] SQL Parquet MR: DataPageV2                                            4636           4684          31          0.2        4421.4       1.3X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           4873           4918          34          0.2        4647.3       1.3X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)            1795           1821          12          0.6        1711.7       3.4X
[info] Running benchmark: Int and String Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 12344 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 9227 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 3 iterations, 2641 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 2 iterations, 2070 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 5220 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 4813 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 2 iterations, 2185 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 5517 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Int and String Scan:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6150           6172          32          1.7         586.5       1.0X
[info] SQL Json                                           4610           4614           6          2.3         439.6       1.3X
[info] SQL Parquet Vectorized: DataPageV1                  875            880           7         12.0          83.5       7.0X
[info] SQL Parquet Vectorized: DataPageV2                 1027           1035          11         10.2          98.0       6.0X
[info] SQL Parquet MR: DataPageV1                         2609           2610           2          4.0         248.8       2.4X
[info] SQL Parquet MR: DataPageV2                         2406           2407           1          4.4         229.4       2.6X
[info] SQL ORC Vectorized                                 1092           1093           1          9.6         104.2       5.6X
[info] SQL ORC MR                                         2705           2759          75          3.9         258.0       2.3X
[info] Running benchmark: Repeated String
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 7169 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 5833 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 6 iterations, 2338 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 6 iterations, 2362 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2564 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2277 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 8 iterations, 2058 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2217 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Repeated String:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            3579           3585           8          2.9         341.3       1.0X
[info] SQL Json                                           2904           2917          17          3.6         277.0       1.2X
[info] SQL Parquet Vectorized: DataPageV1                  386            390           4         27.2          36.8       9.3X
[info] SQL Parquet Vectorized: DataPageV2                  390            394           3         26.9          37.2       9.2X
[info] SQL Parquet MR: DataPageV1                         1281           1282           2          8.2         122.2       2.8X
[info] SQL Parquet MR: DataPageV2                         1127           1139          17          9.3         107.5       3.2X
[info] SQL ORC Vectorized                                  242            257          28         43.4          23.0      14.8X
[info] SQL ORC MR                                         1104           1109           6          9.5         105.3       3.2X
[info] Running benchmark: Partitioned Table
[info]   Running case: Data column - CSV
[info]   Stopped after 2 iterations, 13828 ms
[info]   Running case: Data column - Json
[info]   Stopped after 2 iterations, 8637 ms
[info]   Running case: Data column - Parquet Vectorized: DataPageV1
[info]   Stopped after 27 iterations, 2021 ms
[info]   Running case: Data column - Parquet Vectorized: DataPageV2
[info]   Stopped after 22 iterations, 2010 ms
[info]   Running case: Data column - Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3474 ms
[info]   Running case: Data column - Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3346 ms
[info]   Running case: Data column - ORC Vectorized
[info]   Stopped after 17 iterations, 2015 ms
[info]   Running case: Data column - ORC MR
[info]   Stopped after 2 iterations, 3251 ms
[info]   Running case: Partition column - CSV
[info]   Stopped after 2 iterations, 4098 ms
[info]   Running case: Partition column - Json
[info]   Stopped after 2 iterations, 7987 ms
[info]   Running case: Partition column - Parquet Vectorized: DataPageV1
[info]   Stopped after 71 iterations, 2004 ms
[info]   Running case: Partition column - Parquet Vectorized: DataPageV2
[info]   Stopped after 77 iterations, 2009 ms
[info]   Running case: Partition column - Parquet MR: DataPageV1
[info]   Stopped after 3 iterations, 2716 ms
[info]   Running case: Partition column - Parquet MR: DataPageV2
[info]   Stopped after 3 iterations, 2664 ms
[info]   Running case: Partition column - ORC Vectorized
[info]   Stopped after 71 iterations, 2019 ms
[info]   Running case: Partition column - ORC MR
[info]   Stopped after 2 iterations, 2034 ms
[info]   Running case: Both columns - CSV
[info]   Stopped after 2 iterations, 13219 ms
[info]   Running case: Both columns - Json
[info]   Stopped after 2 iterations, 8792 ms
[info]   Running case: Both columns - Parquet Vectorized: DataPageV1
[info]   Stopped after 27 iterations, 2022 ms
[info]   Running case: Both columns - Parquet Vectorized: DataPageV2
[info]   Stopped after 20 iterations, 2034 ms
[info]   Running case: Both columns - Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3653 ms
[info]   Running case: Both columns - Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3523 ms
[info]   Running case: Both columns - ORC Vectorized
[info]   Stopped after 15 iterations, 2070 ms
[info]   Running case: Both columns - ORC MR
[info]   Stopped after 2 iterations, 3394 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Partitioned Table:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Data column - CSV                                           6861           6914          76          2.3         436.2       1.0X
[info] Data column - Json                                          4307           4319          17          3.7         273.8       1.6X
[info] Data column - Parquet Vectorized: DataPageV1                  59             75          12        267.1           3.7     116.5X
[info] Data column - Parquet Vectorized: DataPageV2                  82             91           7        190.8           5.2      83.2X
[info] Data column - Parquet MR: DataPageV1                        1722           1737          21          9.1         109.5       4.0X
[info] Data column - Parquet MR: DataPageV2                        1660           1673          19          9.5         105.5       4.1X
[info] Data column - ORC Vectorized                                 112            119           5        140.3           7.1      61.2X
[info] Data column - ORC MR                                        1575           1626          72         10.0         100.1       4.4X
[info] Partition column - CSV                                      2043           2049           9          7.7         129.9       3.4X
[info] Partition column - Json                                     3986           3994          10          3.9         253.4       1.7X
[info] Partition column - Parquet Vectorized: DataPageV1             24             28           5        668.3           1.5     291.5X
[info] Partition column - Parquet Vectorized: DataPageV2             23             26           3        697.1           1.4     304.1X
[info] Partition column - Parquet MR: DataPageV1                    903            906           3         17.4          57.4       7.6X
[info] Partition column - Parquet MR: DataPageV2                    860            888          29         18.3          54.7       8.0X
[info] Partition column - ORC Vectorized                             25             28           3        640.0           1.6     279.2X
[info] Partition column - ORC MR                                    980           1017          53         16.1          62.3       7.0X
[info] Both columns - CSV                                          6606           6610           5          2.4         420.0       1.0X
[info] Both columns - Json                                         4383           4396          19          3.6         278.6       1.6X
[info] Both columns - Parquet Vectorized: DataPageV1                 70             75           3        224.0           4.5      97.7X
[info] Both columns - Parquet Vectorized: DataPageV2                 97            102           6        161.9           6.2      70.6X
[info] Both columns - Parquet MR: DataPageV1                       1809           1827          25          8.7         115.0       3.8X
[info] Both columns - Parquet MR: DataPageV2                       1735           1762          38          9.1         110.3       4.0X
[info] Both columns - ORC Vectorized                                133            138           3        118.2           8.5      51.6X
[info] Both columns - ORC MR                                       1630           1697          95          9.7         103.6       4.2X
[info] Running benchmark: String with Nulls Scan (0.0%)
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 8947 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 7904 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 4 iterations, 2146 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 3 iterations, 2207 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 4289 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 4373 ms
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 6 iterations, 2002 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 4 iterations, 2153 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 5 iterations, 2367 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3630 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (0.0%):            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            4370           4474         147          2.4         416.7       1.0X
[info] SQL Json                                           3939           3952          19          2.7         375.6       1.1X
[info] SQL Parquet Vectorized: DataPageV1                  535            537           3         19.6          51.0       8.2X
[info] SQL Parquet Vectorized: DataPageV2                  735            736           1         14.3          70.1       5.9X
[info] SQL Parquet MR: DataPageV1                         2140           2145           7          4.9         204.0       2.0X
[info] SQL Parquet MR: DataPageV2                         2179           2187          11          4.8         207.8       2.0X
[info] ParquetReader Vectorized: DataPageV1                332            334           2         31.6          31.6      13.2X
[info] ParquetReader Vectorized: DataPageV2                536            538           2         19.5          51.2       8.1X
[info] SQL ORC Vectorized                                  463            474          12         22.6          44.2       9.4X
[info] SQL ORC MR                                         1805           1815          15          5.8         172.1       2.4X
[info] Running benchmark: String with Nulls Scan (50.0%)
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 6799 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 7101 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 5 iterations, 2189 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 4 iterations, 2306 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3879 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 4176 ms
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 6 iterations, 2138 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 4 iterations, 2004 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 4 iterations, 2553 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 4107 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (50.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            3387           3400          18          3.1         323.0       1.0X
[info] SQL Json                                           3524           3551          37          3.0         336.1       1.0X
[info] SQL Parquet Vectorized: DataPageV1                  436            438           2         24.1          41.5       7.8X
[info] SQL Parquet Vectorized: DataPageV2                  572            577           5         18.3          54.5       5.9X
[info] SQL Parquet MR: DataPageV1                         1940           1940           0          5.4         185.0       1.7X
[info] SQL Parquet MR: DataPageV2                         2087           2088           2          5.0         199.0       1.6X
[info] ParquetReader Vectorized: DataPageV1                355            356           1         29.6          33.8       9.5X
[info] ParquetReader Vectorized: DataPageV2                497            501           4         21.1          47.4       6.8X
[info] SQL ORC Vectorized                                  636            638           3         16.5          60.6       5.3X
[info] SQL ORC MR                                         2042           2054          17          5.1         194.8       1.7X
[info] Running benchmark: String with Nulls Scan (95.0%)
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 5083 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 5235 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 20 iterations, 2067 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 19 iterations, 2008 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2805 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2727 ms
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 22 iterations, 2049 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 21 iterations, 2090 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 11 iterations, 2135 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2567 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (95.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            2526           2542          23          4.2         240.9       1.0X
[info] SQL Json                                           2610           2618          11          4.0         248.9       1.0X
[info] SQL Parquet Vectorized: DataPageV1                   97            103           5        108.2           9.2      26.1X
[info] SQL Parquet Vectorized: DataPageV2                  100            106           3        104.4           9.6      25.1X
[info] SQL Parquet MR: DataPageV1                         1378           1403          35          7.6         131.4       1.8X
[info] SQL Parquet MR: DataPageV2                         1363           1364           1          7.7         130.0       1.9X
[info] ParquetReader Vectorized: DataPageV1                 91             93           2        115.2           8.7      27.8X
[info] ParquetReader Vectorized: DataPageV2                 98            100           1        107.5           9.3      25.9X
[info] SQL ORC Vectorized                                  189            194           4         55.5          18.0      13.4X
[info] SQL ORC MR                                         1240           1284          62          8.5         118.2       2.0X
[info] Running benchmark: Single Column Scan from 10 columns
[info]   Running case: SQL CSV
[info]   Stopped after 3 iterations, 2377 ms
[info]   Running case: SQL Json
[info]   Stopped after 3 iterations, 2445 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 80 iterations, 2003 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 79 iterations, 2000 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 15 iterations, 2037 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 16 iterations, 2112 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 74 iterations, 2001 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 17 iterations, 2025 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 10 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                             792            792           0          1.3         755.3       1.0X
[info] SQL Json                                            811            815           4          1.3         773.6       1.0X
[info] SQL Parquet Vectorized: DataPageV1                   21             25           5         49.9          20.0      37.7X
[info] SQL Parquet Vectorized: DataPageV2                   22             25           3         47.4          21.1      35.8X
[info] SQL Parquet MR: DataPageV1                          133            136           2          7.9         127.0       5.9X
[info] SQL Parquet MR: DataPageV2                          127            132           3          8.3         120.8       6.3X
[info] SQL ORC Vectorized                                   23             27           3         44.9          22.3      33.9X
[info] SQL ORC MR                                          105            119           5         10.0          99.9       7.6X
[info] 11:32:00.966 WARN org.apache.spark.sql.catalyst.util.SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
[info] Running benchmark: Single Column Scan from 50 columns
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 2741 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 5693 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 73 iterations, 2012 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 70 iterations, 2007 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 16 iterations, 2127 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 16 iterations, 2002 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 62 iterations, 2021 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 16 iterations, 2092 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 50 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            1356           1371          21          0.8        1292.8       1.0X
[info] SQL Json                                           2846           2847           1          0.4        2714.2       0.5X
[info] SQL Parquet Vectorized: DataPageV1                   23             28           5         45.9          21.8      59.4X
[info] SQL Parquet Vectorized: DataPageV2                   25             29           3         42.6          23.5      55.1X
[info] SQL Parquet MR: DataPageV1                          129            133           3          8.1         123.0      10.5X
[info] SQL Parquet MR: DataPageV2                          115            125           5          9.1         110.1      11.7X
[info] SQL ORC Vectorized                                   27             33           6         38.3          26.1      49.6X
[info] SQL ORC MR                                          127            131           2          8.3         120.7      10.7X
[info] Running benchmark: Single Column Scan from 100 columns
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 4301 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 11071 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 51 iterations, 2029 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 61 iterations, 2029 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 15 iterations, 2040 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 16 iterations, 2005 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 54 iterations, 2028 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 15 iterations, 2108 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 100 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            2140           2151          16          0.5        2040.4       1.0X
[info] SQL Json                                           5309           5536         320          0.2        5063.2       0.4X
[info] SQL Parquet Vectorized: DataPageV1                   31             40          13         34.4          29.1      70.1X
[info] SQL Parquet Vectorized: DataPageV2                   30             33           3         35.5          28.2      72.5X
[info] SQL Parquet MR: DataPageV1                          127            136           5          8.2         121.4      16.8X
[info] SQL Parquet MR: DataPageV2                          121            125           4          8.7         115.0      17.7X
[info] SQL ORC Vectorized                                   33             38           3         31.8          31.5      64.8X
[info] SQL ORC MR                                          138            141           3          7.6         131.3      15.5X
[success] Total time: 2498 s (41:38), completed Jul 1, 2024, 11:35:08 AM

BuiltInDataSourceWriteBenchmark

[info] running (fork) org.apache.spark.sql.execution.benchmark.BuiltInDataSourceWriteBenchmark parquet
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 11:37:49.340 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: parquet writer benchmark
[info]   Running case: Output Single Int Column
[info]   Stopped after 2 iterations, 2962 ms
[info]   Running case: Output Single Double Column
[info]   Stopped after 2 iterations, 3030 ms
[info]   Running case: Output Int and String Column
[info]   Stopped after 2 iterations, 5383 ms
[info]   Running case: Output Partitions
[info]   Stopped after 2 iterations, 4587 ms
[info]   Running case: Output Buckets
[info]   Stopped after 2 iterations, 5951 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] parquet writer benchmark:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Output Single Int Column                           1405           1481         107         11.2          89.4       1.0X
[info] Output Single Double Column                        1507           1515          12         10.4          95.8       0.9X
[info] Output Int and String Column                       2689           2692           4          5.8         171.0       0.5X
[info] Output Partitions                                  2289           2294           6          6.9         145.6       0.6X
[info] Output Buckets                                     2973           2976           4          5.3         189.0       0.5X
[success] Total time: 108 s (01:48), completed Jul 1, 2024, 11:38:28 AM
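The derived columns in the writer table above (Rate(M/s), Per Row(ns), Relative) appear to follow directly from Best Time(ms) and a fixed number of rows per iteration. Relative is the first row's best time divided by each row's best time. The sketch below recomputes them. It assumes 15 × 1024 × 1024 rows per iteration, a count inferred from the Per Row(ns) column rather than taken from the benchmark source. `derived_columns` is a hypothetical helper, not part of Spark.

```python
# Hypothetical helper (not part of Spark): recomputes the derived columns of a
# Spark benchmark table from "Best Time(ms)". The row count is an assumption
# inferred from the "Per Row(ns)" column of the parquet writer benchmark above.
ROWS_PER_ITERATION = 15 * 1024 * 1024


def derived_columns(best_ms, baseline_best_ms, rows=ROWS_PER_ITERATION):
    """Return (rate in M rows/s, ns per row, speed relative to the baseline row)."""
    rate_m_per_s = rows / (best_ms / 1000.0) / 1e6  # million rows per second
    per_row_ns = best_ms * 1e6 / rows               # nanoseconds spent per row
    relative = baseline_best_ms / best_ms           # >1.0X means faster than the first case
    return rate_m_per_s, per_row_ns, relative


if __name__ == "__main__":
    baseline = 1405  # "Output Single Int Column", the first row of the table
    cases = [
        ("Output Single Int Column", 1405),
        ("Output Int and String Column", 2689),
        ("Output Buckets", 2973),
    ]
    for name, best in cases:
        rate, per_row, rel = derived_columns(best, baseline)
        print(f"{name:<30} {rate:6.1f} {per_row:8.1f} {rel:6.1f}X")
```

Because the logged best times are rounded to whole milliseconds, recomputed values can differ from the table in the last digit, especially for cases that finish in only a few tens of milliseconds. When comparing this PR against the main branch, comparing Best Time(ms) for the same case is therefore the most direct signal.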

Main branch

DataSourceReadBenchmark

[info] running (fork) org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark 
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 11:40:13.033 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: SQL Single BOOLEAN Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 11283 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 6570 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 42 iterations, 2031 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 42 iterations, 2004 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2693 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2559 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 37 iterations, 2031 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2971 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BOOLEAN Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            5637           5642           7          2.8         358.4       1.0X
[info] SQL Json                                           3237           3285          69          4.9         205.8       1.7X
[info] SQL Parquet Vectorized: DataPageV1                   40             48           8        390.8           2.6     140.1X
[info] SQL Parquet Vectorized: DataPageV2                   42             48           5        371.3           2.7     133.1X
[info] SQL Parquet MR: DataPageV1                         1316           1347          43         11.9          83.7       4.3X
[info] SQL Parquet MR: DataPageV2                         1269           1280          15         12.4          80.7       4.4X
[info] SQL ORC Vectorized                                   50             55           3        312.1           3.2     111.8X
[info] SQL ORC MR                                         1465           1486          28         10.7          93.2       3.8X
[info] Running benchmark: Parquet Reader Single BOOLEAN Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 50 iterations, 2017 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 43 iterations, 2024 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 110 iterations, 2010 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 77 iterations, 2022 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BOOLEAN Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    35             40           3        443.8           2.3       1.0X
[info] ParquetReader Vectorized: DataPageV2                    41             47           3        383.4           2.6       0.9X
[info] ParquetReader Vectorized -> Row: DataPageV1             17             18           1        924.0           1.1       2.1X
[info] ParquetReader Vectorized -> Row: DataPageV2             25             26           1        641.3           1.6       1.4X
[info] Running benchmark: SQL Single TINYINT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 11597 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 7678 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 34 iterations, 2008 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 36 iterations, 2109 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3264 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2792 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 34 iterations, 2065 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2931 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            5622           5799         250          2.8         357.4       1.0X
[info] SQL Json                                           3820           3839          27          4.1         242.9       1.5X
[info] SQL Parquet Vectorized: DataPageV1                   53             59           5        297.8           3.4     106.5X
[info] SQL Parquet Vectorized: DataPageV2                   51             59          10        311.3           3.2     111.3X
[info] SQL Parquet MR: DataPageV1                         1626           1632           9          9.7         103.4       3.5X
[info] SQL Parquet MR: DataPageV2                         1379           1396          25         11.4          87.7       4.1X
[info] SQL ORC Vectorized                                   57             61           3        276.5           3.6      98.8X
[info] SQL ORC MR                                         1360           1466         150         11.6          86.5       4.1X
[info] Running benchmark: Parquet Reader Single TINYINT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 47 iterations, 2035 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 48 iterations, 2021 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 78 iterations, 2007 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 77 iterations, 2017 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single TINYINT Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    41             43           2        386.8           2.6       1.0X
[info] ParquetReader Vectorized: DataPageV2                    40             42           1        388.7           2.6       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1             24             26           1        644.9           1.6       1.7X
[info] ParquetReader Vectorized -> Row: DataPageV2             25             26           1        640.3           1.6       1.7X
[info] Running benchmark: SQL Single SMALLINT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 12016 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8230 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 27 iterations, 2061 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 27 iterations, 2065 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3281 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2944 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 24 iterations, 2072 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3375 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            5939           6008          98          2.6         377.6       1.0X
[info] SQL Json                                           4103           4115          17          3.8         260.9       1.4X
[info] SQL Parquet Vectorized: DataPageV1                   62             76          18        255.0           3.9      96.3X
[info] SQL Parquet Vectorized: DataPageV2                   69             77          10        229.1           4.4      86.5X
[info] SQL Parquet MR: DataPageV1                         1610           1641          44          9.8         102.3       3.7X
[info] SQL Parquet MR: DataPageV2                         1451           1472          31         10.8          92.2       4.1X
[info] SQL ORC Vectorized                                   82             86           4        191.3           5.2      72.2X
[info] SQL ORC MR                                         1682           1688           8          9.4         106.9       3.5X
[info] Running benchmark: Parquet Reader Single SMALLINT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 20 iterations, 2022 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 19 iterations, 2034 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 25 iterations, 2059 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 22 iterations, 2030 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single SMALLINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    98            101           1        160.1           6.2       1.0X
[info] ParquetReader Vectorized: DataPageV2                   106            107           1        149.0           6.7       0.9X
[info] ParquetReader Vectorized -> Row: DataPageV1             80             82           7        196.5           5.1       1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2             91             92           1        172.6           5.8       1.1X
[info] Running benchmark: SQL Single INT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 12657 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8705 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 28 iterations, 2025 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 18 iterations, 2034 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2939 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2783 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 18 iterations, 2097 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3103 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6310           6329          26          2.5         401.2       1.0X
[info] SQL Json                                           4331           4353          30          3.6         275.4       1.5X
[info] SQL Parquet Vectorized: DataPageV1                   60             72          13        260.7           3.8     104.6X
[info] SQL Parquet Vectorized: DataPageV2                  105            113          11        149.2           6.7      59.9X
[info] SQL Parquet MR: DataPageV1                         1463           1470           9         10.8          93.0       4.3X
[info] SQL Parquet MR: DataPageV2                         1379           1392          17         11.4          87.7       4.6X
[info] SQL ORC Vectorized                                  108            117           8        145.7           6.9      58.5X
[info] SQL ORC MR                                         1524           1552          38         10.3          96.9       4.1X
[info] Running benchmark: Parquet Reader Single INT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 22 iterations, 2056 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 17 iterations, 2077 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 26 iterations, 2025 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 20 iterations, 2075 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    90             93           5        175.7           5.7       1.0X
[info] ParquetReader Vectorized: DataPageV2                   119            122           4        132.5           7.5       0.8X
[info] ParquetReader Vectorized -> Row: DataPageV1             77             78           1        205.3           4.9       1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2            102            104           1        153.5           6.5       0.9X
[info] Running benchmark: SQL Single BIGINT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 13403 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8273 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 14 iterations, 2133 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 16 iterations, 2012 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3322 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3042 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 18 iterations, 2083 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2902 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6564           6702         194          2.4         417.4       1.0X
[info] SQL Json                                           4125           4137          16          3.8         262.3       1.6X
[info] SQL Parquet Vectorized: DataPageV1                  144            152           8        109.1           9.2      45.6X
[info] SQL Parquet Vectorized: DataPageV2                  119            126           5        132.6           7.5      55.3X
[info] SQL Parquet MR: DataPageV1                         1638           1661          33          9.6         104.1       4.0X
[info] SQL Parquet MR: DataPageV2                         1517           1521           7         10.4          96.4       4.3X
[info] SQL ORC Vectorized                                  110            116           5        143.0           7.0      59.7X
[info] SQL ORC MR                                         1435           1451          24         11.0          91.2       4.6X
[info] Running benchmark: Parquet Reader Single BIGINT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 12 iterations, 2049 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 13 iterations, 2021 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 13 iterations, 2013 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 15 iterations, 2046 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BIGINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                   168            171           2         93.7          10.7       1.0X
[info] ParquetReader Vectorized: DataPageV2                   152            156           4        103.2           9.7       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV1            152            155           2        103.6           9.7       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2            135            136           1        116.5           8.6       1.2X
[info] Running benchmark: SQL Single FLOAT Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 13299 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 9999 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 27 iterations, 2023 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 31 iterations, 2062 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3360 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3398 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 17 iterations, 2014 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3162 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6619           6650          44          2.4         420.8       1.0X
[info] SQL Json                                           4968           5000          45          3.2         315.9       1.3X
[info] SQL Parquet Vectorized: DataPageV1                   64             75          14        244.9           4.1     103.0X
[info] SQL Parquet Vectorized: DataPageV2                   62             67           3        252.4           4.0     106.2X
[info] SQL Parquet MR: DataPageV1                         1646           1680          48          9.6         104.6       4.0X
[info] SQL Parquet MR: DataPageV2                         1670           1699          41          9.4         106.2       4.0X
[info] SQL ORC Vectorized                                  114            119           4        137.9           7.3      58.0X
[info] SQL ORC MR                                         1571           1581          15         10.0          99.9       4.2X
[info] Running benchmark: Parquet Reader Single FLOAT Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 23 iterations, 2058 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 23 iterations, 2089 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 26 iterations, 2030 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 26 iterations, 2024 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single FLOAT Column Scan:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                    86             90           3        182.9           5.5       1.0X
[info] ParquetReader Vectorized: DataPageV2                    88             91           1        178.2           5.6       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1             77             78           1        203.5           4.9       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2             77             78           1        204.2           4.9       1.1X
[info] Running benchmark: SQL Single DOUBLE Column Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 13141 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 10028 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 13 iterations, 2096 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 14 iterations, 2121 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3624 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3625 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 7 iterations, 2205 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3786 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6562           6571          13          2.4         417.2       1.0X
[info] SQL Json                                           4994           5014          29          3.1         317.5       1.3X
[info] SQL Parquet Vectorized: DataPageV1                  146            161          23        107.7           9.3      44.9X
[info] SQL Parquet Vectorized: DataPageV2                  142            152          15        110.5           9.0      46.1X
[info] SQL Parquet MR: DataPageV1                         1789           1812          32          8.8         113.8       3.7X
[info] SQL Parquet MR: DataPageV2                         1806           1813          10          8.7         114.8       3.6X
[info] SQL ORC Vectorized                                  312            315           4         50.5          19.8      21.1X
[info] SQL ORC MR                                         1871           1893          32          8.4         118.9       3.5X
[info] Running benchmark: Parquet Reader Single DOUBLE Column Scan
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 13 iterations, 2165 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 12 iterations, 2010 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV1
[info]   Stopped after 13 iterations, 2007 ms
[info]   Running case: ParquetReader Vectorized -> Row: DataPageV2
[info]   Stopped after 13 iterations, 2002 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single DOUBLE Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1                   164            167           2         95.8          10.4       1.0X
[info] ParquetReader Vectorized: DataPageV2                   166            168           1         94.6          10.6       1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1            153            154           1        102.6           9.7       1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2            153            154           1        102.9           9.7       1.1X
[info] Running benchmark: SQL Single TINYINT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3385 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3170 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 26 iterations, 2065 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3339 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3737 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2048 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3278 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3460 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 32 iterations, 2048 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan in Struct:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1662           1693          43          9.5         105.7       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1540           1585          64         10.2          97.9       1.1X
[info] SQL ORC Vectorized (Nested Column Enabled)                              68             79          15        230.3           4.3      24.3X
[info] SQL Parquet MR: DataPageV1                                            1664           1670           8          9.5         105.8       1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1840           1869          41          8.5         117.0       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              66             71           4        239.3           4.2      25.3X
[info] SQL Parquet MR: DataPageV2                                            1633           1639           9          9.6         103.8       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1723           1730          11          9.1         109.5       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)              57             64           5        274.6           3.6      29.0X
[info] Running benchmark: SQL Single SMALLINT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3679 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3897 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 12 iterations, 2041 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3546 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3925 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2002 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3401 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3733 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 18 iterations, 2032 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan in Struct:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1807           1840          46          8.7         114.9       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1941           1949          11          8.1         123.4       0.9X
[info] SQL ORC Vectorized (Nested Column Enabled)                             163            170           7         96.5          10.4      11.1X
[info] SQL Parquet MR: DataPageV1                                            1762           1773          15          8.9         112.1       1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1958           1963           7          8.0         124.5       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              60             69           5        261.0           3.8      30.0X
[info] SQL Parquet MR: DataPageV2                                            1694           1701          10          9.3         107.7       1.1X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1861           1867           7          8.5         118.3       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             110            113           2        143.5           7.0      16.5X
[info] Running benchmark: SQL Single INT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3094 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3060 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2141 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3591 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3845 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2031 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3360 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3654 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 17 iterations, 2078 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan in Struct:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1482           1547          92         10.6          94.2       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1528           1530           4         10.3          97.1       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             189            195           7         83.4          12.0       7.9X
[info] SQL Parquet MR: DataPageV1                                            1773           1796          32          8.9         112.7       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1906           1923          23          8.3         121.2       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              60             70           7        263.1           3.8      24.8X
[info] SQL Parquet MR: DataPageV2                                            1664           1680          23          9.5         105.8       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1821           1827           9          8.6         115.8       0.8X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             116            122           4        135.2           7.4      12.7X
[info] Running benchmark: SQL Single BIGINT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3220 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3203 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2110 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3757 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 4039 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 14 iterations, 2039 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3222 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3525 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 16 iterations, 2063 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan in Struct:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1606           1610           6          9.8         102.1       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1593           1602          13          9.9         101.3       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             186            192           5         84.4          11.9       8.6X
[info] SQL Parquet MR: DataPageV1                                            1857           1879          31          8.5         118.1       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           2005           2020          21          7.8         127.5       0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)             143            146           5        110.3           9.1      11.3X
[info] SQL Parquet MR: DataPageV2                                            1590           1611          30          9.9         101.1       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1737           1763          36          9.1         110.5       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             119            129          11        132.6           7.5      13.5X
[info] Running benchmark: SQL Single FLOAT Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3283 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3325 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 11 iterations, 2123 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3516 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3827 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2024 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3216 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3558 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 29 iterations, 2036 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan in Struct:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1636           1642           9          9.6         104.0       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1663           1663           0          9.5         105.7       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             189            193           4         83.4          12.0       8.7X
[info] SQL Parquet MR: DataPageV1                                            1733           1758          35          9.1         110.2       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           1900           1914          20          8.3         120.8       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)              63             70           4        249.3           4.0      25.9X
[info] SQL Parquet MR: DataPageV2                                            1596           1608          17          9.9         101.5       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1768           1779          16          8.9         112.4       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)              60             70           6        262.2           3.8      27.3X
[info] Running benchmark: SQL Single DOUBLE Column Scan in Struct
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3478 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3556 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 5 iterations, 2160 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3677 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 4027 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 13 iterations, 2061 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3508 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 2 iterations, 3748 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 13 iterations, 2043 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan in Struct:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            1735           1739           7          9.1         110.3       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           1744           1778          49          9.0         110.9       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                             424            432           6         37.1          26.9       4.1X
[info] SQL Parquet MR: DataPageV1                                            1834           1839           7          8.6         116.6       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           2008           2014           8          7.8         127.7       0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)             151            159           8        103.9           9.6      11.5X
[info] SQL Parquet MR: DataPageV2                                            1749           1754           8          9.0         111.2       1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           1866           1874          11          8.4         118.6       0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)             148            157          16        106.2           9.4      11.7X
[info] Running benchmark: SQL Nested Column Scan
[info]   Running case: SQL ORC MR
[info]   Stopped after 10 iterations, 63927 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Disabled)
[info]   Stopped after 10 iterations, 63452 ms
[info]   Running case: SQL ORC Vectorized (Nested Column Enabled)
[info]   Stopped after 10 iterations, 24147 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 10 iterations, 39452 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info]   Stopped after 10 iterations, 42079 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info]   Stopped after 10 iterations, 21439 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 10 iterations, 44573 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info]   Stopped after 10 iterations, 46903 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info]   Stopped after 10 iterations, 19188 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Nested Column Scan:                                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR                                                            6284           6393          77          0.2        5992.7       1.0X
[info] SQL ORC Vectorized (Nested Column Disabled)                           6230           6345          80          0.2        5941.3       1.0X
[info] SQL ORC Vectorized (Nested Column Enabled)                            2403           2415          15          0.4        2291.7       2.6X
[info] SQL Parquet MR: DataPageV1                                            3908           3945          51          0.3        3726.8       1.6X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)           4178           4208          23          0.3        3984.3       1.5X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)            2057           2144          88          0.5        1961.9       3.1X
[info] SQL Parquet MR: DataPageV2                                            4415           4457          31          0.2        4210.8       1.4X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)           4658           4690          19          0.2        4442.0       1.3X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)            1845           1919          39          0.6        1759.2       3.4X
[info] Running benchmark: Int and String Scan
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 12518 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 9416 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 3 iterations, 2708 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 2 iterations, 2137 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 4865 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 5053 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 2 iterations, 2212 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 4912 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Int and String Scan:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            6194           6259          92          1.7         590.7       1.0X
[info] SQL Json                                           4696           4708          17          2.2         447.9       1.3X
[info] SQL Parquet Vectorized: DataPageV1                  899            903           7         11.7          85.7       6.9X
[info] SQL Parquet Vectorized: DataPageV2                 1038           1069          44         10.1          99.0       6.0X
[info] SQL Parquet MR: DataPageV1                         2426           2433           9          4.3         231.4       2.6X
[info] SQL Parquet MR: DataPageV2                         2487           2527          56          4.2         237.2       2.5X
[info] SQL ORC Vectorized                                 1096           1106          15          9.6         104.5       5.7X
[info] SQL ORC MR                                         2425           2456          44          4.3         231.2       2.6X
[info] Running benchmark: Repeated String
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 7111 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 5912 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 6 iterations, 2354 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 6 iterations, 2371 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2417 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2345 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 9 iterations, 2157 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2438 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Repeated String:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            3541           3556          21          3.0         337.7       1.0X
[info] SQL Json                                           2950           2956           9          3.6         281.3       1.2X
[info] SQL Parquet Vectorized: DataPageV1                  380            392          18         27.6          36.3       9.3X
[info] SQL Parquet Vectorized: DataPageV2                  381            395          16         27.6          36.3       9.3X
[info] SQL Parquet MR: DataPageV1                         1188           1209          29          8.8         113.3       3.0X
[info] SQL Parquet MR: DataPageV2                         1143           1173          42          9.2         109.0       3.1X
[info] SQL ORC Vectorized                                  235            240           7         44.5          22.5      15.0X
[info] SQL ORC MR                                         1204           1219          22          8.7         114.8       2.9X
[info] Running benchmark: Partitioned Table
[info]   Running case: Data column - CSV
[info]   Stopped after 2 iterations, 13313 ms
[info]   Running case: Data column - Json
[info]   Stopped after 2 iterations, 8077 ms
[info]   Running case: Data column - Parquet Vectorized: DataPageV1
[info]   Stopped after 27 iterations, 2074 ms
[info]   Running case: Data column - Parquet Vectorized: DataPageV2
[info]   Stopped after 22 iterations, 2056 ms
[info]   Running case: Data column - Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3593 ms
[info]   Running case: Data column - Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3460 ms
[info]   Running case: Data column - ORC Vectorized
[info]   Stopped after 18 iterations, 2106 ms
[info]   Running case: Data column - ORC MR
[info]   Stopped after 2 iterations, 3226 ms
[info]   Running case: Partition column - CSV
[info]   Stopped after 2 iterations, 3998 ms
[info]   Running case: Partition column - Json
[info]   Stopped after 2 iterations, 7534 ms
[info]   Running case: Partition column - Parquet Vectorized: DataPageV1
[info]   Stopped after 49 iterations, 2022 ms
[info]   Running case: Partition column - Parquet Vectorized: DataPageV2
[info]   Stopped after 74 iterations, 2024 ms
[info]   Running case: Partition column - Parquet MR: DataPageV1
[info]   Stopped after 3 iterations, 2594 ms
[info]   Running case: Partition column - Parquet MR: DataPageV2
[info]   Stopped after 3 iterations, 2601 ms
[info]   Running case: Partition column - ORC Vectorized
[info]   Stopped after 69 iterations, 2032 ms
[info]   Running case: Partition column - ORC MR
[info]   Stopped after 3 iterations, 2215 ms
[info]   Running case: Both columns - CSV
[info]   Stopped after 2 iterations, 14015 ms
[info]   Running case: Both columns - Json
[info]   Stopped after 2 iterations, 8828 ms
[info]   Running case: Both columns - Parquet Vectorized: DataPageV1
[info]   Stopped after 26 iterations, 2013 ms
[info]   Running case: Both columns - Parquet Vectorized: DataPageV2
[info]   Stopped after 20 iterations, 2027 ms
[info]   Running case: Both columns - Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3386 ms
[info]   Running case: Both columns - Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 3123 ms
[info]   Running case: Both columns - ORC Vectorized
[info]   Stopped after 14 iterations, 2039 ms
[info]   Running case: Both columns - ORC MR
[info]   Stopped after 2 iterations, 3456 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Partitioned Table:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Data column - CSV                                           6546           6657         157          2.4         416.2       1.0X
[info] Data column - Json                                          4025           4039          20          3.9         255.9       1.6X
[info] Data column - Parquet Vectorized: DataPageV1                  59             77          11        265.8           3.8     110.6X
[info] Data column - Parquet Vectorized: DataPageV2                  81             93          12        193.2           5.2      80.4X
[info] Data column - Parquet MR: DataPageV1                        1795           1797           2          8.8         114.1       3.6X
[info] Data column - Parquet MR: DataPageV2                        1716           1730          20          9.2         109.1       3.8X
[info] Data column - ORC Vectorized                                 112            117           5        140.7           7.1      58.6X
[info] Data column - ORC MR                                        1574           1613          55         10.0         100.1       4.2X
[info] Partition column - CSV                                      1996           1999           5          7.9         126.9       3.3X
[info] Partition column - Json                                     3708           3767          83          4.2         235.8       1.8X
[info] Partition column - Parquet Vectorized: DataPageV1             27             41          17        589.2           1.7     245.2X
[info] Partition column - Parquet Vectorized: DataPageV2             23             27           2        675.0           1.5     280.9X
[info] Partition column - Parquet MR: DataPageV1                    826            865          38         19.0          52.5       7.9X
[info] Partition column - Parquet MR: DataPageV2                    846            867          26         18.6          53.8       7.7X
[info] Partition column - ORC Vectorized                             25             29           4        627.2           1.6     261.0X
[info] Partition column - ORC MR                                    737            739           2         21.3          46.9       8.9X
[info] Both columns - CSV                                          6987           7008          29          2.3         444.2       0.9X
[info] Both columns - Json                                         4397           4414          25          3.6         279.5       1.5X
[info] Both columns - Parquet Vectorized: DataPageV1                 72             77           3        219.9           4.5      91.5X
[info] Both columns - Parquet Vectorized: DataPageV2                 97            101           3        162.0           6.2      67.4X
[info] Both columns - Parquet MR: DataPageV1                       1676           1693          25          9.4         106.5       3.9X
[info] Both columns - Parquet MR: DataPageV2                       1555           1562          10         10.1          98.8       4.2X
[info] Both columns - ORC Vectorized                                140            146           5        112.1           8.9      46.6X
[info] Both columns - ORC MR                                       1646           1728         117          9.6         104.6       4.0X
[info] Running benchmark: String with Nulls Scan (0.0%)
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 8548 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 8036 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 4 iterations, 2112 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 3 iterations, 2202 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3887 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 4609 ms
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 6 iterations, 2017 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 4 iterations, 2370 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 4 iterations, 2078 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 3689 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (0.0%):            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            4238           4274          51          2.5         404.2       1.0X
[info] SQL Json                                           4007           4018          16          2.6         382.1       1.1X
[info] SQL Parquet Vectorized: DataPageV1                  523            528           6         20.1          49.9       8.1X
[info] SQL Parquet Vectorized: DataPageV2                  728            734           6         14.4          69.5       5.8X
[info] SQL Parquet MR: DataPageV1                         1939           1944           7          5.4         184.9       2.2X
[info] SQL Parquet MR: DataPageV2                         2298           2305          10          4.6         219.1       1.8X
[info] ParquetReader Vectorized: DataPageV1                331            336           3         31.7          31.6      12.8X
[info] ParquetReader Vectorized: DataPageV2                585            593           6         17.9          55.8       7.2X
[info] SQL ORC Vectorized                                  487            520          24         21.5          46.4       8.7X
[info] SQL ORC MR                                         1765           1845         113          5.9         168.3       2.4X
[info] Running benchmark: String with Nulls Scan (50.0%)
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 6797 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 7274 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 5 iterations, 2176 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 4 iterations, 2376 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 3895 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 4081 ms
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 6 iterations, 2235 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 4 iterations, 2097 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 3 iterations, 2099 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 4133 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (50.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            3362           3399          52          3.1         320.7       1.0X
[info] SQL Json                                           3608           3637          42          2.9         344.1       0.9X
[info] SQL Parquet Vectorized: DataPageV1                  422            435          20         24.9          40.2       8.0X
[info] SQL Parquet Vectorized: DataPageV2                  580            594          27         18.1          55.3       5.8X
[info] SQL Parquet MR: DataPageV1                         1938           1948          14          5.4         184.8       1.7X
[info] SQL Parquet MR: DataPageV2                         2014           2041          37          5.2         192.1       1.7X
[info] ParquetReader Vectorized: DataPageV1                365            373          13         28.7          34.8       9.2X
[info] ParquetReader Vectorized: DataPageV2                522            524           2         20.1          49.8       6.4X
[info] SQL ORC Vectorized                                  696            700           3         15.1          66.4       4.8X
[info] SQL ORC MR                                         2057           2067          13          5.1         196.2       1.6X
[info] Running benchmark: String with Nulls Scan (95.0%)
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 5129 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 5273 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 16 iterations, 2114 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 15 iterations, 2004 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 2 iterations, 2410 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 2 iterations, 2365 ms
[info]   Running case: ParquetReader Vectorized: DataPageV1
[info]   Stopped after 22 iterations, 2056 ms
[info]   Running case: ParquetReader Vectorized: DataPageV2
[info]   Stopped after 19 iterations, 2062 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 11 iterations, 2153 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 2 iterations, 2660 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (95.0%):           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            2545           2565          29          4.1         242.7       1.0X
[info] SQL Json                                           2577           2637          84          4.1         245.8       1.0X
[info] SQL Parquet Vectorized: DataPageV1                  116            132          13         90.7          11.0      22.0X
[info] SQL Parquet Vectorized: DataPageV2                  128            134           5         82.2          12.2      20.0X
[info] SQL Parquet MR: DataPageV1                         1188           1205          24          8.8         113.3       2.1X
[info] SQL Parquet MR: DataPageV2                         1141           1183          59          9.2         108.8       2.2X
[info] ParquetReader Vectorized: DataPageV1                 90             93           5        116.8           8.6      28.3X
[info] ParquetReader Vectorized: DataPageV2                106            109           2         99.0          10.1      24.0X
[info] SQL ORC Vectorized                                  188            196           7         55.7          18.0      13.5X
[info] SQL ORC MR                                         1303           1330          39          8.0         124.2       2.0X
[info] Running benchmark: Single Column Scan from 10 columns
[info]   Running case: SQL CSV
[info]   Stopped after 3 iterations, 2123 ms
[info]   Running case: SQL Json
[info]   Stopped after 3 iterations, 2333 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 75 iterations, 2008 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 78 iterations, 2013 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 16 iterations, 2105 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 16 iterations, 2109 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 76 iterations, 2015 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 17 iterations, 2118 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 10 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                             696            708          17          1.5         663.7       1.0X
[info] SQL Json                                            776            778           2          1.4         739.9       0.9X
[info] SQL Parquet Vectorized: DataPageV1                   22             27           5         48.0          20.9      31.8X
[info] SQL Parquet Vectorized: DataPageV2                   23             26           3         45.9          21.8      30.4X
[info] SQL Parquet MR: DataPageV1                          126            132           4          8.3         120.3       5.5X
[info] SQL Parquet MR: DataPageV2                          121            132           7          8.7         115.3       5.8X
[info] SQL ORC Vectorized                                   23             27           3         44.8          22.3      29.7X
[info] SQL ORC MR                                          118            125           3          8.9         113.0       5.9X
[info] 12:17:16.486 WARN org.apache.spark.sql.catalyst.util.SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
[info] Running benchmark: Single Column Scan from 50 columns
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 2604 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 5403 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 67 iterations, 2016 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 72 iterations, 2011 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 15 iterations, 2116 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 15 iterations, 2002 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 65 iterations, 2008 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 15 iterations, 2014 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 50 columns:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            1300           1302           3          0.8        1240.0       1.0X
[info] SQL Json                                           2669           2702          47          0.4        2545.0       0.5X
[info] SQL Parquet Vectorized: DataPageV1                   24             30           7         44.2          22.6      54.8X
[info] SQL Parquet Vectorized: DataPageV2                   23             28           3         45.0          22.2      55.8X
[info] SQL Parquet MR: DataPageV1                          127            141           9          8.3         120.7      10.3X
[info] SQL Parquet MR: DataPageV2                          131            134           2          8.0         124.5      10.0X
[info] SQL ORC Vectorized                                   26             31           4         39.7          25.2      49.2X
[info] SQL ORC MR                                          131            134           2          8.0         125.2       9.9X
[info] Running benchmark: Single Column Scan from 100 columns
[info]   Running case: SQL CSV
[info]   Stopped after 2 iterations, 4357 ms
[info]   Running case: SQL Json
[info]   Stopped after 2 iterations, 9909 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV1
[info]   Stopped after 52 iterations, 2016 ms
[info]   Running case: SQL Parquet Vectorized: DataPageV2
[info]   Stopped after 57 iterations, 2017 ms
[info]   Running case: SQL Parquet MR: DataPageV1
[info]   Stopped after 14 iterations, 2041 ms
[info]   Running case: SQL Parquet MR: DataPageV2
[info]   Stopped after 15 iterations, 2128 ms
[info]   Running case: SQL ORC Vectorized
[info]   Stopped after 53 iterations, 2033 ms
[info]   Running case: SQL ORC MR
[info]   Stopped after 15 iterations, 2121 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 100 columns:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV                                            2169           2179          13          0.5        2069.0       1.0X
[info] SQL Json                                           4811           4955         204          0.2        4587.9       0.5X
[info] SQL Parquet Vectorized: DataPageV1                   30             39           5         34.7          28.8      71.8X
[info] SQL Parquet Vectorized: DataPageV2                   30             35           8         35.1          28.5      72.7X
[info] SQL Parquet MR: DataPageV1                          139            146           5          7.6         132.2      15.7X
[info] SQL Parquet MR: DataPageV2                          130            142           7          8.1         123.8      16.7X
[info] SQL ORC Vectorized                                   34             38           3         31.3          32.0      64.7X
[info] SQL ORC MR                                          138            141           2          7.6         131.5      15.7X
[success] Total time: 2477 s (41:17), completed Jul 1, 2024, 12:20:20 PM

BuiltInDataSourceWriteBenchmark

[info] running (fork) org.apache.spark.sql.execution.benchmark.BuiltInDataSourceWriteBenchmark parquet
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 12:27:34.730 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: parquet writer benchmark
[info]   Running case: Output Single Int Column
[info]   Stopped after 2 iterations, 2964 ms
[info]   Running case: Output Single Double Column
[info]   Stopped after 2 iterations, 2910 ms
[info]   Running case: Output Int and String Column
[info]   Stopped after 2 iterations, 5380 ms
[info]   Running case: Output Partitions
[info]   Stopped after 2 iterations, 4753 ms
[info]   Running case: Output Buckets
[info]   Stopped after 2 iterations, 6044 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] parquet writer benchmark:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Output Single Int Column                           1443           1482          55         10.9          91.8       1.0X
[info] Output Single Double Column                        1427           1455          40         11.0          90.7       1.0X
[info] Output Int and String Column                       2686           2690           6          5.9         170.7       0.5X
[info] Output Partitions                                  2368           2377          12          6.6         150.6       0.6X
[info] Output Buckets                                     3010           3022          18          5.2         191.4       0.5X
[success] Total time: 106 s (01:46), completed Jul 1, 2024, 12:28:13 PM

Verdict

I don't see any large deviations from main. Sometimes this branch is slightly faster, sometimes main is. Does the deviation look acceptable to you?

@LuciferYang
Contributor

We should run the corresponding benchmarks using GitHub Actions and update their results in the PR, for both Java 17 and 21.

@Fokko
Contributor Author

Fokko commented Jul 1, 2024

Kicked them off: https://github.com/Fokko/spark/actions/workflows/benchmark.yml

@Fokko Fokko force-pushed the fd-bump-parquet branch from 03ab2ce to e516f04 July 1, 2024 19:55
@Fokko
Contributor Author

Fokko commented Jul 1, 2024

@LuciferYang I've updated the PR. Sorry, I wasn't aware that the benchmarks need to be run in GitHub Actions; I assumed the runners would be too noisy.

ParquetReader Vectorized: DataPageV2                   69             71           2        228.4           4.4       1.0X
ParquetReader Vectorized -> Row: DataPageV1            47             48           1        332.4           3.0       1.5X
ParquetReader Vectorized -> Row: DataPageV2            47             48           1        334.0           3.0       1.5X
ParquetReader Vectorized: DataPageV1                   93             94           1        169.9           5.9       1.0X
Contributor

From the results, there appears to be a slight decrease in throughput in the TINYINT scenario.
cc @dongjoon-hyun @sunchao

Member

I guess this is probably due to other factors. Isn't it fine as long as the relative numbers stay the same?
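For reference, the "Relative" column this comparison relies on can be recomputed from any of the tables above. This is a minimal sketch, assuming Relative is the first (baseline) row's best time divided by each row's best time, which matches the tables in this thread (for example 663.7 / 20.9 ≈ 31.8X in "Single Column Scan from 10 columns"). It uses the Per Row(ns) column because, for a fixed row count, it is proportional to best time but is printed with more precision than Best Time(ms). The helper name `relative_speedups` is hypothetical and not part of Spark.

```python
# Sketch: recompute the "Relative" column of a Spark benchmark table.
# Assumption: Relative = (first row's best time) / (this row's best time).
# Per Row(ns) is proportional to best time when the row count is fixed,
# so the ratio of Per Row(ns) values yields the same number with more
# precision than the rounded Best Time(ms) column.


def relative_speedups(per_row_ns):
    """Map each case name to its speed relative to the first (baseline) case."""
    names = list(per_row_ns)  # dicts keep insertion order (Python 3.7+)
    baseline = per_row_ns[names[0]]
    return {name: baseline / ns for name, ns in per_row_ns.items()}


# Per Row(ns) values copied from the "Single Column Scan from 10 columns" table.
table = {
    "SQL CSV": 663.7,
    "SQL Json": 739.9,
    "SQL Parquet Vectorized: DataPageV1": 20.9,
    "SQL Parquet Vectorized: DataPageV2": 21.8,
    "SQL Parquet MR: DataPageV1": 120.3,
    "SQL Parquet MR: DataPageV2": 115.3,
    "SQL ORC Vectorized": 22.3,
    "SQL ORC MR": 113.0,
}

for name, rel in relative_speedups(table).items():
    print(f"{name:<40} {rel:5.1f}X")
```

The recomputed values agree with the printed column to within about 0.1X (SQL ORC Vectorized comes out at 29.8X against the printed 29.7X) because Per Row(ns) is itself rounded. Since every case is divided by the same baseline, a uniform slowdown of the runner leaves these ratios unchanged, which is one reason comparing Relative rather than absolute times is more robust on noisy CI machines.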

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Although there is a comment from @LuciferYang, given that this is a GitHub Actions environment, the results look good overall to me.

@dongjoon-hyun
Member

Thank you, @Fokko and all. Let me merge this for Apache Spark 4.0.0-preview2.

@Fokko Fokko deleted the fd-bump-parquet branch July 2, 2024 17:44
@Fokko
Contributor Author

Fokko commented Jul 2, 2024

Thanks @dongjoon-hyun for merging, and @LuciferYang and @sunchao for the pointers 👍

@sunchao
Member

sunchao commented Jul 2, 2024

The benchmark results look OK to me as well - there is no big deviation from the previous result. Thanks @Fokko for the PR!

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?

### Why are the changes needed?

Fixes quite a few bugs on the Parquet side: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Using the existing unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46447 from Fokko/fd-bump-parquet.

Authored-by: Fokko Driesprong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
7 participants