This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true)) #463

Open
gitperson1980 opened this issue Dec 24, 2022 · 4 comments

Comments

@gitperson1980

gitperson1980 commented Dec 24, 2022

When I upgraded this library in go.mod as shown below, I started getting this error. If I switch back to the older version (commented out below), everything works. I'm just curious what changed.

github.com/segmentio/parquet-go v0.0.0-20221214174709-7a0ad59e0540
//github.com/segmentio/parquet-go v0.0.0-20220914222423-67dbe8d21ca5

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 206) (DESKTOP-BB5QDC9 executor driver): org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))
        at org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1317)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:187)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertPrimitiveField$2(ParquetSchemaConverter.scala:260)
```

@kevinburkesegment
Contributor

Can you provide a little bit more information about what you're doing to produce this error? For example, "I generated Parquet files using the Go library and attempting to read them with Spark produces an error." Is that an accurate summary?

Can you provide information about which version of Spark or Parquet you are using in the other library?

@gitperson1980
Author

gitperson1980 commented Dec 29, 2022

I am using spark-3.3.1-bin-hadoop3-scala2.13 on Windows to read the file. The error also shows up with many other versions of Spark, including on BSD and Linux. I created the Parquet file with the parquet-go library.

With the version below, there is no problem:
//github.com/segmentio/parquet-go v0.0.0-20220914222423-67dbe8d21ca5

The problem shows up when I switched to:
github.com/segmentio/parquet-go v0.0.0-20221214174709-7a0ad59e0540

I use the following converters for Go's time.Time:

	err := copier.CopyWithOption(&ip, i, copier.Option{
		IgnoreEmpty: false,
		DeepCopy:    true,
		Converters: []copier.TypeConverter{
			{
				SrcType: time.Time{},
				DstType: int64(0),
				Fn: func(src interface{}) (interface{}, error) {
					return src.(time.Time).Unix(), nil
				},
			},
			{
				SrcType: FlagsHistMap{},
				DstType: FlagsHistP{},
				Fn: func(src interface{}) (interface{}, error) {
					opts := copier.Option{
						IgnoreEmpty: false,
						DeepCopy:    true,
						Converters: []copier.TypeConverter{
							{
								SrcType: time.Time{},
								DstType: int64(0),
								Fn: func(src interface{}) (interface{}, error) {
									return src.(time.Time).Unix(), nil
								},
							},
						},
					}

@kevinburkesegment
Contributor

That's helpful, thanks!

@kevinburkesegment
Contributor

Sorry, can you describe how the converters come into this? Are you using github.com/jinzhu/copier to copy from one struct to another and then passing the second struct to parquet-go?

Could you share a bit of the Go struct you are passing to Parquet?
