
[Bug] ORC file format with zstd compression fails with ZstdException: Data corruption detected #5045

Open
jerry-024 opened this issue Feb 10, 2025 · 0 comments
Labels
bug Something isn't working


@jerry-024
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

Details (stack trace):

Caused by: com.github.luben.zstd.ZstdException: Data corruption detected
	at com.github.luben.zstd.ZstdDecompressCtx.decompressByteArray(ZstdDecompressCtx.java:205)
	at com.github.luben.zstd.Zstd.decompressByteArray(Zstd.java:439)
	at org.apache.paimon.shade.org.apache.orc.impl.ZstdCodec.decompress(ZstdCodec.java:259)
	at org.apache.paimon.shade.org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:521)
	at org.apache.paimon.shade.org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:548)
	at org.apache.paimon.shade.org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:535)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:2060)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2079)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2177)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:2009)
	at org.apache.paimon.shade.org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2633)
	at org.apache.paimon.shade.org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
	at org.apache.paimon.shade.org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
	at org.apache.paimon.shade.org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
	at org.apache.paimon.shade.org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1579)
	at org.apache.paimon.format.orc.OrcReaderFactory.nextBatch(OrcReaderFactory.java:322)
	at org.apache.paimon.format.orc.OrcReaderFactory.access$100(OrcReaderFactory.java:66)
	at org.apache.paimon.format.orc.OrcReaderFactory$OrcVectorizedReader.readBatch(OrcReaderFactory.java:235)
	at org.apache.paimon.format.orc.OrcReaderFactory$OrcVectorizedReader.readBatch(OrcReaderFactory.java:217)
	at org.apache.paimon.reader.RecordReaderIterator.<init>(RecordReaderIterator.java:37)

Compute Engine

Flink

Minimal reproduce step

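Currently we don't know how to reproduce this problem. Reading the affected file shows that the failure occurs while reading a compressed chunk header (InStream$CompressedStream.readHeader in the stack trace above).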
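To help anyone inspecting a suspect file, here is a minimal sketch that decodes the 3-byte header preceding each chunk of a compressed ORC stream. It assumes the chunk header layout described in the ORC specification: a little-endian 24-bit value equal to (chunkLength * 2) + isOriginal. The class OrcChunkHeader and its methods are hypothetical helpers for illustration only; they are not part of Paimon or ORC.

```java
/**
 * Hypothetical helper (not part of Paimon or ORC): decodes the 3-byte header
 * that precedes every chunk in a compressed ORC stream, assuming the layout
 * from the ORC specification: little-endian (chunkLength << 1) | isOriginal.
 */
final class OrcChunkHeader {
    public final int chunkLength;    // length in bytes of the chunk that follows the header
    public final boolean isOriginal; // true if the chunk is stored uncompressed

    private OrcChunkHeader(int chunkLength, boolean isOriginal) {
        this.chunkLength = chunkLength;
        this.isOriginal = isOriginal;
    }

    /** Decodes the header starting at {@code offset} in {@code buf}. */
    public static OrcChunkHeader parse(byte[] buf, int offset) {
        if (offset < 0 || offset + 3 > buf.length) {
            throw new IllegalArgumentException("need 3 header bytes at offset " + offset);
        }
        // Combine the three bytes as an unsigned little-endian 24-bit value.
        int b0 = buf[offset] & 0xff;
        int b1 = buf[offset + 1] & 0xff;
        int b2 = buf[offset + 2] & 0xff;
        int value = b0 | (b1 << 8) | (b2 << 16);
        // The lowest bit is the "original" flag; the remaining bits are the length.
        return new OrcChunkHeader(value >>> 1, (value & 1) == 1);
    }

    /**
     * A header whose declared length runs past the end of the stream cannot be
     * valid, so this is a quick first check when hunting for a corrupt chunk.
     */
    public boolean fitsIn(int remainingBytesAfterHeader) {
        return chunkLength <= remainingBytesAfterHeader;
    }
}
```

For example, the ORC specification gives 0x0b 0x00 0x00 as the header of an uncompressed 5-byte chunk and 0x40 0x0d 0x03 as the header of a chunk compressed to 100000 bytes. If a header decodes to a plausible length but zstd still reports "Data corruption detected", the chunk body itself is the more likely culprit.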

What doesn't meet your expectations?

If anyone else has encountered this problem, please share more context.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
jerry-024 added the bug label Feb 10, 2025