Configure Compression Settings #18

robertoarnetoli · 2023-06-11T17:18:40Z

I tried target-s3 and was able to change prefix and stream_name_path_override, but inside the S3 folder I get a compressed ".json.gz" file. Instead, I need an uncompressed file called "data.json".

crowemi · 2023-06-12T15:00:36Z

Unfortunately this is hardcoded ATM - we need to modify this section to be dynamic. We should do this by adding a compression configuration node (maybe under the format node, or perhaps its own node 🤔)

Thinking big picture, we want to satisfy the following requirements:

- Enable/disable compression
- Support multiple compression types (e.g. {‘NONE’, ‘SNAPPY’, ‘GZIP’, ‘BROTLI’, ‘LZ4’, ‘ZSTD’})
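To make the two requirements concrete, here is a minimal sketch of what a configurable compression setting could look like. It is not the target's actual code: the `compression` config key, the `Compression` enum, and the helper names are all hypothetical, and only the codecs available in the Python standard library (`NONE` and `GZIP`) are wired up. The default is assumed to stay gzip so existing behavior would not change.

```python
import gzip
from enum import Enum


class Compression(str, Enum):
    """Hypothetical set of supported codecs (stdlib-only in this sketch)."""
    NONE = "none"
    GZIP = "gzip"


# File-name suffix appended for each codec.
_SUFFIX = {Compression.NONE: "", Compression.GZIP: ".gz"}


def parse_compression(config: dict) -> Compression:
    """Read the hypothetical 'compression' key, defaulting to gzip."""
    raw = str(config.get("compression", "gzip")).lower()
    try:
        return Compression(raw)
    except ValueError:
        raise ValueError(f"unsupported compression type: {raw!r}") from None


def encode(payload: bytes, compression: Compression) -> bytes:
    """Compress the payload, or return it unchanged for NONE."""
    if compression is Compression.GZIP:
        return gzip.compress(payload)
    return payload


def object_name(base: str, file_format: str, compression: Compression) -> str:
    """Build a name such as 'data.json' or 'data.json.gz'."""
    return f"{base}.{file_format}{_SUFFIX[compression]}"
```

With this shape, setting the option to `none` would yield `data.json`, and further codecs such as Snappy or Zstandard could be added as extra enum members backed by their respective third-party libraries.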

robertoarnetoli · 2023-06-12T15:11:35Z

Thank you @crowemi for the fast response.
As for the ".json.gz" file, is there a way to give it a name, like "data.json.gz"? At the moment it is literally ".json.gz", with no name.

crowemi · 2023-06-12T15:28:09Z

The problem is here: it doesn't look like we're handling any cases where those two config elements aren't set. If it meets your reqs, you should be able to add a file name by setting append_date_to_filename in your config here.
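As a rough illustration of the missing fallback, the sketch below returns a timestamp when `append_date_to_filename` is set and a static name otherwise, so the file name is never empty. This is an assumption about how a fix might look, not the target's actual logic: the `default_file_name` key, the `build_file_name` helper, and the timestamp format are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def build_file_name(config: dict, now: Optional[datetime] = None) -> str:
    """Return a non-empty base file name (without extension)."""
    now = now or datetime.now(timezone.utc)
    if config.get("append_date_to_filename", False):
        # Timestamp format chosen for illustration only.
        return now.strftime("%Y%m%dT%H%M%S")
    # Hypothetical fallback so the name is never the empty string.
    return config.get("default_file_name", "data")
```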

robertoarnetoli · 2023-06-12T15:36:52Z

ok. I was looking for a static filename rather than a timestamp, but thank you anyway.

rstml · 2023-11-23T11:36:54Z

+1 for this.

The current default option (gzip) isn't the best option for Parquet files:

Based on our tests, gzip decompression is very slow (< 100MB/s), making queries decompression bound. Snappy can decompress at ~ 500MB/s on a single core.

https://issues.apache.org/jira/browse/SPARK-14482

kronnk · 2024-07-17T08:53:01Z

Hi, I would like to work on this issue. Since there are no contribution guidelines, is there anything I should pay attention to?
Thanks!

crowemi changed the title ~~Ability to name the actual file (not just the stream) and to turn off .gz compression~~ Configure Compression Settings Jun 12, 2023

crowemi mentioned this issue Jun 22, 2023

Empty file name when append_date_to_filename=False #19

Open
