Add ds.write_dataset kwarg overrides to write_deltalake function #1683

someaveragepunter · 2023-09-30T09:53:15Z

Description

Allow callers to pass ds.write_dataset keyword-argument overrides through the write_deltalake function.

Lines 327 to 344 in fcfd1bf

    
    ds.write_dataset(
        data,
        base_dir="/",
        basename_template=f"{current_version + 1}-{uuid.uuid4()}-{{i}}.parquet",
        format="parquet",
        partitioning=partitioning,
        # It will not accept a schema if using a RBR
        schema=schema if not isinstance(data, RecordBatchReader) else None,
        file_visitor=visitor,
        existing_data_behavior="overwrite_or_ignore",
        file_options=file_options,
        max_open_files=max_open_files,
        max_rows_per_file=max_rows_per_file,
        min_rows_per_group=min_rows_per_group,
        max_rows_per_group=max_rows_per_group,
        filesystem=filesystem,
        max_partitions=max_partitions,
    )

Use Case
My specific use case is overriding the basename_template parameter. When I'm bombarding S3 with thousands of concurrent tasks, reading a Parquet file directly, without going through the transaction log to find it, yields performance benefits. Explicitly naming the Parquet file lets me determine the filename statically and deterministically, as opposed to querying for it at runtime.

Furthermore, this would future-proof write_deltalake by exposing any additional kwargs or enhancements added to the pyarrow datasets API.
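To make the request concrete, here is a rough sketch of the merge I have in mind. The helper name `build_write_dataset_kwargs` and the `overrides` parameter are hypothetical and are not part of the current write_deltalake signature; only a few of the defaults from the snippet above are reproduced, and the result would be splatted into `ds.write_dataset(data, **kwargs)`.

```python
import uuid
from typing import Any, Dict, Optional


def build_write_dataset_kwargs(
    current_version: int,
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: start from the library defaults, let the caller win."""
    # A subset of the defaults from the quoted snippet, for illustration only.
    defaults: Dict[str, Any] = {
        "base_dir": "/",
        "basename_template": f"{current_version + 1}-{uuid.uuid4()}-{{i}}.parquet",
        "format": "parquet",
        "existing_data_behavior": "overwrite_or_ignore",
    }
    # Later keys take precedence, so any override replaces the matching default.
    return {**defaults, **(overrides or {})}


kwargs = build_write_dataset_kwargs(
    current_version=4,
    # Assumed naming scheme; pyarrow expects the "{i}" token in the template.
    overrides={"basename_template": "task-0042-{i}.parquet"},
)
print(kwargs["basename_template"])  # task-0042-{i}.parquet
```

Keys the caller does not override keep their defaults, so existing behavior would be unchanged unless someone opts in.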

I'm happy to propose the change and submit a PR if this is an acceptable enhancement.


ion-elgreco · 2024-08-19T20:05:10Z

@someaveragepunter can you elaborate a bit more on the use case? We use UUIDs so that there aren't any file collisions
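The collision concern can be illustrated with a small sketch. Both function names are hypothetical and only mimic the naming schemes under discussion: the first follows the UUID-based template from the quoted snippet, the second an assumed caller-chosen scheme.

```python
import uuid


def default_style_name(version: int, i: int = 0) -> str:
    # Mirrors the template in the quoted snippet: a fresh UUID per call.
    return f"{version + 1}-{uuid.uuid4()}-{i}.parquet"


def deterministic_name(task_id: str, i: int = 0) -> str:
    # Hypothetical caller-chosen scheme; unique only if task_id is unique.
    return f"{task_id}-{i}.parquet"


# Two writers at the same table version get distinct UUID-based names...
a, b = default_style_name(4), default_style_name(4)
# ...while two writers that reuse a task id produce the same path.
c, d = deterministic_name("task-0042"), deterministic_name("task-0042")
print(a != b, c == d)  # True True
```

With existing_data_behavior="overwrite_or_ignore", a second write to the same path would replace the first file, so a deterministic template would put the burden of uniqueness on the caller.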

someaveragepunter added the enhancement label on Sep 30, 2023

ion-elgreco closed this as not planned on Dec 7, 2024
