add support for using Delta table name in create_pytorch_dataloader #20

yinxi-db · 2023-05-31T03:58:04Z

For users who save a Delta table in the metastore, it would be more convenient to reference the data by table_name than by the path argument of create_pytorch_dataloader:

create_pytorch_dataloader(
    # Path to the DeltaLake table
    path,
    # Autoincrement ID field
    id_field="id",
    # Fields which will be used during training
    fields=[
        FieldSpec(
            "image",
            # Load image using Pillow
            load_image_using_pil=True,
            # PyTorch Transform
            transform=transform,
        ),
        FieldSpec("label"),
    ],
    # Number of readers
    num_workers=2,
    # Shuffle data inside the record batches
    shuffle=True,
    # Batch size
    batch_size=batch_size,
)
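Until a table_name argument exists, one possible workaround is to look up the table's storage location and pass that as path. The sketch below is only an illustration under stated assumptions: resolve_table_path and dbfs_uri_to_local_path are hypothetical helper names, not part of the library, and run_sql is assumed to behave like spark.sql on a cluster where DESCRIBE DETAIL returns a location column for the Delta table. Whether the resulting path is readable by create_pytorch_dataloader depends on the environment.

```python
def dbfs_uri_to_local_path(location: str) -> str:
    """Map a dbfs:/ URI to its /dbfs/ mount path; return other locations unchanged."""
    prefix = "dbfs:/"
    if location.startswith(prefix):
        return "/dbfs/" + location[len(prefix):].lstrip("/")
    return location


def resolve_table_path(table_name: str, run_sql) -> str:
    """Hypothetical helper: find where a metastore Delta table is stored.

    run_sql is assumed to work like spark.sql: it takes a SQL string and
    returns an object whose first() yields a row indexable by column name.
    table_name is interpolated directly, so it must come from a trusted source.
    """
    row = run_sql(f"DESCRIBE DETAIL {table_name}").first()
    return dbfs_uri_to_local_path(row["location"])
```

On a cluster this might be called as path = resolve_table_path("my_db.images", spark.sql), where my_db.images is a placeholder table name, and the result passed as the first argument of create_pytorch_dataloader.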


krishnakalyan3 · 2023-11-28T13:21:14Z

The main issue is this: suppose I have a CSV file in S3. I need to go through the following steps:
CSV -> Delta Lake -> Delta Lake on DBFS

When the table name is passed:

TableNotFoundError: No snapshot or version 0 found, perhaps /Workspace/Users/... is an empty dir?

When the S3 path is passed:

OSError: Generic S3 error: Missing region
