DeltaKernel FFI error on delta_scan with columnMapping = "name" #11
Comments
|
Try changing only the table name in your path to lowercase, not the entire path. |
Similar errors:
|
It seems your first path works fine (it finds the table), but it has issues dealing with the complex data types of your Delta table. Can you try selecting only a single column (with a non-complex data type)? |
That brings me to a new error:
D select bc_RECORD_NBR from delta_scan('file:///tank/delta/test/dbo/branch_codes/') limit 1;
IO Error: Failed to read file "/tank/delta/test/dbo/branch_codes//part-00000-dac9ffff-2d18-41dd-90e2-c00163bba729-c000.snappy.parquet": schema mismatch in glob: column "bc_RECORD_NBR" was read from the original file "/tank/delta/test/TRMSQL/dbo/branch_codes//part-00000-dac9ffff-2d18-41dd-90e2-c00163bba729-c000.snappy.parquet", but could not be found in file "/tank/delta/test/TRMSQL/dbo/branch_codes//part-00000-dac9ffff-2d18-41dd-90e2-c00163bba729-c000.snappy.parquet".
Candidate names: col-a553c046-6a4e-4e34-8717-43c3156f66d2, col-cdcdd91e-4c7a-4636-94ca-12fd6a53be8e, col-830c96c7-57c9-453a-9720-b6443512758b, col-1794286b-4cf9-4d53-97fd-bf51f77e4f56, col-69c5d36c-f892-4d82-8b94-a94f9e160654, col-2eee13ba-777c-4062-988e-5789861285f2, col-362848d8-cad7-4b56-bdad-3f1a333c694a, col-925ac107-7ef4-4f62-8433-3e87a24de19c, col-20e7666d-0fce-45ea-97fe-c388c39a625e, col-ce9e4a98-e7b7-49da-94f4-d093b48a8000, col-cf52f10d-0748-4452-a0c0-75862157342e, col-c2103c5d-fcc5-4799-b908-4f0f69cacaf3, col-1b8177f8-e2f8-4348-ba3f-c7b824a3d019, col-41c065de-0f19-4942-8edd-4aa3b867b027, col-f8c42b5a-a638-4c4c-8160-45e25a2d6548, col-7da4091e-cefb-45a0-be74-2c92572e235d, col-0685b2fc-0be1-4967-9d85-c01e40617594, col-0b1ba16f-7932-4f2d-9797-63c9d391abe9, col-5bea3349-ffc3-4494-8750-7d1dcc4f9fd6, col-c107496d-1277-4592-8be8-fb76ede88ff0, col-ccfa624a-c7d1-438f-b8b0-b4bc15b8d2da, col-678f7e9c-4658-49e2-9c0e-bc88b9900eb8, col-bc1e83e9-0b11-44ce-b20c-4dac1095a9a8, col-4f4146ea-f60e-4e45-a8b7-917888d2fbd0, col-f8dff630-1a04-45bb-b354-1c1043f36fef, file_row_number
If you are trying to read files with different schemas, try setting union_by_name=True
Schema as seen by Spark:
spark-sql (default)> describe table delta.`/tank/delta/test/dbo/branch_codes`;
bc_RECORD_NBR int
bc_Branch_Code string
bc_Region_Code string
bc_Office_Name string
bc_address string
bc_city string
bc_state string
bc_zip string
bc_phone string
bc_fax string
bc_toll_free string
bc_branch_type string
bc_webname string
bc_TimeZone string
bc_Acting_RVP string
bc_Acting_DM string
bc_payroll_fax string
bc_email string
branch_id int
CostCenter string
SearchArea string
BranchAccountingTypeId int
CC_Rep string
OpenDate timestamp
CloseDate timestamp
Time taken: 0.058 seconds, Fetched 25 row(s)
D select bc_address from delta_scan('file:///tank/delta/test/dbo/branch_codes', union_by_name=True) limit 1;
IO Error: Failed to read file "/tank/delta/test/dbo/branch_codes/part-00000-dac9ffff-2d18-41dd-90e2-c00163bba729-c000.snappy.parquet": schema mismatch in glob: column "bc_address" was read from the original file "/tank/delta/test/dbo/branch_codes/part-00000-dac9ffff-2d18-41dd-90e2-c00163bba729-c000.snappy.parquet", but could not be found in file "/tank/delta/test/dbo/branch_codes/part-00000-dac9ffff-2d18-41dd-90e2-c00163bba729-c000.snappy.parquet".
Candidate names: col-a553c046-6a4e-4e34-8717-43c3156f66d2, col-cdcdd91e-4c7a-4636-94ca-12fd6a53be8e, col-830c96c7-57c9-453a-9720-b6443512758b, col-1794286b-4cf9-4d53-97fd-bf51f77e4f56, col-69c5d36c-f892-4d82-8b94-a94f9e160654, col-2eee13ba-777c-4062-988e-5789861285f2, col-362848d8-cad7-4b56-bdad-3f1a333c694a, col-925ac107-7ef4-4f62-8433-3e87a24de19c, col-20e7666d-0fce-45ea-97fe-c388c39a625e, col-ce9e4a98-e7b7-49da-94f4-d093b48a8000, col-cf52f10d-0748-4452-a0c0-75862157342e, col-c2103c5d-fcc5-4799-b908-4f0f69cacaf3, col-1b8177f8-e2f8-4348-ba3f-c7b824a3d019, col-41c065de-0f19-4942-8edd-4aa3b867b027, col-f8c42b5a-a638-4c4c-8160-45e25a2d6548, col-7da4091e-cefb-45a0-be74-2c92572e235d, col-0685b2fc-0be1-4967-9d85-c01e40617594, col-0b1ba16f-7932-4f2d-9797-63c9d391abe9, col-5bea3349-ffc3-4494-8750-7d1dcc4f9fd6, col-c107496d-1277-4592-8be8-fb76ede88ff0, col-ccfa624a-c7d1-438f-b8b0-b4bc15b8d2da, col-678f7e9c-4658-49e2-9c0e-bc88b9900eb8, col-bc1e83e9-0b11-44ce-b20c-4dac1095a9a8, col-4f4146ea-f60e-4e45-a8b7-917888d2fbd0, col-f8dff630-1a04-45bb-b354-1c1043f36fef, file_row_number
If you are trying to read files with different schemas, try setting union_by_name=True |
So I assume |
Yes, but it does not resolve column names correctly, though this may be due to reading Delta tables as Parquet? The column names below are mapped correctly in
D select * from read_parquet('/tank/delta/test/dbo/branch_codes/*.parquet', union_by_name=True) limit 1;
┌──────────────────────┬──────────────────────┬───┬──────────────────────┬──────────────────────┬──────────────────────┐
│ col-14946dbd-f0a4-… │ col-59fc3a4f-b087-… │ … │ col-3701717e-3efe-… │ col-f3a0a96b-ac14-… │ col-7f127cb4-ac58-… │
│ int32 │ varchar │ │ varchar │ timestamp │ timestamp │
├──────────────────────┼──────────────────────┼───┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 151 │ ECOE │ … │ │ 2019-07-02 00:00:00 │ │
├──────────────────────┴──────────────────────┴───┴──────────────────────┴──────────────────────┴──────────────────────┤
│ 1 rows 25 columns (5 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ |
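The col-<uuid> headers above are the physical column names that column mapping writes into the Parquet files; the logical names that Spark shows are kept in the table's Delta log. As a rough sketch of where that mapping lives, the Python below parses a metaData action of the shape found in a _delta_log commit file and reads each field's delta.columnMapping.physicalName. The sample action, its physical names, and the helper names mapping_mode and physical_names are made up for illustration and are not taken from this table.

```python
import json

# Hypothetical metaData action, shaped like one line of a _delta_log/*.json
# commit file for a table using column mapping mode "name".
# The physical names below are invented for illustration.
SAMPLE_METADATA_ACTION = json.dumps({
    "metaData": {
        "configuration": {"delta.columnMapping.mode": "name"},
        "schemaString": json.dumps({
            "type": "struct",
            "fields": [
                {
                    "name": "bc_RECORD_NBR",
                    "type": "integer",
                    "nullable": True,
                    "metadata": {
                        "delta.columnMapping.id": 1,
                        "delta.columnMapping.physicalName":
                            "col-00000000-0000-0000-0000-000000000001",
                    },
                },
                {
                    "name": "bc_address",
                    "type": "string",
                    "nullable": True,
                    "metadata": {
                        "delta.columnMapping.id": 5,
                        "delta.columnMapping.physicalName":
                            "col-00000000-0000-0000-0000-000000000005",
                    },
                },
            ],
        }),
    }
})


def mapping_mode(action_line: str) -> str:
    """Return the table's column mapping mode, or "none" if it is not set."""
    meta = json.loads(action_line)["metaData"]
    return meta.get("configuration", {}).get("delta.columnMapping.mode", "none")


def physical_names(action_line: str) -> dict:
    """Map logical column name -> physical Parquet column name.

    Only top-level fields are handled; nested structs are ignored in this sketch.
    A field without a physical name falls back to its logical name.
    """
    meta = json.loads(action_line)["metaData"]
    schema = json.loads(meta["schemaString"])  # the schema is JSON nested in a string
    return {
        field["name"]: field.get("metadata", {}).get(
            "delta.columnMapping.physicalName", field["name"]
        )
        for field in schema["fields"]
    }


if __name__ == "__main__":
    print(mapping_mode(SAMPLE_METADATA_ACTION))
    for logical, physical in physical_names(SAMPLE_METADATA_ACTION).items():
        print(f"{logical} -> {physical}")
```

A reader that ignores this metadata only sees the physical names, which would be consistent with the "could not be found in file" error above.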
Yes, so from this I suppose the delta_scan extension (and specifically the delta-kernel-rs dependency) does not yet support the column-mapping: |
That is the issue. If I change to I'm using "name" mode for this reason, so I will have to see if it's a requirement for my project:
|
Column mapping is indeed not yet supported, it's on the wish list :) |
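Until column mapping is supported, one possible stopgap is to query the Parquet files with read_parquet and alias each physical column back to its logical name. The sketch below only builds the SQL string. It assumes a logical-to-physical mapping has already been obtained from the Delta log, and the physical names and the helper names quote_ident and aliased_select are hypothetical. Note that reading the files directly bypasses the Delta log, so it is only reasonable when every Parquet file in the directory is still part of the current table version.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def aliased_select(parquet_glob: str, logical_to_physical: dict) -> str:
    """Build a SELECT that renames physical Parquet columns to logical names."""
    columns = ", ".join(
        f"{quote_ident(physical)} AS {quote_ident(logical)}"
        for logical, physical in logical_to_physical.items()
    )
    path = parquet_glob.replace("'", "''")  # escape single quotes in the path literal
    return f"SELECT {columns} FROM read_parquet('{path}')"


if __name__ == "__main__":
    # Hypothetical mapping; real physical names come from the table's Delta log.
    mapping = {
        "bc_RECORD_NBR": "col-00000000-0000-0000-0000-000000000001",
        "bc_address": "col-00000000-0000-0000-0000-000000000005",
    }
    print(aliased_select("/tank/delta/test/dbo/branch_codes/*.parquet", mapping))
```

The resulting statement can be pasted into the DuckDB shell or wrapped in a view, so downstream queries can keep using the logical names.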
|
delta_scan fails on tables created with spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name"):
Changing path to lowercase as in #7 results in a different error:
Delta tables written with: