Fix the case-sensitive comparison on the seed name #436

Merged

Conversation

stephanetrou
Contributor

resolves #429

Description

To reproduce the bug, create a seed file named MySeed.csv in your seed folder with the following content:

adapter,version
glue,1.8.1

Run dbt seed twice.

The first time, the table does not yet exist in the Glue metastore, so the command passes:

❯ dbt seed
14:37:32  Running with dbt=1.8.6
14:37:33  Registered adapter: glue=1.8.1
14:37:33  Found 1 seed, 879 macros
14:37:33
14:37:34  Concurrency: 1 threads (target='dev-glue')
14:37:34
14:37:34  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
14:37:51  1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 16.85s]
14:37:51
14:37:51  Finished running 1 seed in 0 hours 0 minutes and 17.59 seconds (17.59s).
14:37:51
14:37:51  Completed successfully
14:37:51
14:37:51  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

The second time, the command fails:

❯ dbt seed
14:37:58  Running with dbt=1.8.6
14:37:58  Registered adapter: glue=1.8.1
14:37:58  Found 1 seed, 879 macros
14:37:58
14:37:59  Concurrency: 1 threads (target='dev-glue')
14:37:59
14:37:59  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
14:38:03  Glue adapter: Glue returned `error` for statement None for code

csv = [{"adapter": "glue", "version": "1.8.1"}]
df = spark.createDataFrame(csv)
table_name = 'dbt.MySeed'
if (spark.sql("show tables in dbt").where("tableName == 'MySeed'").count() > 0):
    df.write        .mode("overwrite")        .format("parquet")        .insertInto(table_name, overwrite=True)
else:
    df.write        .option("path", "s3://dbt-glue-test-xxxxxxxxxxxx/simple-test/dbt/MySeed")        .format("parquet")        .saveAsTable(table_name)
SqlWrapper2.execute("""select * from dbt.MySeed limit 1""")
, AnalysisException: Table `dbt`.`MySeed` already exists.
14:38:03  1 of 1 ERROR loading seed file dbt.MySeed ...................................... [ERROR in 3.59s]
14:38:03
14:38:03  Finished running 1 seed in 0 hours 0 minutes and 4.26 seconds (4.26s).
14:38:03
14:38:03  Completed with 1 error and 0 warnings:
14:38:03
14:38:03    Database Error in seed MySeed (seeds/MySeed.csv)
  GlueCreateCsvFailed
14:38:03
14:38:03  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

I found the following in the Athena documentation (sorry, not in the Glue documentation), under "Database, table, and column name requirements":

Acceptable characters for database names, table names, and column names in AWS Glue must be a UTF-8 string and should be in lower case.

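To solve this issue, I added the lower function to the where clause ("tableName == lower('MySeed')"). The table name appears to be stored in lower case, so the original case-sensitive check ("tableName == 'MySeed'") never matched the existing table, and the generated code fell through to saveAsTable, which fails with "already exists".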

With this change, both runs pass:

❯ dbt seed
14:44:46  Running with dbt=1.8.6
14:44:46  Registered adapter: glue=1.8.1
14:44:47  Found 1 seed, 879 macros
14:44:47
14:44:48  Concurrency: 1 threads (target='dev-glue')
14:44:48
14:44:48  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
14:44:53  1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 5.74s]
14:44:53
14:44:53  Finished running 1 seed in 0 hours 0 minutes and 6.51 seconds (6.51s).
14:44:53
14:44:53  Completed successfully
14:44:53
14:44:53  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
❯ dbt seed
15:02:45  Running with dbt=1.8.6
15:02:45  Registered adapter: glue=1.8.1
15:02:46  Found 1 seed, 879 macros
15:02:46
15:03:04  Concurrency: 1 threads (target='dev-glue')
15:03:04
15:03:04  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
15:04:38  1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 93.38s]
15:04:38
15:04:38  Finished running 1 seed in 0 hours 1 minutes and 51.96 seconds (111.96s).
15:04:38
15:04:38  Completed successfully
15:04:38
15:04:38  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
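
To illustrate why the original check missed the table, here is a minimal sketch in plain Python, with no Spark session required. The helper names `build_exists_filter` and `seed_table_exists` are hypothetical and are not part of dbt-glue; the list `catalog_tables` stands in for the output of `show tables in dbt`, assuming the metastore reports the table name in lower case.

```python
def build_exists_filter(table_name: str) -> str:
    # Hypothetical helper: renders the predicate applied to `show tables`,
    # lower-casing the seed name as described in this PR.
    return f"tableName == lower('{table_name}')"


def seed_table_exists(catalog_tables: list[str], table_name: str) -> bool:
    # Pure-Python stand-in for the Spark `where(...).count() > 0` check.
    # Assumption: the catalog lists table names in lower case.
    return table_name.lower() in catalog_tables


# Stand-in for the tables reported by the metastore after the first run.
catalog_tables = ["myseed"]

# Old behaviour: a case-sensitive comparison misses the existing table.
old_check = "MySeed" in catalog_tables

# New behaviour: the seed name is lower-cased before comparing.
new_check = seed_table_exists(catalog_tables, "MySeed")

print(build_exists_filter("MySeed"))  # tableName == lower('MySeed')
print(old_check, new_check)           # False True
```

When the check returns False for a table that already exists, the generated code takes the saveAsTable branch, which matches the AnalysisException in the failing log above.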

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-glue next" section.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sanga8
Contributor

sanga8 commented Sep 13, 2024

Nice, it works on my side.
@moomindani could you please take a look when you have some time? Thanks!

@moomindani added the enable-functional-tests label Sep 13, 2024
@stephanetrou force-pushed the fix/glue_tablename_case_sensitive branch from afd1b3a to 63930d8 September 16, 2024 07:39
@stephanetrou
Contributor Author

Conflict on CHANGELOG.md fixed.

@moomindani merged commit 1569ef1 into aws-samples:main Sep 16, 2024
17 checks passed
@moomindani
Collaborator

Thank you for your contribution :)

Labels
beginning-contributor, enable-functional-tests
Development

Successfully merging this pull request may close these issues.

dbt seed failing on second run
3 participants