bug: Default target schema vs schema #2766

Open
1 task
mjsqu opened this issue Nov 18, 2024 · 1 comment
mjsqu commented Nov 18, 2024

Singer SDK Version

0.42.1

Is this a regression?

  • Yes

Python Version

3.11

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

No response

Description

While using the MeltanoLabs variant of target-snowflake, I had records coming from SQL Server with a schema as part of the stream ID:

{
  "type": "RECORD",
  "stream": "TAP_SCHEMA-users",
  "time_extracted": "2017-11-20T16:45:33.000Z",
  "record": {
    "id": 0,
    "name": "Chris"
  }
}

That is, the schema for this record is TAP_SCHEMA.

I set up the target-snowflake configuration with a different schema:

{
    "schema": "TARGET_SCHEMA"
}

Crucially, I did not set default_target_schema. The following code runs during sink processing; if default_target_schema is not set, the schema name is derived from the incoming stream ID instead.

def schema_name(self) -> str | None:
    """Return the schema name or `None` if using names with no schema part.

    Returns:
        The target schema name.
    """
    # Look for a default_target_schema in the configuration file
    default_target_schema: str = self.config.get("default_target_schema", None)
    parts = self.stream_name.split("-")

    # 1) When default_target_schema is in the configuration use it
    # 2) if the streams are in <schema>-<table> format use the
    #    stream <schema>
    # 3) Return None if you don't find anything
    if default_target_schema:
        return default_target_schema

    return self.conform_name(parts[-2], "schema") if len(parts) in {2, 3} else None

In my scenario the account connecting to Snowflake did not have access to the schema named TAP_SCHEMA (from the original tap metadata messages), so when it went to create a File Format in that location it was not permitted to. I can't recall if it would then try to create a table in that same schema.
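To make the behaviour concrete, here is a minimal standalone sketch of the resolution logic quoted above. It is not the SDK code itself: `resolve_schema_name` is a hypothetical helper, the config is a plain dict, and the `conform_name` step is left out, so the real sink may return a differently cased or otherwise normalised name.

```python
from __future__ import annotations


def resolve_schema_name(config: dict, stream_name: str) -> str | None:
    """Hypothetical stand-in for the sink's schema_name logic (no conform_name)."""
    default_target_schema = config.get("default_target_schema")
    if default_target_schema:
        return default_target_schema

    # "<schema>-<table>" or "<db>-<schema>-<table>": the schema is the
    # second-to-last part. Any other shape yields None.
    parts = stream_name.split("-")
    return parts[-2] if len(parts) in {2, 3} else None


# The config from this report: only "schema" is set, which is never consulted.
config = {"schema": "TARGET_SCHEMA"}
print(resolve_schema_name(config, "TAP_SCHEMA-users"))  # -> TAP_SCHEMA
```

With only `schema` set, the stream's own schema wins, which matches what I saw.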

Possible fix:

  • schema is likely to be an option present on many SQL targets, so add processing to pick up schema if it has been set

There is some ambiguity here: if both schema and default_target_schema are set, which should take priority? Probably the latter, since it is the existing functionality and would be the less breaking change.
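One possible shape for that precedence, as a rough sketch only: `resolve_schema_name_proposed` is a hypothetical standalone function, not an SDK API, and it skips the `conform_name` step. It assumes the order default_target_schema, then schema, then the stream-derived name.

```python
from __future__ import annotations


def resolve_schema_name_proposed(config: dict, stream_name: str) -> str | None:
    """Hypothetical precedence: default_target_schema > schema > stream ID."""
    # Checking default_target_schema first keeps existing setups unchanged.
    for key in ("default_target_schema", "schema"):
        value = config.get(key)
        if value:
            return value

    # Fall back to the schema embedded in the stream ID, as happens today.
    parts = stream_name.split("-")
    return parts[-2] if len(parts) in {2, 3} else None


print(resolve_schema_name_proposed({"schema": "TARGET_SCHEMA"}, "TAP_SCHEMA-users"))
# -> TARGET_SCHEMA
```

Targets that do not define a `schema` setting would behave exactly as before, since the lookup simply falls through to the stream-derived name.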

Code

Link to Slack/Linen

https://meltano.slack.com/archives/C069RH0F95F/p1731616264361179

mjsqu commented Nov 18, 2024
