SAP HANA SQLAlchemy destination fails to synchronize #2110
Comments
Exactly, still with dlt 1.5.0. I guess it's a problem in sqlalchemy-hana or even hdbcli. Do any devs have a clearer picture of the problem? Easy / ugly workaround (a hedged sketch follows):
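One hedged illustration of such a guard, as a monkeypatch that treats a missing metadata table as "no stored state". The class name is assumed from the module path quoted below in this issue, and catching sqlalchemy.exc.DBAPIError is an assumption about how hdbcli surfaces a missing table; this is a sketch, not dlt's actual fix:

import sqlalchemy as sa
# class/module names assumed from the path quoted later in this issue
from dlt.destinations.impl.sqlalchemy.sqlalchemy_job_client import SqlalchemyJobClient

_orig_get_stored_state = SqlalchemyJobClient.get_stored_state

def _guarded_get_stored_state(self, pipeline_name):
    try:
        return _orig_get_stored_state(self, pipeline_name)
    except sa.exc.DBAPIError:
        # no _dlt_* tables yet (first run): behave like a fresh dataset
        return None

SqlalchemyJobClient.get_stored_state = _guarded_get_stored_state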
Or, easier, just create the final tables yourself (at least for version 1.5.0):
create table <my_schema>._dlt_loads
(
load_id nvarchar(5000) not null,
schema_name nvarchar(5000),
status int not null,
inserted_at longdate CS_LONGDATE not null,
schema_version_hash nvarchar(5000)
) UNLOAD PRIORITY 5 AUTO merge
;
create table <my_schema>._dlt_pipeline_state
(
version int not null,
engine_version int not null,
pipeline_name nvarchar(5000) not null,
state nvarchar(5000) not null,
created_at longdate CS_LONGDATE not null,
version_hash nvarchar(5000),
_dlt_load_id nvarchar(5000) not null,
_dlt_id nvarchar(128) not null,
unique CPBTREE (_DLT_ID)
) UNLOAD PRIORITY 5 AUTO merge
;
create table <my_schema>._dlt_version
(
version integer CS_INT not null,
engine_version integer CS_INT not null,
inserted_at longdate CS_LONGDATE not null,
schema_name nvarchar(5000) not null,
version_hash nvarchar(5000) not null,
schema nclob MEMORY THRESHOLD 1000 not null
) UNLOAD PRIORITY 5 AUTO merge
;
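To apply the DDL above once, before the first pipeline run, plain SQLAlchemy is enough. A minimal sketch, assuming the placeholder connection string from this thread; only the _dlt_loads statement is repeated here, and the other two CREATE TABLE statements would be executed the same way:

from sqlalchemy import create_engine, text

engine = create_engine('hana+hdbcli://XXXXXX:XXXXXX@XXXXXX:12345')

# my_schema is a placeholder; use your target schema
ddl = '''
create table my_schema._dlt_loads (
    load_id nvarchar(5000) not null,
    schema_name nvarchar(5000),
    status int not null,
    inserted_at longdate CS_LONGDATE not null,
    schema_version_hash nvarchar(5000)
) UNLOAD PRIORITY 5 AUTO merge
'''

# engine.begin() commits on successful exit
with engine.begin() as conn:
    conn.execute(text(ddl))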
This behaviour of the HANA driver will also prevent the use of pyarrow.
dlt version
1.4.0
Describe the problem
I'm trying to load data into a SAP HANA database using a SQLAlchemy engine as the destination, defining it like so:
engine = create_engine('hana+hdbcli://XXXXXX:XXXXXX@XXXXXX:12345')
The HANA instance is up and running, and the credentials are correct. The source is a simple REST API. This is how I define my pipeline; a minimal sketch follows.
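A minimal sketch of such a pipeline, with placeholder endpoint and resource names; passing the engine to dlt.destinations.sqlalchemy mirrors the engine definition above:

import dlt
from dlt.sources.rest_api import rest_api_source
from sqlalchemy import create_engine

engine = create_engine('hana+hdbcli://XXXXXX:XXXXXX@XXXXXX:12345')

# placeholder REST API config; the source here is just "a simple REST API"
source = rest_api_source({
    "client": {"base_url": "https://api.example.com/"},
    "resources": ["items"],
})

pipeline = dlt.pipeline(
    pipeline_name="hana_test",
    destination=dlt.destinations.sqlalchemy(engine),
    dataset_name="my_dataset",
)
pipeline.run(source)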
Immediately, an exception is raised, which is expected since at this point neither the schema nor the metadata tables have been created. This happens in destinations/impl/sqlalchemy/sqlalchemy_job_client.py, in get_stored_state(). The same thing happens in _get_stored_schema(). Surprisingly, this is not the case with a Postgres SQLAlchemy engine: no exception is thrown, the pipeline runs without errors, and data is loaded into the destination table. It is curious, though, that the code block within the context manager is never reached.
Postgres and HANA SQLAlchemy engines use different drivers underneath (psycopg2 and hdbcli, respectively), so this might be related to the issue. I was able to bypass it by adding a try/except block in get_stored_state() and _get_stored_schema(), but I'm not sure that this is the proper way to solve it. I haven't tested other RDBMS destinations (like MySQL).
Expected behavior
No error thrown, as in the Postgres case, or proper exception handling for the cases where it throws.
Steps to reproduce
Create a SAP HANA and a Postgres SQLAlchemy engine like so:
hana_engine = create_engine('hana+hdbcli://XXXXXX:XXXXXX@XXXXXX:12345')
postgres_engine = create_engine('postgresql+psycopg2://XXXXXX:XXXXXX@XXXXXX:5432/some_db')
Run this simple pipeline; a minimal repro sketch follows, reusing the two engines above.
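A minimal repro sketch under the same assumptions, with inline stand-in data in place of the REST API source:

import dlt

# stand-in rows; the actual source in this issue is a REST API
data = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

for name, engine in [("hana", hana_engine), ("postgres", postgres_engine)]:
    pipeline = dlt.pipeline(
        pipeline_name=f"repro_{name}",
        destination=dlt.destinations.sqlalchemy(engine),
        dataset_name="repro_dataset",
    )
    pipeline.run(data, table_name="items")  # raises on HANA, completes on Postgres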
Operating system
Linux
Runtime environment
Local
Python version
3.11
dlt data source
A simple REST API
dlt destination
SQLAlchemy (SAP HANA & Postgres)
Other deployment details
No response
Additional information
Adding a try/except block solves the issue but there might be a more elegant approach.