Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question]: DuckDB as Iceberg Catalog - Unable to drop_table #1182

Open
1 task done
WobblyRobbly opened this issue Dec 20, 2024 · 0 comments
Open
1 task done

[Question]: DuckDB as Iceberg Catalog - Unable to drop_table #1182

WobblyRobbly opened this issue Dec 20, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@WobblyRobbly
Copy link

WobblyRobbly commented Dec 20, 2024

What happened?

I'm not sure which component requires a fix pyiceberg, duckdb_engine or duckdb so I started in the middle !!

I tried using duckdb as an Iceberg Catalog running via pyiceberg

Python setup

[tool.poetry.dependencies]
python = "^3.12"
pyarrow = "^18.1.0"
duckdb = "^1.1.3"
pyiceberg = "^0.8.1"
sqlalchemy = "^2.0.36"
duckdb-engine = "^0.14.0"

with the following code if I have an empty catalog then a record is inserted to the catalog successfully

    warehouse_path = "."
    catalog = SqlCatalog(
        "default",
        **{
            "uri": f"duckdb:///{warehouse_path}/pyiceberg_catalog.duckdb",
            "warehouse": f"file://{warehouse_path}",
        },
    )

    namespaces = catalog.list_namespaces()
    if len(namespaces):
        for ns in namespaces:
            if 'wibble' not in ns:
                catalog.create_namespace('wibble')
    else:
        catalog.create_namespace('wibble')


    table_identifier = ("wibble", "test_table")

    if catalog.table_exists(table_identifier):
        catalog.drop_table(table_identifier)

    metadata_location = "files/cdc/database_latest_schema/metadata/v2.metadata.json"
    catalog.register_table(identifier=table_identifier, metadata_location=metadata_location)

On the second execution this fails when it tries to drop_table with

Traceback (most recent call last):
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/duckdb_engine/__init__.py", line 150, in execute
    self.__c.execute(statement, parameters)
duckdb.duckdb.ParserException: Parser Error: SELECT locking clause is not supported!

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/wobblyrobbly/src/local/read_parquet/src/read_iceberg.py", line 88, in <module>
    main()
  File "/Users/wobblyrobbly/src/local/read_parquet/src/read_iceberg.py", line 63, in main
    catalog.drop_table(table_identifier)
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/pyiceberg/catalog/sql.py", line 337, in drop_table
    .one()
     ^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2778, in one
    return self._iter().one()  # type: ignore
           ^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2827, in _iter
    result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
                                                  ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2362, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2247, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/orm/context.py", line 305, in orm_execute_statement
    result = conn.execute(
             ^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/wobblyrobbly/src/local/read_parquet/.venv/lib/python3.12/site-packages/duckdb_engine/__init__.py", line 150, in execute
    self.__c.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (duckdb.duckdb.ParserException) Parser Error: SELECT locking clause is not supported!
[SQL: SELECT iceberg_tables.catalog_name AS iceberg_tables_catalog_name, iceberg_tables.table_namespace AS iceberg_tables_table_namespace, iceberg_tables.table_name AS iceberg_tables_table_name, iceberg_tables.metadata_location AS iceberg_tables_metadata_location, iceberg_tables.previous_metadata_location AS iceberg_tables_previous_metadata_location 
FROM iceberg_tables 
WHERE iceberg_tables.catalog_name = $1 AND iceberg_tables.table_namespace = $2 AND iceberg_tables.table_name = $3 FOR UPDATE OF iceberg_tables]
[parameters: ('default', 'wibble', 'test_table')]
(Background on this error at: https://sqlalche.me/e/20/f405)

If I look at the pyiceberg code I can see it is looking at the supports_sane_rowcount to determine whether rowcount is available

        with Session(self.engine) as session:
            if self.engine.dialect.supports_sane_rowcount:
                res = session.execute(
                    delete(IcebergTables).where(
                        IcebergTables.catalog_name == self.name,
                        IcebergTables.table_namespace == namespace,
                        IcebergTables.table_name == table_name,
                    )
                )
                if res.rowcount < 1:
                    raise NoSuchTableError(f"Table does not exist: {namespace}.{table_name}")
            else:
                try:
                    tbl = (
                        session.query(IcebergTables)
                        .with_for_update(of=IcebergTables)
                        .filter(
                            IcebergTables.catalog_name == self.name,
                            IcebergTables.table_namespace == namespace,
                            IcebergTables.table_name == table_name,
                        )
                        .one()
                    )
                    session.delete(tbl)
                except NoResultFound as e:
                    raise NoSuchTableError(f"Table does not exist: {namespace}.{table_name}") from e
            session.commit()

As duckdb_engine sets supports_sane_rowcount it then tries to create a select statement with FOR UPDATE - which isn't supported by duckdb

If I set supports_sane_rowcount True in duckdb_engine then the record is not deleted from the catalog when

                res = session.execute(
                    delete(IcebergTables).where(
                        IcebergTables.catalog_name == self.name,
                        IcebergTables.table_namespace == namespace,
                        IcebergTables.table_name == table_name,
                    )
                )
                if res.rowcount < 1:
                    raise NoSuchTableError(f"Table does not exist: {namespace}.{table_name}")

is executed . duckdb returns -1 for rowcount

Any thoughts or guidance on where this should be fixed ??

DuckDB Engine Version

0.14.0

DuckDB Version

1.1.3

SQLAlchemy Version

2.0.36

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@WobblyRobbly WobblyRobbly added the bug Something isn't working label Dec 20, 2024
@WobblyRobbly WobblyRobbly changed the title [Bug]: DuckDB as Iceberg Catalog - Unable to drop_table [Question]: DuckDB as Iceberg Catalog - Unable to drop_table Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants