Documentation feedback on /docs/integrate/etl/influxdb.md #86

Closed
amotl opened this issue May 29, 2024 · 17 comments
@amotl
Member

amotl commented May 29, 2024

Documentation feedback


This document uses a Docker-based setup for convenience, which is excellent.
Beyond that, it would be nice to get a rough idea of how to run the procedure in a Cloud-to-Cloud scenario.

@matkuliak
Contributor

matkuliak commented May 29, 2024

Hi. Trying the Cloud-to-Cloud migration now.

Based on https://github.com/daq-tools/influxio/blob/main/influxio/core.py I guess something like this should work:

ctk load table \
  influxdb2://{{org}}:{{token}}@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo \
  --cratedb-sqlalchemy-url="crate://admin:{{password}}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200/testdrive/demo"

Getting:

2024-05-29 11:54:23,533 [cratedb_toolkit.io.influxdb         ] INFO    : Running InfluxDB copy
2024-05-29 11:54:23,536 [influxio.core                       ] INFO    : Copying from http://Testing:T268DVLDHD8AJsjzOEluu7TO5nGG6NAca15ksfTa6PqJMWBkBC5haJy7wcWyxsmoo9y_zpk8Gbns9PcCoPic4A==@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo to crate://admin:dZ,Y18*Z0vlQ-69hk(7)[email protected]:4200/testdrive/demo
2024-05-29 11:54:23,536 [influxio.adapter                    ] INFO    : SQLAlchemy DB URI: crate://admin:dZ,Y18*Z0vlQ-69hk(7)[email protected]:4200/?schema=testdrive
2024-05-29 11:54:23,548 [influxio.adapter                    ] INFO    : Loading dataframes into RDBMS/SQL database using pandas/Dask
Traceback (most recent call last):
  File "/usr/local/bin/ctk", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/io/cli.py", line 85, in load_table
    return cluster.load_table(resource=resource, target=target)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/api/main.py", line 113, in load_table
    if not influxdb_copy(source_url, target_url, progress=True):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/io/influxdb.py", line 16, in influxdb_copy
    copy(source_url, target_url, progress=progress)
  File "/usr/local/lib/python3.11/site-packages/influxio/core.py", line 90, in copy
    sink.write(source_node)
  File "/usr/local/lib/python3.11/site-packages/influxio/adapter.py", line 298, in write
    for df in source.read_df():
  File "/usr/local/lib/python3.11/site-packages/influxio/adapter.py", line 68, in read_df
    for df in self.client.query_api().query_data_frame_stream(query=query):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/client/query_api.py", line 299, in query_data_frame_stream
    response = self._query_api.post_query(org=org, query=self._create_query(query, self.default_dialect, params,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/service/query_service.py", line 285, in post_query
    (data) = self.post_query_with_http_info(**kwargs)  # noqa: E501
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/service/query_service.py", line 311, in post_query_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/api_client.py", line 343, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/api_client.py", line 173, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/api_client.py", line 388, in request
    return self.rest_client.POST(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/rest.py", line 311, in POST
    return self.request("POST", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/rest.py", line 261, in request
    raise ApiException(http_resp=r)
influxdb_client.rest.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 29 May 2024 11:54:24 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '55', 'Connection': 'keep-alive', 'trace-id': '1ca97c50b347910b', 'trace-sampled': 'false', 'x-platform-error-code': 'unauthorized', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'X-Influxdb-Request-ID': '933dccb545e17d4cb36c995bda523618', 'X-Influxdb-Build': 'Cloud'})
HTTP response body: b'{"code":"unauthorized","message":"unauthorized access"}'

For some other variations also:

Traceback (most recent call last):
  File "/usr/local/bin/ctk", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/io/cli.py", line 85, in load_table
    return cluster.load_table(resource=resource, target=target)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/api/main.py", line 125, in load_table
    raise NotImplementedError("Importing resource not implemented yet")
NotImplementedError: Importing resource not implemented yet

@amotl
Member Author

amotl commented May 29, 2024

> Getting an error:

Analyzing the log output, the CrateDB connectivity information looks reasonable.

  1. You may want to cross-check if the process already works when using an InfluxDB instance on your workstation and shuffling data to a CrateDB Cloud instance?

  2. That error indicates that the program is not able to connect or authenticate to InfluxDB Cloud, right?

    {"code":"unauthorized","message":"unauthorized access"}
    

    Maybe the reason is that the program is already connecting to InfluxDB 3, while it expects to connect to InfluxDB 2? Is there any choice to start or run an InfluxDB 2 instance instead?

> For some other variations also:
> NotImplementedError: Importing resource not implemented yet

Yeah, I mean, that's a hard error. Please create dedicated tickets for missing features, including a detailed repro of how ctk load table was invoked. Thanks!

@matkuliak
Contributor

> You may want to cross-check if the process already works when using an InfluxDB instance on your workstation

Will do that

> Is there any choice to start or run an InfluxDB 2 instance instead?

I don't believe so, there's not much customization possible, at least on my free version. Signup->region->deployed.

Will create corresponding tickets, thanks.

@amotl
Member Author

amotl commented May 29, 2024

{"code":"unauthorized","message":"unauthorized access"}

It looks like this is a regular error, not specifically related to InfluxDB2/3, so the root cause is probably on our side.

https://community.influxdata.com/search?q=unauthorized%20access

@amotl
Member Author

amotl commented May 29, 2024

Using influxio, this pair of commands appears to work. Can you confirm that?

influxio copy \
  "https://9faaa869a91a3bbb:U379DVLDHD8AJsjzOEluu7TO5nGG6NAca15ksfTa6PqJMWBkBC5haJy7wcWyxsmoo9y_zpk8Gbns9PcLaLa4A==@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo" \
  "sqlite:///export.sqlite?table=demo"
sqlite3 -bail export.sqlite -cmd 'SELECT * FROM demo;' -cmd '.quit'

NB: Organization ID and authentication token have been amended, so they are invalid. Please use valid ones instead.
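When sharing commands or log output like the above, credentials can also be masked programmatically instead of amending them by hand. The following is a minimal sketch, not part of influxio or cratedb-toolkit; the function name `redact_url` and the `keep` parameter are hypothetical, and it assumes credentials are placed in the user-info part of the URL, as in the examples in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def mask(value: str, keep: int) -> str:
    """Keep the first `keep` characters and replace the rest with `***`."""
    return value[:keep] + "***" if value else value


def redact_url(url: str, keep: int = 2) -> str:
    """
    Return `url` with username and password shortened and masked.

    Hypothetical helper for illustration only. URLs without a
    user-info part are returned unchanged.
    """
    parts = urlsplit(url)
    # Split on the last `@`, so passwords containing `@` stay in the user-info part.
    userinfo, at, hostinfo = parts.netloc.rpartition("@")
    if not at:
        return url
    username, colon, password = userinfo.partition(":")
    masked = mask(username, keep)
    if colon:
        masked += ":" + mask(password, keep)
    return urlunsplit(parts._replace(netloc=f"{masked}@{hostinfo}"))
```

Applied to an InfluxDB Cloud URL, this leaves the host, bucket, and measurement readable while masking organization ID and token; a URL like `sqlite:///export.sqlite?table=demo` passes through unchanged.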

@amotl
Member Author

amotl commented May 29, 2024

If that procedure works, and also succeeds with a corresponding ctk load table command, let's add an example to a dedicated section in this document, elaborating on how to use the program together with InfluxDB Cloud and CrateDB Cloud.

@matkuliak
Contributor

Hi. I am unable to reproduce this. Could you share how it should look with the correct URIs?

@matkuliak
Contributor

matkuliak commented May 29, 2024

Trying something like this (credentials edited), adapted from your suggestion:

18:12:34 ~ $ docker run --rm --network=host ghcr.io/daq-tools/influxio \
   influxio copy \
   "https://9fafc8123ra43406:T268DVLDHD8AJsjzOEasdsgdr23Ta6PqJMWBkBC5haJy7wcWyxsmoo9y_zpk8Gbns9PcCoPic4A==@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo" \
   "crate://admin:dZ,Y11230vlQ-69hkasd67)@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200/testdrive/demo"

2024-05-29 16:12:36,019 [influxio.core           ] INFO    : Copying from https://9fafc869a91a3406:T268DVLDHD8AJsjzOEluu7TO5nGG6NAca15ksfTa6PqJMWBkBC5haJy7wcWyxsmoo9y_zpk8Gbns9PcCoPic4A==@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo to crate://admin:dZ,Y18*Z0vlQ-69hk(7)[email protected]:4200/testdrive/demo
2024-05-29 16:12:36,019 [influxio.adapter        ] INFO    : SQLAlchemy DB URI: crate://admin:dZ,Y18*Z0vlQ-69hk(7)[email protected]:4200/?schema=testdrive
2024-05-29 16:12:36,029 [influxio.adapter        ] INFO    : Loading dataframes into RDBMS/SQL database using pandas/Dask
[                                        ] | 0% Completed | 200.64 ms
2024-05-29 16:12:36,678 [crate.client.http       ] WARNING : Removed server http://purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200 from active pool
[                                        ] | 0% Completed | 300.80 ms
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/usr/local/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 533, in _request
    response = self.server_pool[next_server].request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 164, in request
    return self.pool.urlopen(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/usr/local/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1971, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 919, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.11/site-packages/crate/client/cursor.py", line 58, in execute
    self._result = self.connection.client.sql(sql, parameters,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 456, in sql
    content = self._json_request('POST', self.path, data=data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 582, in _json_request
    response = self._request(method, path, data=data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 573, in _request
    self._drop_server(next_server, ex_message)
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 641, in _drop_server
    raise ConnectionError(
crate.client.exceptions.ConnectionError: No more Servers available, exception from last server: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/influxio", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxio/cli.py", line 151, in copy
    influxio.core.copy(source, target)
  File "/usr/local/lib/python3.11/site-packages/influxio/core.py", line 90, in copy
    sink.write(source_node)
  File "/usr/local/lib/python3.11/site-packages/influxio/adapter.py", line 299, in write
    dataframe_to_sql(df, dburi=self.dburi, tablename=self.table, progress=self.progress)
  File "/usr/local/lib/python3.11/site-packages/influxio/io.py", line 136, in dataframe_to_sql
    return ddf.to_sql(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask_expr/_collection.py", line 2290, in to_sql
    return to_sql(
           ^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask_expr/io/sql.py", line 94, in to_sql
    return _to_sql(
           ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask/dataframe/io/sql.py", line 606, in to_sql
    dask_compute(result)
  File "/usr/local/lib/python3.11/site-packages/dask/base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask/dataframe/io/sql.py", line 423, in _to_sql_chunk
    q = d.to_sql(con=engine, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/sql.py", line 842, in to_sql
    return pandas_sql.to_sql(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/sql.py", line 2008, in to_sql
    table = self.prep_table(
            ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/sql.py", line 1912, in prep_table
    table.create()
  File "/usr/local/lib/python3.11/site-packages/pandas/io/sql.py", line 984, in create
    if self.exists():
       ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/sql.py", line 970, in exists
    return self.pd_sql.has_table(self.name, self.schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/io/sql.py", line 2041, in has_table
    return insp.has_table(name, schema or self.meta.schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/reflection.py", line 429, in has_table
    return self.dialect.has_table(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/sqlalchemy/dialect.py", line 247, in has_table
    return table_name in self.get_table_names(connection, schema=schema, **kw)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/sqlalchemy/patch.py", line 32, in get_table_names
    return get_table_names_dist(self, connection=connection, schema=schema, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 2, in get_table_names
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/reflection.py", line 97, in cache
    ret = fn(self, con, *args, **kw)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/sqlalchemy/dialect.py", line 260, in get_table_names
    cursor = connection.exec_driver_sql(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1783, in exec_driver_sql
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1850, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1990, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2357, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1971, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 919, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.11/site-packages/crate/client/cursor.py", line 58, in execute
    self._result = self.connection.client.sql(sql, parameters,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 456, in sql
    content = self._json_request('POST', self.path, data=data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 582, in _json_request
    response = self._request(method, path, data=data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 573, in _request
    self._drop_server(next_server, ex_message)
  File "/usr/local/lib/python3.11/site-packages/crate/client/http.py", line 641, in _drop_server
    raise ConnectionError(
sqlalchemy.exc.OperationalError: (crate.client.exceptions.ConnectionError) No more Servers available, exception from last server: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
[SQL: SELECT table_name FROM information_schema.tables WHERE table_schema = ? AND table_type = 'BASE TABLE' ORDER BY table_name ASC, table_schema ASC]
[parameters: ('testdrive',)]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Last line of the error: https://docs.sqlalchemy.org/en/20/errors.html#error-e3q8

@amotl
Member Author

amotl commented May 30, 2024

New versions of relevant packages have been released, including a corresponding fix.
-- https://github.com/daq-tools/influxio/releases/tag/v0.2.1
-- https://github.com/crate-workbench/cratedb-toolkit/releases/tag/v0.0.11

You can update to the most recent release using one of these commands.

pip install --upgrade 'cratedb-toolkit[influxdb]'
docker pull ghcr.io/crate-workbench/cratedb-toolkit

Please confirm if that remedies the problem, and, if so, please add a corresponding section about cloud-to-cloud use to the documentation.

See also the corresponding "Export from Cloud to Cloud" documentation of influxio, which presents that case concisely by shortening access tokens significantly, reducing line length and thereby improving readability.

@matkuliak
Contributor

matkuliak commented May 30, 2024

Hi again. Confirming that influxio now works for me as well, thank you. But I am still unable to reproduce it with ctk load table. Sharing some logs:

$ ctk --version
ctk, version 0.0.11

$ ctk load table \
  "https://9org6:[email protected]/testdrive/demo" \
  --cratedb-sqlalchemy-url="crate://admin:[email protected]:4200/testdrive/demo?ssl=true"

Traceback (most recent call last):
  File "/usr/local/bin/ctk", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/io/cli.py", line 85, in load_table
    return cluster.load_table(resource=resource, target=target)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/api/main.py", line 125, in load_table
    raise NotImplementedError("Importing resource not implemented yet")
NotImplementedError: Importing resource not implemented yet

@amotl
Member Author

amotl commented May 30, 2024

Analysis

ctk load table "https://...."

It might surprise you because you've also worked with influxio, but ctk load table can't just import from arbitrary HTTP URLs and assume they are InfluxDB endpoints. Instead, please follow the documentation: the influxdb2:// identifier must be used. The differences are subtle, and I admit there is room for confusion, but for now, that is how it works:

  • Because influxio is a polyglot framework that always connects to InfluxDB on one side of the pipeline, it exclusively uses the http:// or https:// schemes for designating InfluxDB endpoint URLs, also so as not to confuse regular users of InfluxDB too much.
  • ctk load table, on the other hand, is an application focused on CrateDB and is polyglot from a different angle: the target pipeline element is always CrateDB, but the source pipeline element needs to accept different types of sources. In this spirit, it uses custom URL scheme identifiers, like influxdb2://, to tell data source types apart. As such, the syntax deviates from how you describe a pipeline with influxio.
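To illustrate the difference between both notations, here is a minimal sketch that rewrites an influxio-style source URL into the influxdb2:// notation. The helper `to_influxdb2_url` is hypothetical and exists in neither program; it only swaps the URL scheme and leaves credentials, host, and path untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def to_influxdb2_url(url: str) -> str:
    """
    Rewrite an `http://` or `https://` InfluxDB URL to use `influxdb2://`.

    Hypothetical helper for illustration only. URLs already using
    `influxdb2://` are returned unchanged; other schemes are rejected.
    """
    parts = urlsplit(url)
    if parts.scheme == "influxdb2":
        return url
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Not an InfluxDB HTTP endpoint URL, scheme is: {parts.scheme!r}")
    # Only the scheme changes; user-info, host, path, and query are preserved.
    return urlunsplit(parts._replace(scheme="influxdb2"))
```

Note that swapping the scheme discards whether the original URL used http:// or https://. How ctk load table decides about TLS on the InfluxDB side is not covered by this sketch.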

Possible UX Improvements

Thoughts I

That said, there may be room for improvement. For example, influxio could be more lenient and also accept the influxdb2:// notation. Unfortunately, we can't make it happen the other way round. However, we may also abstain from things like that completely, in order not to create further confusion for users of both tools, who may ask why one application is more lenient than the other.

Thoughts II

NotImplementedError: Importing resource not implemented yet

Indeed, the error message could be improved to convey that the application does not understand the http:// protocol, instead of raising an unspecific error that leaves much room for interpretation.

Beyond conveying the reason more clearly, we may think about adding hints to it, like:

You are using the http:// protocol, which is not supported. You need to specify a more specific protocol, like influxdb2://.
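A check like that could look roughly like the following. This is a minimal sketch, not the actual cratedb-toolkit code; the function name `check_source_scheme` and the `SUPPORTED_SCHEMES` set are hypothetical, and the set only lists the one scheme mentioned in this thread.

```python
from urllib.parse import urlsplit

# Assumption: only the scheme discussed in this thread is listed here.
SUPPORTED_SCHEMES = {"influxdb2"}


def check_source_scheme(url: str) -> str:
    """
    Return the URL scheme if it is supported, otherwise raise an error.

    Hypothetical helper for illustration only. The URL itself is
    deliberately not echoed, because it may contain credentials.
    """
    scheme = urlsplit(url).scheme
    if scheme in SUPPORTED_SCHEMES:
        return scheme
    if scheme in ("http", "https"):
        # Add a hint for the most likely mistake.
        raise NotImplementedError(
            f"You are using the {scheme}:// protocol, which is not supported. "
            "You need to specify a more specific protocol, like influxdb2://."
        )
    raise NotImplementedError("Importing resource not implemented yet")
```

The generic error is kept as a fallback for all other schemes, so only the http:// and https:// cases change behaviour.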

@matkuliak
Contributor

matkuliak commented May 30, 2024

Ah. That's it, and my bad for deviating from the right endpoint notation. But as you said, that kind of error message will be helpful if implemented.

Output looks better now, however I'm still having an issue even when using influxdb2://:

(credentials are correct, confirmed with influxio)

21:46:40 ~ $ ctk load table \
  "influxdb2://9**6:q**[email protected]/testdrive/demo" \
  --cratedb-sqlalchemy-url="crate://admin:d**[email protected]:4200/testdrive/demo?ssl=true"
2024-05-30 19:46:41,774 [cratedb_toolkit.io.influxdb         ] INFO    : Running InfluxDB copy
2024-05-30 19:46:41,777 [influxio.core                       ] INFO    : Copying from http://9**6:q**[email protected]/testdrive/demo to crate://admin:d**[email protected]:4200/testdrive/demo?ssl=true
2024-05-30 19:46:41,777 [influxio.adapter                    ] INFO    : SQLAlchemy DB URI: crate://admin:d**[email protected]:4200/?schema=testdrive&ssl=true
2024-05-30 19:46:41,787 [influxio.adapter                    ] INFO    : Loading dataframes into RDBMS/SQL database using pandas/Dask
Traceback (most recent call last):
  File "/usr/local/bin/ctk", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/io/cli.py", line 85, in load_table
    return cluster.load_table(resource=resource, target=target)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/api/main.py", line 113, in load_table
    if not influxdb_copy(source_url, target_url, progress=True):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/cratedb_toolkit/io/influxdb.py", line 16, in influxdb_copy
    copy(source_url, target_url, progress=progress)
  File "/usr/local/lib/python3.11/site-packages/influxio/core.py", line 90, in copy
    sink.write(source_node)
  File "/usr/local/lib/python3.11/site-packages/influxio/adapter.py", line 303, in write
    for df in source.read_df():
  File "/usr/local/lib/python3.11/site-packages/influxio/adapter.py", line 68, in read_df
    for df in self.client.query_api().query_data_frame_stream(query=query):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/client/query_api.py", line 299, in query_data_frame_stream
    response = self._query_api.post_query(org=org, query=self._create_query(query, self.default_dialect, params,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/service/query_service.py", line 285, in post_query
    (data) = self.post_query_with_http_info(**kwargs)  # noqa: E501
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/service/query_service.py", line 311, in post_query_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/api_client.py", line 343, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/api_client.py", line 173, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/api_client.py", line 388, in request
    return self.rest_client.POST(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/rest.py", line 311, in POST
    return self.request("POST", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/influxdb_client/_sync/rest.py", line 261, in request
    raise ApiException(http_resp=r)
influxdb_client.rest.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Thu, 30 May 2024 19:46:42 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '55', 'Connection': 'keep-alive', 'trace-id': 'd8250db52be15047', 'trace-sampled': 'false', 'x-platform-error-code': 'unauthorized', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'X-Influxdb-Request-ID': 'f3bf3f3dea2f91959addf5b74b084f92', 'X-Influxdb-Build': 'Cloud'})
HTTP response body: b'{"code":"unauthorized","message":"unauthorized access"}'

@amotl
Member Author

amotl commented May 30, 2024

Observations

21:46:40 ~ $ ctk load table \
  "influxdb2://9**6:q**[email protected]/testdrive/demo" \
  --cratedb-sqlalchemy-url="crate://admin:d**[email protected]:4200/testdrive/demo?ssl=true"
2024-05-30 19:46:41,774 [cratedb_toolkit.io.influxdb         ] INFO    : Running InfluxDB copy
2024-05-30 19:46:41,777 [influxio.core                       ] INFO    : Copying from http://9**6:q**[email protected]/testdrive/demo to crate://admin:d**[email protected]:4200/testdrive/demo?ssl=true
2024-05-30 19:46:41,777 [influxio.adapter                    ] INFO    : SQLAlchemy DB URI: crate://admin:d**[email protected]:4200/?schema=testdrive&ssl=true

In this log output, it appears to me that influxdb2://9**6:q**[email protected]/testdrive/demo gets translated into http://9**6:q**[email protected]/testdrive/demo.

Evaluation

That is probably the wrong choice: the URL says http://, while it should use https:// instead. It is likely that ctk load table needs a corresponding fix at this spot, so I think it is fine to classify this as a bug right away.

Thoughts

Because the synthetic influxdb2:// protocol identifier is agnostic of plain vs. SSL communication, we will probably also need to convey the "I want encryption" flag through a corresponding ssl=true query parameter, the same way SQLAlchemy needs to do it. That's the price to pay when optimizing for a different aspect instead.

While we could introduce separate schemes like influxdb2:// vs. influxdb2s://, I am not leaning towards that yet, mostly because SQLAlchemy did not make that choice either. What do you think, including everyone who is still following?

Assessment

The flaw is in this section of StandaloneCluster.load_table, which rewrites the scheme to http:// unconditionally:

source_url = source_url.replace("influxdb2://", "http://")
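
To illustrate the idea from the "Thoughts" paragraph, here is a minimal sketch of how the scheme could be chosen from an ssl=true query parameter. It is not the actual fix shipped in CrateDB Toolkit; the function name `influxdb2_to_http_url`, the accepted truthy values, and the decision to leave the query string untouched are assumptions made for illustration only.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def influxdb2_to_http_url(source_url: str) -> str:
    """Hypothetical helper: map influxdb2:// to http:// or https://, based on `ssl`."""
    parts = urlsplit(source_url)
    if parts.scheme != "influxdb2":
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    # Read the `ssl` flag from the query string; assume plain HTTP when it is absent.
    query = parse_qs(parts.query)
    use_ssl = query.get("ssl", ["false"])[-1].lower() in ("true", "1", "yes")
    scheme = "https" if use_ssl else "http"
    # Assumption: the query string is passed through unchanged.
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


# Placeholder credentials and hostname, not taken from the log output above.
print(influxdb2_to_http_url("influxdb2://org:token@influxdb.example.org/testdrive/demo?ssl=true"))
print(influxdb2_to_http_url("influxdb2://org:token@influxdb.example.org/testdrive/demo"))
```

With this approach, the synthetic influxdb2:// identifier stays agnostic of encryption, and the caller opts into https:// explicitly, similar in spirit to the ssl=true parameter on the crate:// URL.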

@amotl
Member Author

amotl commented May 30, 2024

CrateDB Toolkit v0.0.12, released just now, may resolve that problem.

Please note the ssl=true query parameter at the end of both database connection URLs.

-- https://cratedb-toolkit.readthedocs.io/io/influxdb/loader.html#cloud
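
As a rough sketch of what "ssl=true at the end of both URLs" looks like in practice, the snippet below assembles both connection URLs and prints the resulting ctk load table command. The helper name `with_ssl`, the hostnames, and the credentials are hypothetical placeholders. Percent-encoding the credentials is an assumption that should be verified against the toolkit: SQLAlchemy-style URLs expect it for special characters, but whether the InfluxDB side decodes them the same way is not confirmed here.

```python
import shlex
from urllib.parse import quote


def with_ssl(scheme: str, user: str, secret: str, host: str, path: str) -> str:
    """Hypothetical helper: build a connection URL that ends with ?ssl=true."""
    # Percent-encode credentials so characters like "=" or "," cannot break URL parsing.
    return f"{scheme}://{quote(user, safe='')}:{quote(secret, safe='')}@{host}{path}?ssl=true"


# Placeholder values; substitute your own organization ID, token, password, and hostnames.
source = with_ssl("influxdb2", "example-org", "example-token==", "influxdb.example.org", "/testdrive/demo")
target = with_ssl("crate", "admin", "p,ss(w)rd", "cratedb.example.org:4200", "/testdrive/demo")

# Quote the arguments for the shell, because the URLs contain "?" characters.
command = shlex.join(["ctk", "load", "table", source, f"--cratedb-sqlalchemy-url={target}"])
print(command)
```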

@amotl
Member Author

amotl commented May 30, 2024

@matkuliak confirmed it works well now. Thank you.

@amotl
Member Author

amotl commented May 30, 2024

As a follow-up, will you add a corresponding section about cloud-to-cloud copying to this document, @matkuliak? I think it will be fine to use that miniature section as a blueprint, but make it a bit richer, for example by linking to the canonical InfluxDB Cloud and CrateDB Cloud pages, and/or elaborating on how to sign up.

-- https://cratedb-toolkit.readthedocs.io/io/influxdb/loader.html#cloud

@amotl closed this as completed May 30, 2024