Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatible with the bucket parameter for influxdb line protocol #3413

Closed
nicecui opened this issue Feb 29, 2024 · 20 comments
Closed

Compatible with the bucket parameter for influxdb line protocol #3413

nicecui opened this issue Feb 29, 2024 · 20 comments
Assignees
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@nicecui
Copy link
Collaborator

nicecui commented Feb 29, 2024

What type of enhancement is this?

API improvement

What does the enhancement do?

The influxdb line protocol v2 and v3 use the bucket parameter to indicate the database name, for example:

curl --request POST \
"https://url?org=YOUR_ORG&bucket=YOUR_BUCKET&precision=ns" \

GreptimeDB uses the db parameter to indicate the database name:

curl 'http://localhost:4000/v1/influxdb/api/v2/write?db=public'

When using the client libs of InfluxDB, users cannot change the bucket parameter to db parameter.
Does Greptime plan to support the bucket parameter to indicate the database?

Implementation challenges

No response

@killme2008 killme2008 added good first issue Good for newcomers help wanted Extra attention is needed labels Feb 29, 2024
@tisonkun

This comment was marked as outdated.

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 2, 2024

@etolbakov
Copy link
Collaborator

@tisonkun if this is not a super urgent one (say, I have a week or two) I'd like giving it a shot

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 2, 2024

@etolbakov Thanks! It should not be urgent. Feel free to send a patch and ping me as the reviewer :D

@etolbakov etolbakov self-assigned this Mar 2, 2024
@etolbakov
Copy link
Collaborator

etolbakov commented Mar 2, 2024

@tisonkun
I had a quick look at the influxdb docs you've provided and Greptime code and would like to confirm if I'm not missing anything:
There are 2 functions that support write to influxdb:

  • influxdb_write_v1 that has a route /write and query parameter db
  • influxdb_write_v2 that has a route /api/v2/write and query parameter bucket

I understand that the current implementation doesn't have any issues and matches the influxdb API.

@nicecui
Could you please tell me where does the request curl 'http://localhost:4000/v1/influxdb/api/v2/write?db=public' come from?
I have a feeling that this API endpoint is used in an incorrect way but works because "public" is the default name schema name and this API endpoint should be used with ?bucket=public.

Out of curiosity I tried providing a different name for the schema, please find the results below:

http://localhost:4000/v1/influxdb/api/v2/write?db=evttest
at 13:35:38 ❯ curl -v --location --request POST 'http://localhost:4000/v1/influxdb/api/v2/write?db=evttest' \
 --header 'Content-Type: application/json' \
 --data-raw 'datum,host=host2 cpu=1.4 1664370459457010101'
*   Trying 127.0.0.1:4000...
* Connected to localhost (127.0.0.1) port 4000 (#0)
> POST /v1/influxdb/api/v2/write?db=evttest HTTP/1.1
> Host: localhost:4000
> User-Agent: curl/8.0.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 44
>
< HTTP/1.1 500 Internal Server Error
< content-type: application/json
< content-length: 55
< date: Sat, 02 Mar 2024 13:36:24 GMT
<
* Connection #0 to host localhost left intact
{"error":"Failed to find schema, schema info: evttest"}%
http://localhost:4000/v1/influxdb/api/v2/write?bucket=evttest
curl -v --location --request POST 'http://localhost:4000/v1/influxdb/api/v2/write?bucket=evttest' \
--header 'Content-Type: application/json' \
--data-raw 'datum,host=host3 cpu=1.5 1664370459457010102'
*   Trying 127.0.0.1:4000...
* Connected to localhost (127.0.0.1) port 4000 (#0)
> POST /v1/influxdb/api/v2/write?bucket=evttest HTTP/1.1
> Host: localhost:4000
> User-Agent: curl/8.0.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 44
>
< HTTP/1.1 204 No Content
< date: Sat, 02 Mar 2024 13:35:38 GMT

Please let me know what do you think?

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 2, 2024

@etolbakov Thanks a lot for your time to investigate this. I agree that the implementation seems correct.

@nicecui This seems some incorrect docs issue? I share the question -

Could you please tell me where does the request curl 'http://localhost:4000/v1/influxdb/api/v2/write?db=public' come from?


but works because "public" is the default name schema name and this API endpoint should be used with ?bucket=public.

@etolbakov have you checked if InfluxDB has this default schema name manner? Or it returns an error if bucket unspecified.

@etolbakov
Copy link
Collaborator

etolbakov commented Mar 2, 2024

Looks like for the db query parameter, influxdb will create a default bucket with 3-day retention policy if it's not provided.
https://docs.influxdata.com/influxdb/cloud/api/v1-compatibility/#tag/Write

As for the bucket query parameter it's not stated the same (so my understanding - it is not created and return error):
https://docs.influxdata.com/influxdb/v2/api/#operation/PostWrite

UPD: it returns bucket not found.

I've downloaded the influxdb docker image and tried the /api/v2/write/, based on what I see the bucket and org query parameters are required:
http://localhost:8086/api/v2/write?bucket=sandbox&org=MyOrgName
gave me status 204 no content
while http://localhost:8086/api/v2/write
status 404 and Body:

{
    "code": "not found",
    "message": "bucket not found"
}

@nicecui
Copy link
Collaborator Author

nicecui commented Mar 4, 2024

@etolbakov Thank you for your interest in this issue!

We need the /api/v2/write API to support the bucket parameter.
The expected API behavior:

  • If there is no parameter db or bucket, the default database public should be used.
  • Use the value of the db parameter if exists.
  • Use the value of the bucket parameter if there is no db perameter.
  • If the database specified does not exist, throw error Database not found.

@nicecui
Copy link
Collaborator Author

nicecui commented Mar 4, 2024

I have a feeling that this API endpoint is used in an incorrect way but works because "public" is the default name schema name and this API endpoint should be used with ?bucket=public.

Here is a bug: If the database specified by the db parameter does not exist, the public database is used.
The correct behavior is to throw an error. We think there is no users using this bug as a feature 😜, let's fix it now.

@etolbakov
Copy link
Collaborator

Thanks for explaining the desired behaviour @nicecui.

We need the /api/v2/write API to support the bucket parameter.

just to confirm, this endpoint supports the bucket parameter (and ignores the db).

I'm happy to give this ticket a go but I'd like to have a confirmation from Greptime team that it's fine to go ahead and proceed with the proposed logic(@tisonkun could you please help me with that?).

In my opinion the client code should decided which endpoint (/write or /api/v2/write to invoke) based on the provided query parameter (db for the former and bucket for the latter).

@nicecui
Copy link
Collaborator Author

nicecui commented Mar 5, 2024

@etolbakov The /influxdb/write API is compatible with influxdb protocol v1 which also uses the db parameter, so we don't need to do anything with this API.
The influxdb/api/v2/write API is compatible with influxdb protocol v2, it needs to support the db and the bucket parameters as discussed above.

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 5, 2024

I'm happy to give this ticket a go but I'd like to have a confirmation from Greptime team that it's fine to go ahead and proceed with the proposed logic

Confirmed that #3413 (comment) is reasonable. cc @etolbakov

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 5, 2024

the client code

Maybe. But the related code snippet here is about the server side. I don't find influx client code in our repository ...

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 6, 2024

Something related:

p = Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
bucket = 'xxx-public'
client = InfluxDBClient(url="https://yyy.greptime.cloud/v1/influxdb/", token="username:password", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket=bucket, record=p)

This would fail with:

Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 06 Mar 2024 02:13:03 GMT', 'Content-Type': 'application/json', 'Content-Length': '125', 'Connection': 'keep-alive', 'x-greptime-err-code': '7005', 'x-greptime-err-msg': "Access denied for user 'uuu' to database 'greptime-public'", 'x-greptime-format': 'influxdb_v1', 'x-greptime-execution-time': '0', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'OPTIONS', 'Access-Control-Allow-Headers': 'DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization', 'Access-Control-Max-Age': '1728000'})
HTTP response body: {"code":7005,"error":"Access denied for user 'uuu' to database 'greptime-public'","execution_time_ms":0}

It can be a separate issue but still relevant (bucket param doesn't switch the database). I'm digging into it but post here for your information.

@tisonkun
Copy link
Collaborator

tisonkun commented Mar 6, 2024

Located. It's related to this issue and the real related code for this issue is:

pub fn extract_catalog_and_schema<B>(request: &Request<B>) -> (&str, &str) {
// parse database from header
let dbname = request
.headers()
.get(GreptimeDbName::name())
// eat this invalid ascii error and give user the final IllegalParam error
.and_then(|header| header.to_str().ok())
.or_else(|| {
let query = request.uri().query().unwrap_or_default();
extract_db_from_query(query)
})
.unwrap_or(DEFAULT_SCHEMA_NAME);
parse_catalog_and_schema_from_db_string(dbname)
}

@etolbakov
Copy link
Collaborator

etolbakov commented Mar 6, 2024

Thanks @tisonkun!
I've actually discovered that issue hard way before than found your comment 😅. So in my influxdb_write_v2 I pass the additional header

        .header(http::header::AUTHORIZATION, "token greptime:greptime")
        .header(GREPTIME_DB_HEADER_NAME, "influxdb")

I'll try to come up with the PR shortly so we can have a discussion (how/where that should be addressed)

@etolbakov
Copy link
Collaborator

@nicecui just to let you know the fix has been merged. Please let us know once you have a moment to test it (assume you need to wait for new release)?
Let's leave the ticket opened for now.

@nicecui
Copy link
Collaborator Author

nicecui commented Mar 11, 2024

@etolbakov Thank you very much! I have tested it from the cargo build version and it behaves as expected. I think this issued has been solved. Also thanks to @tisonkun

@tisonkun
Copy link
Collaborator

Closed as resolved.

Thanks for @nicecui reporting and @etolbakov fixing!

@etolbakov
Copy link
Collaborator

Great news @nicecui! I'm glad that it works!
Thanks @tisonkun for your support during my work on this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants