
dk-query comments #4529

Closed · wants to merge 1 commit

Conversation

dkuku
Contributor

@dkuku dkuku commented Oct 18, 2024

This is a continuation of the work I started in the postgrex repo - I think Ecto will benefit from it more. For completeness: an alternative to this PR would be to provide a callback that receives the generated query and allows modifying it with a comment, but then we don't know whether the query can still be cached and used as a prepared query.

It adds a new field to Ecto.Query that aggregates comments.
The comments are then added to the generated query:

    /*comment1*/ /*comment2*/ SELECT s0."x" FROM (/*subquery comment*/ SELECT ss0."x" AS "x" FROM "schema" AS ss0) AS s0

The comments can also be inspected:

           # foo
           # bar
           #Ecto.Query<from p0 in "posts">
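
For reference, a minimal sketch of how the proposed API could be used to produce the inspect output above. comment/2 is the macro this PR adds; it is not part of released Ecto, so treat the exact call shape as an assumption:

    import Ecto.Query

    # comment/2 is the API proposed in this PR (hypothetical outside this branch):
    # each call appends to the new comments field on the query struct.
    query =
      "posts"
      |> comment("foo")
      |> comment("bar")

    # Inspecting the query prints the aggregated comments before the query itself,
    # as shown above.
    IO.inspect(query)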

Currently a comment can be:

  • a compile-time string - this does not change, so a query with only such comments can be cached and prepared - we are not sacrificing any of Ecto's performance benefits in this case.
    Examples: module, function, app name, repo name, job name, host.
  • an interpolated string or a variable - these are not cached and the query should not be prepared (not implemented yet).
    Examples: traces and spans, IP addresses.
    This may be slower when compiling a query and also requires more resources to parse the query in the database.
    But not everyone is running a database at 80% load, and traces can be sampled so that only a small percentage of queries include the additional data.
    I need this mostly for sqlcommenter; libraries in other languages do it similarly, and Rails has it built in. A sketch contrasting the two kinds of comment follows this list.
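
To make the distinction concrete, a small sketch (again using the comment/2 API proposed in this PR, so the exact semantics are an assumption): the first query only carries a compile-time string and stays cacheable, while the second interpolates a runtime value and would have to skip the prepared-statement path:

    import Ecto.Query

    # Compile-time string: the comment never changes, so the query can still be
    # cached and prepared as usual.
    static_query =
      from(p in "posts", select: p.title)
      |> comment("app: my_app")

    # Runtime value: the comment differs per call, so under this proposal the
    # query should not be prepared (and is not cached).
    trace_id = "00f067aa0ba902b7"
    traced_query =
      from(p in "posts", select: p.title)
      |> comment("traceparent: #{trace_id}")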

What next:

  • disable prepared queries when the comment is not a compile-time string - I'm not sure how best to do that: whether it's enough to extend the build_meta functions, or whether the DB libraries also need a new param, or maybe a new field in the query struct saying the query should not be prepared?
  • validate for SQL injection - currently in postgrex there is a transaction comment that raises when */ is included in the comment. I think it's better to replace any occurrence of */ with * / so that any SQL injection attempt gets logged and the data saved (see the escaping sketch after this list).
  • add changes to ecto_sql to add the comments to the actual query
  • add a config option to specify whether the comment should be prepended or appended to the query:
    by default the comments will be in front, but the sqlcommenter specification requires comments at the end.
  • add a default comment() configured similarly to the Logger format, with access to the compile-time env struct.
    This would allow adding metadata to a query composed from multiple files.
    A format like $line: $module.$function would create a compile-time comment that can be cached and prepared (see the macro sketch after this list), and the resulting SQL would look like:
    /*22: MyApp.QueryBuilder.build*/ /*33: MyApp.QueryBuilder.filter*/ SELECT s0."x" FROM "schema" AS s0 WHERE z = ?
  • add a comment(:sqlcommenter, keyword) that will either build a sqlcommenter comment itself or require an external library to do the escaping and just accept the comment as a string or iolist. The escaping logic is pretty simple and I think it could be included in Ecto (same sketch after this list). This will require specifying comment_position: :append in the ecto config.
    This may require flushing previous comments or merging them into one - the sqlcommenter specification says there should be only one. But we could, for example, aggregate key-value pairs and build them into a single comment at the end; an additional config option may be required.
  • comment(atom) - when you follow the rule of not creating atoms dynamically, atoms held in variables can be treated like compile-time strings, so queries whose comments interpolate atoms can still be cached. Example:
    method = :get
    query |> comment("method: #{method}")
    or
    query |> comment({:safe, "method: #{method}"})
  • implement for insert_all and delete_all
  • implement for other adapters
  • add a TracedRepo module that would be a drop-in replacement for Repo, built with macros, so that calling TracedRepo.all(query) would use the work done here and inject comments with static metadata. Again - I'm not sure if it should be in Ecto or just shown in the Ecto documentation as an example?
  • documentation - I want some feedback first before doing that.
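
To illustrate the default-comment idea from the list above: a rough sketch, not part of this PR, of how a $line: $module.$function comment could be built at compile time from the caller environment so it stays a static string (module and helper names below are made up):

    defmodule MyApp.QueryComment do
      # Hypothetical helper, not part of this PR: builds a "$line: $module.$function"
      # string at macro-expansion time from __CALLER__, so the resulting comment is a
      # compile-time constant and the query stays cacheable and preparable.
      defmacro source_comment do
        %{line: line, module: module, function: function} = __CALLER__

        text =
          case function do
            {name, _arity} -> "#{line}: #{inspect(module)}.#{name}"
            nil -> "#{line}: #{inspect(module)}"
          end

        quote do: unquote(text)
      end
    end

    # Usage, with comment/2 being the API proposed in this PR:
    #   query |> comment(MyApp.QueryComment.source_comment())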
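
And a rough approximation of the escaping mentioned in the injection and sqlcommenter items, assuming the spec's URL-encode-and-sort rules; the module and function names are hypothetical and this is not code from the PR:

    defmodule MyApp.SQLComment do
      # Neutralise the comment terminator instead of raising, so an injection
      # attempt still ends up logged with the query rather than rejected.
      def sanitize(comment) when is_binary(comment) do
        String.replace(comment, "*/", "* /")
      end

      # Build a single sqlcommenter-style comment from key/value pairs:
      # keys and values are percent-encoded, values wrapped in single quotes,
      # pairs sorted by key and joined with commas.
      def sqlcommenter(pairs) when is_list(pairs) do
        body =
          pairs
          |> Enum.sort_by(fn {key, _value} -> to_string(key) end)
          |> Enum.map_join(",", fn {key, value} ->
            encode(key) <> "='" <> encode(value) <> "'"
          end)

        "/*" <> body <> "*/"
      end

      defp encode(term) do
        URI.encode(to_string(term), &URI.char_unreserved?/1)
      end
    end

    # MyApp.SQLComment.sqlcommenter(controller: "PostController", route: "/posts")
    # => "/*controller='PostController',route='%2Fposts'*/"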

@josevalim
Member

Thank you @dkuku. Looking at the impact this has on the codebase, I am thinking the PG route may be the better one after all. My thinking was that we would only add static comments, but if we want both, for example to later add stacktrace information, I say the work on PostgreSQL is a better fit (and it can play nicely with stuff like the default_options callback from Ecto). The work here definitely comes at a higher cost in terms of code and abstractions if we want to add all of these features.

@dkuku
Contributor Author

dkuku commented Oct 18, 2024

My concern in postgrex arose when I started disabling the prepared statements. I read up on them and there are clear benefits to keeping them. If someone only wants to add static comments to track where a query originates from, then there is no need to disable them. In that case the PR there can be even simpler and just include documentation saying that you should not add any data that changes.

@josevalim
Member

I understand that concern, but can you provide examples of where the comment would be static and based on the place the query was written rather than where it was executed, i.e. the repository?

@dkuku
Contributor Author

dkuku commented Oct 21, 2024

You're right – most of the options can be added when we call the Repo and pass them as parameters. I created this PR mainly to retain the ability to use prepared queries with comments, especially when the comment includes, for example, only the app name executing the DB call. The traces and spans are a nice addition, especially since sqlcommenter has become a widely supported standard.

My main use case is to identify where an expensive query originates. This type of comment can be cached in ETS, just like an existing query, which is why I thought Ecto might be a better fit. However, I’m happy to revert to the Postgrex PR.

@josevalim
Member

I think it is good to have explored both approaches. I think starting with PG is the best route at the moment :) Thank you ❤️

@josevalim josevalim closed this Oct 21, 2024
@Schultzer
Contributor

> You're right – most of the options can be added when we call the Repo and pass them as parameters. I created this PR mainly to retain the ability to use prepared queries with comments, especially when the comment includes, for example, only the app name executing the DB call. The traces and spans are a nice addition, especially since sqlcommenter has become a widely supported standard.
>
> My main use case is to identify where an expensive query originates. This type of comment can be cached in ETS, just like an existing query, which is why I thought Ecto might be a better fit. However, I’m happy to revert to the Postgrex PR.

Out of curiosity, what is lacking currently from telemetry events for you to determine where a query originated from?

@dkuku
Contributor Author

dkuku commented Oct 21, 2024

> Out of curiosity, what is lacking currently from telemetry events for you to determine where a query originated from?

You need to forward telemetry events to, for example, Datadog. The default Postgres integration does not understand telemetry but does support sqlcommenter. Is there a way to integrate it with telemetry here?

Currently, a single database can be used by multiple frameworks. In the simplest scenario, you might migrate from Rails to Elixir, resulting in queries from two different frameworks appearing in the Postgres logs.

Multiple Elixir microservices can also execute queries on the same database (e.g., when you extract part of a microservice into a separate one).

Additionally, a single context module can be executed from multiple places (REST, GraphQL, gRPC, event consumers, Oban jobs, and periodic tasks).

In large applications, you might have all of the above scenarios. In case something goes wrong, the more data you have for debugging, the better.

@Schultzer
Contributor

> > Out of curiosity, what is lacking currently from telemetry events for you to determine where a query originated from?
>
> You need to forward telemetry events to, for example, Datadog. The default Postgres integration does not understand telemetry but does support sqlcommenter. Is there a way to integrate it with telemetry here?
>
> Currently, a single database can be used by multiple frameworks. In the simplest scenario, you might migrate from Rails to Elixir, resulting in queries from two different frameworks appearing in the Postgres logs.
>
> Multiple Elixir microservices can also execute queries on the same database (e.g., when you extract part of a microservice into a separate one).
>
> Additionally, a single context module can be executed from multiple places (REST, GraphQL, gRPC, event consumers, Oban jobs, and periodic tasks).
>
> In large applications, you might have all of the above scenarios. In case something goes wrong, the more data you have for debugging, the better.

I agree. I'm currently working on a tool for that, to help with database observability, which leverages telemetry events and stores them as time series in an automatically partitioned table based on which Ecto adapter is being used. I even reconcile dependencies, so clicking on the stacktrace takes you to the repository. Anyway, feel free to reach out if this is something that interests you; you can find my email in my GitHub profile.

@aloukissas

To give some more context: this functionality is required for DB monitoring on Datadog to link queries back to Ecto OTel spans (this does not work today). In fact, this is exactly how they do it in e.g. their Node.js library.

@Schultzer
Contributor

> To give some more context: this functionality is required for DB monitoring on Datadog to link queries back to Ecto OTel spans (this does not work today). In fact, this is exactly how they do it in e.g. their Node.js library.

How is telemetry not sufficient in this case?

You can send Ecto telemetry events to Datadog or even export them to OTel.

@aloukissas

aloukissas commented Dec 18, 2024

> > To give some more context: this functionality is required for DB monitoring on Datadog to link queries back to Ecto OTel spans (this does not work today). In fact, this is exactly how they do it in e.g. their Node.js library.
>
> How is telemetry not sufficient in this case?
>
> You can send Ecto telemetry events to Datadog or even export them to OTel.

It does not work. You need the span_id and other metadata in the statement sent to the DB. All the APM libs from Datadog do this. Think of it like this: without an explicit span_id, the system can only guess (by matching the statement and the timestamp).

@tsloughter

@Schultzer telemetry events don't correlate with telemetry from the database itself. It won't tell you that your query is running a sequential scan.

Passing the context (OpenTelemetry specifically) enables this capability and is why sqlcommenter exists and why I think it is important for ecto and postgrex to support this.

@dkuku
Contributor Author

dkuku commented Dec 18, 2024

@aloukissas For spans you can follow this guide:
https://dev.to/dkuku/sql-commenter-with-postgrex-2bfd
And look at the source of this library:
https://github.com/dkuku/opentelemetry_sqlcommenter
This PR is more about adding static comments, like the code owner or the app name that executed the query, which helps with shared databases and connection poolers.

@tsloughter

@dkuku oh snap :) I'd be interested in getting that up into https://github.com/open-telemetry/opentelemetry-erlang-contrib

@dkuku
Contributor Author

dkuku commented Dec 18, 2024

@tsloughter we can do that, I can also transfer the package ownership.

@tsloughter

@dkuku Cool. Are you on Slack? Let's leave the Ecto repo's PR comments :)
