
dk-query comments #4529

Closed · wants to merge 1 commit

Conversation

dkuku
Contributor

@dkuku dkuku commented Oct 18, 2024

This is a continuation of the work I started in the postgrex repo - I think Ecto will benefit from it more. For completeness: an alternative to this PR would be to provide a callback that receives the generated query and allows modifying it with a comment, but then we don't know whether the query can still be cached and used as a prepared query.

It adds a new field to Ecto.Query that aggregates comments.
The comments are then added to the generated query:

    /*comment1*/ /*comment2*/ SELECT s0."x" FROM (/*subquery comment*/ SELECT ss0."x" AS "x" FROM "schema" AS ss0) AS s0

The comments can also be inspected:

           # foo
           # bar
           #Ecto.Query<from p0 in "posts">
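
For reference, a minimal sketch of how the proposed API could be used to produce the inspect output above. comment/2 is the macro this PR adds; it is not part of released Ecto, so treat the exact call shape as an assumption:

    import Ecto.Query

    # comment/2 is the API proposed in this PR (hypothetical outside this branch):
    # each call appends to the new comments field on the query struct.
    query =
      "posts"
      |> comment("foo")
      |> comment("bar")

    # Inspecting the query prints the aggregated comments before the query itself,
    # as shown above.
    IO.inspect(query)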

Currently a comment can be:

  • a compile-time string - this does not change, so a query with only such comments can be cached and prepared - we are not sacrificing any of Ecto's performance benefits in this case.
    Examples: module, function, app name, repo name, job name, host.
  • an interpolated string or a variable - these are not cached and the query should not be prepared (not implemented yet).
    Examples: traces and spans, IP addresses.
    This may be slower when compiling a query and also requires more resources to parse the query in the database.
    But not everyone is running a database at 80% load, and traces can be sampled so that only a small percentage of queries include the additional data.
    I need this mostly for sqlcommenter; libraries in other languages do it similarly, and Rails has it built in. A sketch contrasting the two kinds of comment follows this list.
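
To make the distinction concrete, a small sketch (again using the comment/2 API proposed in this PR, so the exact semantics are an assumption): the first query only carries a compile-time string and stays cacheable, while the second interpolates a runtime value and would have to skip the prepared-statement path:

    import Ecto.Query

    # Compile-time string: the comment never changes, so the query can still be
    # cached and prepared as usual.
    static_query =
      from(p in "posts", select: p.title)
      |> comment("app: my_app")

    # Runtime value: the comment differs per call, so under this proposal the
    # query should not be prepared (and is not cached).
    trace_id = "00f067aa0ba902b7"
    traced_query =
      from(p in "posts", select: p.title)
      |> comment("traceparent: #{trace_id}")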

What next:

  • disable prepared queries when the comment is not a compile-time string - I'm not sure how best to do that: whether it's enough to extend the build_meta functions, or whether the DB libraries also need a new param, or maybe a new field in the query struct saying the query should not be prepared?
  • validate for SQL injection - currently in postgrex there is a transaction comment that raises when */ is included in the comment. I think it's better to replace any occurrence of */ with * / so that any SQL injection attempt gets logged and the data saved (see the escaping sketch after this list).
  • add changes to ecto_sql to add the comments to the actual query
  • add a config option to specify whether the comment should be prepended or appended to the query:
    by default the comments will be in front, but the sqlcommenter specification requires comments at the end.
  • add a default comment() configured similarly to the Logger format, with access to the compile-time env struct.
    This would allow adding metadata to a query composed from multiple files.
    A format like $line: $module.$function would create a compile-time comment that can be cached and prepared (see the macro sketch after this list), and the resulting SQL would look like:
    /*22: MyApp.QueryBuilder.build*/ /*33: MyApp.QueryBuilder.filter*/ SELECT s0."x" FROM "schema" AS s0 WHERE z = ?
  • add a comment(:sqlcommenter, keyword) that will either build a sqlcommenter comment itself or require an external library to do the escaping and just accept the comment as a string or iolist. The escaping logic is pretty simple and I think it could be included in Ecto (same sketch after this list). This will require specifying comment_position: :append in the ecto config.
    This may require flushing previous comments or merging them into one - the sqlcommenter specification says there should be only one. But we could, for example, aggregate key-value pairs and build them into a single comment at the end; an additional config option may be required.
  • comment(atom) - when you follow the rule of not creating atoms dynamically, atoms held in variables can be treated like compile-time strings, so queries whose comments interpolate atoms can still be cached. Example:
    method = :get
    query |> comment("method: #{method}")
    or
    query |> comment({:safe, "method: #{method}"})
  • implement for insert_all and delete_all
  • implement for other adapters
  • add a TracedRepo module that would be a drop-in replacement for Repo, built with macros, so that calling TracedRepo.all(query) would use the work done here and inject comments with static metadata. Again - I'm not sure if it should be in Ecto or just shown in the Ecto documentation as an example?
  • documentation - I want some feedback first before doing that.
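
To illustrate the default-comment idea from the list above: a rough sketch, not part of this PR, of how a $line: $module.$function comment could be built at compile time from the caller environment so it stays a static string (module and helper names below are made up):

    defmodule MyApp.QueryComment do
      # Hypothetical helper, not part of this PR: builds a "$line: $module.$function"
      # string at macro-expansion time from __CALLER__, so the resulting comment is a
      # compile-time constant and the query stays cacheable and preparable.
      defmacro source_comment do
        %{line: line, module: module, function: function} = __CALLER__

        text =
          case function do
            {name, _arity} -> "#{line}: #{inspect(module)}.#{name}"
            nil -> "#{line}: #{inspect(module)}"
          end

        quote do: unquote(text)
      end
    end

    # Usage, with comment/2 being the API proposed in this PR:
    #   query |> comment(MyApp.QueryComment.source_comment())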
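
And a rough approximation of the escaping mentioned in the injection and sqlcommenter items, assuming the spec's URL-encode-and-sort rules; the module and function names are hypothetical and this is not code from the PR:

    defmodule MyApp.SQLComment do
      # Neutralise the comment terminator instead of raising, so an injection
      # attempt still ends up logged with the query rather than rejected.
      def sanitize(comment) when is_binary(comment) do
        String.replace(comment, "*/", "* /")
      end

      # Build a single sqlcommenter-style comment from key/value pairs:
      # keys and values are percent-encoded, values wrapped in single quotes,
      # pairs sorted by key and joined with commas.
      def sqlcommenter(pairs) when is_list(pairs) do
        body =
          pairs
          |> Enum.sort_by(fn {key, _value} -> to_string(key) end)
          |> Enum.map_join(",", fn {key, value} ->
            encode(key) <> "='" <> encode(value) <> "'"
          end)

        "/*" <> body <> "*/"
      end

      defp encode(term) do
        URI.encode(to_string(term), &URI.char_unreserved?/1)
      end
    end

    # MyApp.SQLComment.sqlcommenter(controller: "PostController", route: "/posts")
    # => "/*controller='PostController',route='%2Fposts'*/"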

@josevalim
Member

Thank you @dkuku. Looking at the impact this has on the codebase, I am thinking the PG route may be the better one after all. My thinking was that we would only add static comments, but if we want both, for example to later add stacktrace information, I say the work on PostgreSQL is a better fit (and it can play nicely with stuff like the default_options callback from Ecto). The work here definitely comes at a higher cost in terms of code and abstractions if we want to add all of these features.

@dkuku
Contributor Author

dkuku commented Oct 18, 2024

My concern in postgrex arose when I started disabling the prepared statements. I read up on them and there are clear benefits to keeping them. If someone only wants to add static comments to track where a query originates from, then there is no need to disable them. In that case the PR there can be even simpler and just include documentation saying that you should not add any data that changes.

@josevalim
Member

I understand that concern, but can you provide examples of where the comment would be static and based on the place the query was written rather than where it was executed, i.e. the repository?

@dkuku
Contributor Author

dkuku commented Oct 21, 2024

You're right – most of the options can be added when we call the Repo and pass them as parameters. I created this PR mainly to retain the ability to use prepared queries with comments, especially when the comment includes, for example, only the app name executing the DB call. The traces and spans are a nice addition, especially since sqlcommenter has become a widely supported standard.

My main use case is to identify where an expensive query originates. This type of comment can be cached in ETS, just like an existing query, which is why I thought Ecto might be a better fit. However, I’m happy to revert to the Postgrex PR.

@josevalim
Member

I think it is good to have explored both approaches. I think starting with PG is the best route at the moment :) Thank you ❤️

@josevalim josevalim closed this Oct 21, 2024
@Schultzer
Contributor

> You're right – most of the options can be added when we call the Repo and pass them as parameters. I created this PR mainly to retain the ability to use prepared queries with comments, especially when the comment includes, for example, only the app name executing the DB call. The traces and spans are a nice addition, especially since sqlcommenter has become a widely supported standard.
>
> My main use case is to identify where an expensive query originates. This type of comment can be cached in ETS, just like an existing query, which is why I thought Ecto might be a better fit. However, I’m happy to revert to the Postgrex PR.

Out of curiosity, what is lacking currently from telemetry events for you to determine where a query originated from?

@dkuku
Contributor Author

dkuku commented Oct 21, 2024

> Out of curiosity, what is lacking currently from telemetry events for you to determine where a query originated from?

You need to forward telemetry events to, for example, Datadog. The default Postgres integration does not understand telemetry but does support sqlcommenter. Is there a way to integrate it with telemetry here?

Currently, a single database can be used by multiple frameworks. In the simplest scenario, you might migrate from Rails to Elixir, resulting in queries from two different frameworks appearing in the Postgres logs.

Multiple Elixir microservices can also execute queries on the same database (e.g., when you extract part of a microservice into a separate one).

Additionally, a single context module can be executed from multiple places (REST, GraphQL, gRPC, event consumers, Oban jobs, and periodic tasks).

In large applications, you might have all of the above scenarios. In case something goes wrong, the more data you have for debugging, the better.

@Schultzer
Contributor

> > Out of curiosity, what is lacking currently from telemetry events for you to determine where a query originated from?
>
> You need to forward telemetry events to, for example, Datadog. The default Postgres integration does not understand telemetry but does support sqlcommenter. Is there a way to integrate it with telemetry here?
>
> Currently, a single database can be used by multiple frameworks. In the simplest scenario, you might migrate from Rails to Elixir, resulting in queries from two different frameworks appearing in the Postgres logs.
>
> Multiple Elixir microservices can also execute queries on the same database (e.g., when you extract part of a microservice into a separate one).
>
> Additionally, a single context module can be executed from multiple places (REST, GraphQL, gRPC, event consumers, Oban jobs, and periodic tasks).
>
> In large applications, you might have all of the above scenarios. In case something goes wrong, the more data you have for debugging, the better.

I agree. I'm currently working on a tool for that, to help with database observability, which leverages telemetry events and stores them as time series in an automatically partitioned table based on which Ecto adapter is being used. I even reconcile dependencies, so clicking on the stacktrace takes you to the repository. Anyway, feel free to reach out if this is something that interests you; you can find my email in my GitHub profile.

@aloukissas

To give some more context: this functionality is required for DB monitoring on Datadog to link queries back to Ecto OTel spans (this does not work today). In fact, this is exactly how they do it in e.g. their Node.js library.

@Schultzer
Contributor

> To give some more context: this functionality is required for DB monitoring on Datadog to link queries back to Ecto OTel spans (this does not work today). In fact, this is exactly how they do it in e.g. their Node.js library.

How is telemetry not sufficient in this case?

You can send Ecto telemetry events to Datadog or even export them to OTel.

@aloukissas

aloukissas commented Dec 18, 2024

> > To give some more context: this functionality is required for DB monitoring on Datadog to link queries back to Ecto OTel spans (this does not work today). In fact, this is exactly how they do it in e.g. their Node.js library.
>
> How is telemetry not sufficient in this case?
>
> You can send Ecto telemetry events to Datadog or even export them to OTel.

It does not work. You need the span_id and other metadata in the statement sent to the DB. All the APM libs from Datadog do this. Think of it like this: without an explicit span_id, the system can only guess (by matching the statement and the timestamp).

@tsloughter

@Schultzer telemetry events don't correlate with telemetry from the database itself. It won't tell you that your query is running a sequential scan.

Passing the context (OpenTelemetry specifically) enables this capability and is why sqlcommenter exists and why I think it is important for ecto and postgrex to support this.

@dkuku
Contributor Author

dkuku commented Dec 18, 2024

@aloukissas For spans you can follow this guide:
https://dev.to/dkuku/sql-commenter-with-postgrex-2bfd
And look at the source of this library:
https://github.com/dkuku/opentelemetry_sqlcommenter
This PR is more about adding static comments, like the code owner or the app name that executed the query, which helps with shared databases and connection poolers.

@tsloughter

@dkuku oh snap :) I'd be interested in getting that up into https://github.com/open-telemetry/opentelemetry-erlang-contrib

@dkuku
Contributor Author

dkuku commented Dec 18, 2024

@tsloughter we can do that, I can also transfer the package ownership.

@tsloughter

@dkuku Cool. Are you on Slack? Let's leave the Ecto repo's PR comments :)
