
Add schema definition breakdown feature #123

Closed
marcpascualsanchez opened this issue May 4, 2022 · 10 comments
Labels: new feature (New feature or request)

Comments

@marcpascualsanchez commented May 4, 2022

Hello @tot-ra 👋

We are planning to add some of the features mentioned in the roadmap, starting with a schema definition breakdown (Queries, Mutations, Scalars, Objects...).
The aim is to store all the schema definitions so we can display them, and to enable next steps such as usage tracking.

Backend changes

We would like to create some new MySQL tables with the structure shown below:

[image: diagram of the proposed MySQL tables]

With this in place, we could parse the type_defs received on the schema/push endpoint into the new tables. Furthermore, we will add new GraphQL queries for the frontend to consume this data.
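For illustration, a minimal sketch of that parsing step using the graphql npm package; the row shape is ours, since the table diagram above isn't reproduced in text:

```ts
import { parse, print, Kind } from 'graphql';

// Illustrative row shape; real columns would follow the table diagram above.
type FieldRow = { typeName: string; fieldName: string; fieldType: string };

// Walk the pushed type_defs and emit one row per object-type field.
function breakDown(typeDefs: string): FieldRow[] {
  const rows: FieldRow[] = [];
  for (const def of parse(typeDefs).definitions) {
    if (def.kind !== Kind.OBJECT_TYPE_DEFINITION) continue;
    for (const field of def.fields ?? []) {
      rows.push({
        typeName: def.name.value,
        fieldName: field.name.value,
        fieldType: print(field.type), // e.g. "[String!]!"
      });
    }
  }
  return rows;
}

// breakDown('type User { name: String! }')
// -> [{ typeName: 'User', fieldName: 'name', fieldType: 'String!' }]
```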

Frontend changes

Consume the backend's new GraphQL queries to present all the schema definitions in new pages.
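As a rough illustration only (the breakdown API is not specified in this thread, so every name below is a placeholder):

```ts
// Hypothetical breakdown query for the new pages; operation and field
// names are placeholders, not an agreed API.
const SCHEMA_BREAKDOWN_QUERY = /* GraphQL */ `
  query SchemaBreakdown {
    types {
      name
      kind
      fields {
        name
        type
        is_nullable
        is_array
      }
    }
  }
`;
```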

Thank you for your time, and nice work! 😃

@tot-ra (Owner) commented May 4, 2022

Hey. Thanks for the interesting topic; we do have schema usage tracking here at Pipedrive ourselves too (though it's not so nested).
Some questions:

  • What is the UI/API going to look like?
  • Let's say I make a query { user { name } }. Do you plan to insert a new record for every query into the operation, type & field tables? Otherwise I don't see exactly how you're getting the schema usage breakdown (by property). If you do, then this is not going to scale very well, because we can get a lot of queries & a lot of properties.
  • Why do you need fields like is_nullable and is_array [for usage]?

@SirJalias commented May 4, 2022

Hello!

  • What is the UI/API going to look like?

It is going to be similar to other solutions on the market; I can share some prototyping tomorrow.

  • Do you plan to insert a new record for every query into the operation, type & field tables?

That is the plan: the idea is to have control over which fields are inside each operation, which should not be an issue. On the other hand, we plan to store the usage of those fields in the operations, but it is not going to be stored forever; the idea is to keep records from the last 30 days, since otherwise there could be a decrease in performance.

@oscarSeGa commented May 4, 2022

Hello 👋

  • Why do you need fields like is_nullable and is_array [for usage]?

We decided to add these columns to the fields table because we are planning to represent types like [String!]!. For this example, it would be is_array=true, is_nullable=false and is_array_nullable=false.
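For illustration, deriving those three flags from a parsed field type with graphql-js could look like this (the function name and return shape are ours, not from the issue):

```ts
import { Kind, TypeNode } from 'graphql';

// Derive the three flags from a field's TypeNode.
// `[String!]!` -> { is_array: true, is_nullable: false, is_array_nullable: false }
function nullabilityFlags(node: TypeNode) {
  // Outermost `!` (the one after the closing bracket when the type is a list).
  const unwrapped = node.kind === Kind.NON_NULL_TYPE ? node.type : node;
  const outerNonNull = unwrapped !== node;

  if (unwrapped.kind !== Kind.LIST_TYPE) {
    // Plain type such as `String` or `String!`.
    return { is_array: false, is_nullable: !outerNonNull, is_array_nullable: false };
  }
  // Inner `!` applies to the list elements, e.g. the one inside `[String!]`.
  const innerNonNull = unwrapped.type.kind === Kind.NON_NULL_TYPE;
  return { is_array: true, is_nullable: !innerNonNull, is_array_nullable: !outerNonNull };
}
```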

  • { user { name } }

For this example, we would need to store the query in the operation table and the name field in the field table and, assuming name is of type String, we would also need to add String to the type table as a Scalar. With all of that data stored, we can know the usage of the query and also of the attribute name, because we will be able to register the usages in the requested_fields and requested_operations tables.
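Roughly, then, a single { user { name } } request would touch rows shaped like these (column names are illustrative, not final):

```ts
// Illustrative rows for `{ user { name } }`; real columns would follow the diagram.
const operationRow = { name: 'user', kind: 'query' };
const fieldRow = { operation: 'user', name: 'name', type: 'String' };
const typeRow = { name: 'String', kind: 'Scalar' };
```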

@tot-ra (Owner) commented May 4, 2022

which fields are inside each operation
we also need to add the String in the...

That's not going to scale. Here at Pipedrive, we serve >8k requests per minute.
That's 8k INSERTs per minute even if you assume only one field is requested per query.
At that rate, your MySQL table would have ~345M rows by the end of the month (8,000 × 60 × 24 × 30 ≈ 345M).

@tot-ra added the "new feature" label on May 4, 2022
@tot-ra (Owner) commented May 4, 2022

I would suggest considering this kind of architecture (a rough sketch of the parse/count/flush part follows the list):

  • The gateway needs to send the requested query to some queue (Redis pub/sub, or better, Kafka)
  • Then some piece of code, preferably written in Go so it can efficiently utilize all CPUs, would fetch the query, parse it into an AST, use GraphQL's visitor to go through all graph nodes, increase the property count (usage), and store it in memory
  • In the GraphQL visitor you need to map each queried field onto the current live schema, because { user { name } } has no knowledge of the User type
  • Then, about once a minute, it would take the data from memory and flush it to MySQL (with a bulk insert)
  • The basic & most valuable information is hits per day per property (User.name: 1)
  • Periodically, you need to clean up old usage info. I'd suggest a 5-day usage retention, but for smaller projects I guess it makes sense to keep 30 days.
  • The more granular & connected the data you need, the more disk space you need, so these values should be configurable. Ideally, though, you shouldn't have more than 1M rows in a table.
  • Though I mentioned Go, the language doesn't matter that much, as long as this processing can be moved to a separate dockerized process. The DB can remain the same.
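As an illustration of the parse/count/flush steps, here is a minimal sketch in TypeScript (the registry's own language, even though Go is suggested above), using graphql-js's TypeInfo to map queried fields onto the live schema. The hard-coded schema, the "Type.field" key format, and the flushToMysql helper are assumptions for the example, not part of the proposal:

```ts
import { buildSchema, parse, visit, visitWithTypeInfo, TypeInfo } from 'graphql';

// In-memory usage counters, keyed as "Type.field" (e.g. "User.name").
const hits = new Map<string, number>();

// The live schema; in the registry it would be built from the stored
// type_defs rather than hard-coded like this.
const schema = buildSchema(`
  type User { name: String }
  type Query { user: User }
`);

// Count every field of one incoming operation. TypeInfo maps each queried
// field onto the live schema, so `{ user { name } }` is attributed to
// Query.user and User.name even though the query text never mentions User.
function recordUsage(query: string): void {
  const typeInfo = new TypeInfo(schema);
  visit(
    parse(query),
    visitWithTypeInfo(typeInfo, {
      Field(node) {
        const parent = typeInfo.getParentType();
        if (!parent) return; // field not in the schema; skip it
        const key = `${parent.name}.${node.name.value}`;
        hits.set(key, (hits.get(key) ?? 0) + 1);
      },
    }),
  );
}

// Hypothetical persistence helper: a real one would issue a single multi-row
// INSERT, e.g. INSERT INTO field_usage (property, hits, day) VALUES (...), (...).
async function flushToMysql(rows: [string, number][]): Promise<void> {}

// Once a minute, hand the counters to MySQL in one bulk insert and reset.
setInterval(() => {
  const rows = [...hits.entries()];
  hits.clear();
  if (rows.length > 0) void flushToMysql(rows);
}, 60_000);

recordUsage('{ user { name } }'); // hits: Query.user -> 1, User.name -> 1
```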

@oscarSeGa commented

Hello :)

We discussed your suggestion internally, and for now we are going to focus on the breakdown queries when a schema is received on the schema/push endpoint. Meanwhile, we are going to explore a new solution for the schema usage and share it in this thread again.

Thanks for your patience

@tot-ra (Owner) commented May 5, 2022

breakdown queries

Do you mean that when someone pushes a schema, you want to parse the type_defs and save them in relational form? I guess that may help to build a UI where you can focus on a specific entity or property (like Apollo Studio does). The possible problem is that it may become inconsistent with the actual type_defs, which are stored as text. So I assume the text form will remain the source of truth.

@oscarSeGa commented

Exactly as you said. We will store everything in the database tables to be able to display the model similarly to how Apollo Studio does it. And yes, the text form will be the source of truth.

@oscarSeGa commented

Hello @tot-ra, as mentioned before, we are going to start working on the breakdown feature before planning the schema usage feature. We would like your opinion on schema updates (a new type, a modified field, a removed query...): when we encounter a breaking change, since we don't know whether the affected part is being used by anyone, we plan to add a header on the /schema/push HTTP POST as a "force" mechanism to allow the schema update (see the sketch below). By default it will be false, so if we encounter a breaking change, it won't be possible to update the schema.

As soon as the usage feature is working, we will change this behaviour so that a schema with a breaking change can only be updated if the broken part is not being used.
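For illustration, the proposed override might look like the following from a service's side; the header name and the exact payload fields are assumptions (the thread only says "a header" on the POST), and the registry URL is a placeholder:

```ts
// Hypothetical force-push call: override breaking-change validation on /schema/push.
async function forcePush(): Promise<void> {
  const res = await fetch('http://registry.example/schema/push', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Force-Push': 'true', // absent/false by default: breaking changes are rejected
    },
    body: JSON.stringify({
      name: 'service-a',
      version: 'v1',
      type_defs: 'type Query { hello: String }',
    }),
  });
  if (!res.ok) throw new Error(`push rejected: ${res.status}`);
}
```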

@tot-ra (Owner) commented Sep 12, 2022

Closing this; let's continue in #146

@tot-ra closed this as completed on Sep 12, 2022