
Add schema definition breakdown feature #123

Closed
marcpascualsanchez opened this issue May 4, 2022 · 10 comments
Labels: new feature (New feature or request)

Comments

@marcpascualsanchez commented May 4, 2022

Hello @tot-ra 👋

We are planning to add some of the features mentioned in the roadmap, starting with a schema definition breakdown (Queries, Mutations, Scalars, Objects...).
The aim is to store all the schema definitions so we can display them, and to enable next steps such as usage tracking.

Backend changes

We would like to create some new MySQL tables with the structure shown below:

[image: diagram of the proposed MySQL tables]

With this in place, we could parse the type_defs received on the schema/push endpoint into the new tables. Furthermore, we will add new GraphQL queries for the frontend to consume this data.
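For illustration, a minimal sketch of that parsing step using the graphql npm package; the row shape is ours, since the table diagram above isn't reproduced in text:

```ts
import { parse, print, Kind } from 'graphql';

// Illustrative row shape; real columns would follow the table diagram above.
type FieldRow = { typeName: string; fieldName: string; fieldType: string };

// Walk the pushed type_defs and emit one row per object-type field.
function breakDown(typeDefs: string): FieldRow[] {
  const rows: FieldRow[] = [];
  for (const def of parse(typeDefs).definitions) {
    if (def.kind !== Kind.OBJECT_TYPE_DEFINITION) continue;
    for (const field of def.fields ?? []) {
      rows.push({
        typeName: def.name.value,
        fieldName: field.name.value,
        fieldType: print(field.type), // e.g. "[String!]!"
      });
    }
  }
  return rows;
}

// breakDown('type User { name: String! }')
// -> [{ typeName: 'User', fieldName: 'name', fieldType: 'String!' }]
```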

Frontend changes

Consume the backend's new GraphQL queries to present all the schema definitions in new pages.
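As a rough illustration only (the breakdown API is not specified in this thread, so every name below is a placeholder):

```ts
// Hypothetical breakdown query for the new pages; operation and field
// names are placeholders, not an agreed API.
const SCHEMA_BREAKDOWN_QUERY = /* GraphQL */ `
  query SchemaBreakdown {
    types {
      name
      kind
      fields {
        name
        type
        is_nullable
        is_array
      }
    }
  }
`;
```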

Thank you for your time, and nice work! 😃

@tot-ra (Owner) commented May 4, 2022

Hey. Thanks for the interesting topic; we do have schema usage tracking here at Pipedrive ourselves too (though it's not so nested).
Some questions:

  • What is the UI/API going to look like?
  • Let's say I make a query { user { name } }. Do you plan to insert a new record for every query into the operation, type & field tables? Otherwise I don't see exactly how you're getting the schema usage breakdown (by property). If you do, then this is not going to scale very well, because we can get a lot of queries & a lot of properties.
  • Why do you need fields like is_nullable and is_array [for usage]?

@SirJalias commented May 4, 2022

Hello!

  • What is the UI/API going to look like?

It is going to be similar to other solutions on the market; I can share some prototyping tomorrow.

  • Do you plan to insert a new record for every query into the operation, type & field tables?

That is the plan: the idea is to have control over which fields are inside each operation, which should not be an issue. On the other hand, we plan to store the usage of those fields in the operations, but it is not going to be stored forever; the idea is to keep records from the last 30 days, since otherwise there could be a decrease in performance.

@oscarSeGa commented May 4, 2022

Hello 👋

  • Why do you need fields like is_nullable and is_array [for usage]?

We decided to add these columns to the fields table because we are planning to represent types like [String!]!. For this example, it would be is_array=true, is_nullable=false and is_array_nullable=false.
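For illustration, deriving those three flags from a parsed field type with graphql-js could look like this (the function name and return shape are ours, not from the issue):

```ts
import { Kind, TypeNode } from 'graphql';

// Derive the three flags from a field's TypeNode.
// `[String!]!` -> { is_array: true, is_nullable: false, is_array_nullable: false }
function nullabilityFlags(node: TypeNode) {
  // Outermost `!` (the one after the closing bracket when the type is a list).
  const unwrapped = node.kind === Kind.NON_NULL_TYPE ? node.type : node;
  const outerNonNull = unwrapped !== node;

  if (unwrapped.kind !== Kind.LIST_TYPE) {
    // Plain type such as `String` or `String!`.
    return { is_array: false, is_nullable: !outerNonNull, is_array_nullable: false };
  }
  // Inner `!` applies to the list elements, e.g. the one inside `[String!]`.
  const innerNonNull = unwrapped.type.kind === Kind.NON_NULL_TYPE;
  return { is_array: true, is_nullable: !innerNonNull, is_array_nullable: !outerNonNull };
}
```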

  • { user { name } }

For this example, we would need to store the query in the operation table and the name field in the field table and, assuming name is of type String, we would also need to add String to the type table as a Scalar. With all of that data stored, we can know the usage of the query and also of the attribute name, because we will be able to register the usages in the requested_fields and requested_operations tables.
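Roughly, then, a single { user { name } } request would touch rows shaped like these (column names are illustrative, not final):

```ts
// Illustrative rows for `{ user { name } }`; real columns would follow the diagram.
const operationRow = { name: 'user', kind: 'query' };
const fieldRow = { operation: 'user', name: 'name', type: 'String' };
const typeRow = { name: 'String', kind: 'Scalar' };
```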

@tot-ra (Owner) commented May 4, 2022

which fields are inside each operation
we also need to add the String in the...

That's not going to scale. Here at Pipedrive, we serve >8k requests per minute.
That's 8k INSERTs per minute even if you assume only one field is requested per query.
At that rate, your MySQL table would have ~345M rows by the end of the month (8,000 × 60 × 24 × 30 ≈ 345M).

@tot-ra added the "new feature" label on May 4, 2022
@tot-ra (Owner) commented May 4, 2022

I would suggest considering this kind of architecture (a rough sketch of the parse/count/flush part follows the list):

  • The gateway needs to send the requested query to some queue (Redis pub/sub, or better, Kafka)
  • Then some piece of code, preferably written in Go so it can efficiently utilize all CPUs, would fetch the query, parse it into an AST, use GraphQL's visitor to go through all graph nodes, increase the property count (usage), and store it in memory
  • In the GraphQL visitor you need to map each queried field onto the current live schema, because { user { name } } has no knowledge of the User type
  • Then, about once a minute, it would take the data from memory and flush it to MySQL (with a bulk insert)
  • The basic & most valuable information is hits per day per property (User.name: 1)
  • Periodically, you need to clean up old usage info. I'd suggest a 5-day usage retention, but for smaller projects I guess it makes sense to keep 30 days.
  • The more granular & connected the data you need, the more disk space you need, so these values should be configurable. Ideally, though, you shouldn't have more than 1M rows in a table.
  • Though I mentioned Go, the language doesn't matter that much, as long as this processing can be moved to a separate dockerized process. The DB can remain the same.
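As an illustration of the parse/count/flush steps, here is a minimal sketch in TypeScript (the registry's own language, even though Go is suggested above), using graphql-js's TypeInfo to map queried fields onto the live schema. The hard-coded schema, the "Type.field" key format, and the flushToMysql helper are assumptions for the example, not part of the proposal:

```ts
import { buildSchema, parse, visit, visitWithTypeInfo, TypeInfo } from 'graphql';

// In-memory usage counters, keyed as "Type.field" (e.g. "User.name").
const hits = new Map<string, number>();

// The live schema; in the registry it would be built from the stored
// type_defs rather than hard-coded like this.
const schema = buildSchema(`
  type User { name: String }
  type Query { user: User }
`);

// Count every field of one incoming operation. TypeInfo maps each queried
// field onto the live schema, so `{ user { name } }` is attributed to
// Query.user and User.name even though the query text never mentions User.
function recordUsage(query: string): void {
  const typeInfo = new TypeInfo(schema);
  visit(
    parse(query),
    visitWithTypeInfo(typeInfo, {
      Field(node) {
        const parent = typeInfo.getParentType();
        if (!parent) return; // field not in the schema; skip it
        const key = `${parent.name}.${node.name.value}`;
        hits.set(key, (hits.get(key) ?? 0) + 1);
      },
    }),
  );
}

// Hypothetical persistence helper: a real one would issue a single multi-row
// INSERT, e.g. INSERT INTO field_usage (property, hits, day) VALUES (...), (...).
async function flushToMysql(rows: [string, number][]): Promise<void> {}

// Once a minute, hand the counters to MySQL in one bulk insert and reset.
setInterval(() => {
  const rows = [...hits.entries()];
  hits.clear();
  if (rows.length > 0) void flushToMysql(rows);
}, 60_000);

recordUsage('{ user { name } }'); // hits: Query.user -> 1, User.name -> 1
```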

@oscarSeGa commented

Hello :)

We discussed your suggestion internally, and for now we are going to focus on the breakdown queries when a schema is received on the schema/push endpoint. Meanwhile, we are going to explore a new solution for the schema usage and share it in this thread again.

Thanks for your patience

@tot-ra (Owner) commented May 5, 2022

breakdown queries

Do you mean that when someone pushes a schema, you want to parse the type_defs and save them in relational form? I guess that may help to build a UI where you can focus on a specific entity or property (like Apollo Studio does). The possible problem is that it may become inconsistent with the actual type_defs, which are stored as text. So I assume the text form will remain the source of truth.

@oscarSeGa commented

Exactly as you said. We will store everything in the database tables to be able to display the model similarly to how Apollo Studio does it. And yes, the text form will be the source of truth.

@oscarSeGa commented

Hello @tot-ra, as mentioned before, we are going to start working on the breakdown feature before planning the schema usage feature. We would like your opinion on schema updates (a new type, a modified field, a removed query...): when we encounter a breaking change, since we don't know whether the affected part is being used by anyone, we plan to add a header on the /schema/push HTTP POST as a "force" mechanism to allow the schema update (see the sketch below). By default it will be false, so if we encounter a breaking change, it won't be possible to update the schema.

As soon as the usage feature is working, we will change this behaviour so that a schema with a breaking change can only be updated if the broken part is not being used.
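For illustration, the proposed override might look like the following from a service's side; the header name and the exact payload fields are assumptions (the thread only says "a header" on the POST), and the registry URL is a placeholder:

```ts
// Hypothetical force-push call: override breaking-change validation on /schema/push.
async function forcePush(): Promise<void> {
  const res = await fetch('http://registry.example/schema/push', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Force-Push': 'true', // absent/false by default: breaking changes are rejected
    },
    body: JSON.stringify({
      name: 'service-a',
      version: 'v1',
      type_defs: 'type Query { hello: String }',
    }),
  });
  if (!res.ok) throw new Error(`push rejected: ${res.status}`);
}
```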

@tot-ra (Owner) commented Sep 12, 2022

Closing this; let's continue in #146

@tot-ra closed this as completed on Sep 12, 2022