This style guide describes best practices for Vanta's GraphQL API design and implementation. This is a living document; please feel free to propose changes!
- 1. Getting started
- 2. Schema style (implementation-independent rules for our .graphql files)
- 2.1. Guidelines for GraphQL types
- 2.1.1. Prefer custom scalar types with semantic meaning
- 2.1.2. Prefer non-nullable types
- 2.1.3. Prefer ID types over String types for identifiers
- 2.1.4. Names should be as specific as possible
- 2.1.5. Use enums when the set of types is constrained
- 2.1.6. Make impossible states impossible
- 2.1.7. Prefer interfaces to unions when there is shared meaning
- 2.1.8. No type should depend on a sibling type to get relevant information
- 2.2. Guidelines for Errors
- 2.3. Guidelines for Mutations
- 2.3.1. All mutations must be direct children of the root Mutation type
- 2.3.2. All mutations must return a unique union type with the suffix Payload
- 2.3.3. Mutations must take a single, unique argument called input
- 2.3.4. When in doubt, write 2 mutations
- 2.3.5. Internal mutations should be prefixed with an underscore
- 2.4. Guidelines for Queries
- 2.5. Pagination
- 2.6. Deprecation
- 3. Resolver style (rules for our .ts resolvers)
We use a schema-first design. This means that we explicitly define our entire GraphQL schema in the GraphQL Schema Definition Language (SDL). This allows us to define our schema independently of the implementation and ensures that we prioritize correct API design over convenient implementation.
Without tooling, schema-first API design can be difficult to maintain. To enforce implementation completeness and type correctness, we lean heavily on code generation.
We use graphql-eslint to enforce many style rules across our schema.
In addition to the rules provided by graphql-eslint, we define our own rules in eslint-plugin-vanta. Don't be afraid to introduce a new lint rule by submitting a PR to that repo and adding it to our linter!
While these are all enforced by our linter, it's worth pointing out a few specific style rules:
- Our GraphQL SDL files are all autoformatted by Prettier
- Descriptions are required on all object types
- Unreachable types are banned and are automatically removed from the schema
Sometimes a String is just a String. Often, however, a String is a bit of HTML, or a date, or an email address. In these cases, we prefer to define a custom scalar type that has semantic meaning. This allows developers to write validation/serialization logic that is specific to that type and prevents implementation details from leaking to API consumers.
Bad:
type Webpage {
contentHTML: String!
}
Good:
type Webpage {
content: HTML!
}
See our DateTime scalar for a good example of this rule in practice. Some common scalars are available in the graphql-scalars library.
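As a rough illustration of the validation/serialization logic a custom scalar makes possible, the sketch below models a scalar as a pair of functions without depending on any GraphQL library. The names `ScalarCodec`, `EmailAddress`, and `emailAddressScalar` are hypothetical and exist only for this example; a real scalar would be registered with the GraphQL server instead, and real email validation would be stricter.

```typescript
// Hypothetical, library-free model of a custom scalar: one function validates
// incoming values, the other serializes outgoing ones.
interface ScalarCodec<T> {
  name: string;
  parseValue(raw: unknown): T; // client -> server
  serialize(value: T): string; // server -> client
}

// Branded type so a validated email cannot be confused with a plain string.
type EmailAddress = string & { readonly __brand: "EmailAddress" };

const emailAddressScalar: ScalarCodec<EmailAddress> = {
  name: "EmailAddress",
  parseValue(raw: unknown): EmailAddress {
    // Deliberately simple check for illustration only.
    if (typeof raw !== "string" || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(raw)) {
      throw new TypeError(`Invalid EmailAddress: ${String(raw)}`);
    }
    // Normalize case so equal addresses compare equal.
    return raw.toLowerCase() as EmailAddress;
  },
  serialize(value: EmailAddress): string {
    return value;
  },
};
```

Because the validation lives in one place, every field typed with the scalar gets it for free, which a plain String field cannot offer.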
Make fields non-nullable when we don't expect them to be null.
We use nullability to express the notion of optionality. If a field is really nullable (e.g. an optional displayName), the schema should reflect that by allowing nullability. However, if a field cannot be resolved because of a failed network call, we prefer to fail the whole request instead of returning a null value.
This means that users may have to retry a whole request even if only one small part of the request fails. However, to prevent repeated null checks on the client-side, this is a tradeoff we're willing to make.
That said, changing a field from non-nullable to nullable is a breaking change, but the opposite isn't. If you're unsure about the nullability of a field, it's OK to make it nullable; we can always add an ! later.
GraphQL has a built-in type called ID that is used to represent a unique identifier. For any non-human-readable ID, use ID instead of String.
Bad:
type User {
id: String!
name: String!
}
Good:
type User {
id: ID!
name: String!
}
The id field must be unique across all objects of a given type. That is, Object A === Object B iff their ids are equal. This is required for caching to work properly.
It's difficult to change a field name once clients are using it. Make names overly specific so they're unlikely to collide with new concepts that we want to introduce to the graph.
Bad:
type Query {
teams: [Team!]!
}
Good:
type Query {
underwaterRugbyTeams: [UnderwaterRugbyTeam!]!
}
Often an input or output type is known to have a fixed set of possible values. In this case, we prefer to define an enum type. This allows safer API use and also generates a stricter TypeScript type for us.
Bad:
type Paint {
color: String!
}
Good:
enum Color {
RED
GREEN
BLUE
}
type Paint {
color: Color!
}
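To illustrate what a stricter TypeScript type buys us, here is a sketch of roughly what a generated type for the Color enum could look like. The exact output of our code generation is not shown in this guide, so the string-literal union below is an assumption, and `toHex` is a hypothetical consumer.

```typescript
// Assumed shape of a generated type for `enum Color { RED GREEN BLUE }`.
type Color = "RED" | "GREEN" | "BLUE";

interface Paint {
  color: Color;
}

// The compiler forces every enum value to be handled: if a new value were
// added to Color, the `never` assignment below would stop compiling.
function toHex(paint: Paint): string {
  switch (paint.color) {
    case "RED":
      return "#ff0000";
    case "GREEN":
      return "#00ff00";
    case "BLUE":
      return "#0000ff";
    default: {
      const unreachable: never = paint.color;
      return unreachable;
    }
  }
}
```

With `color: String!`, the same function would have to accept any string and decide at runtime what to do with values like "purple".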
It should be impossible to define a type with inconsistent information.
For example, the following is not allowed:
type Address {
zipcode: Zipcode
postalCode: PostalCode
city: String
state: String
province: String
}
A real address will probably have exactly one of zipcode or postalCode, and exactly one of state or province, yet this type allows any combination of them. Instead, this should be rewritten as:
union Address = AmericanAddress | CanadianAddress
type AmericanAddress {
zipcode: Zipcode!
city: String!
state: String!
}
type CanadianAddress {
postalCode: PostalCode!
city: String!
province: String!
}
Another way to express this idea is: write types such that the number of nullable fields is minimized. The type system should express constraints without requiring domain-specific knowledge from the API consumer.
This rule applies across our whole schema – several rules regarding input types and errors are also implied by this philosophy.
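The payoff of the Address union shows up on the consuming side. The sketch below is a hand-written client-side view of that union (not generated code); it assumes the Zipcode and PostalCode scalars arrive as strings, and `formatAddress` is a hypothetical helper.

```typescript
// Client-side view of `union Address = AmericanAddress | CanadianAddress`.
// `__typename` is the discriminant GraphQL provides for every object type.
interface AmericanAddress {
  __typename: "AmericanAddress";
  zipcode: string; // assumption: Zipcode scalar serialized as a string
  city: string;
  state: string;
}

interface CanadianAddress {
  __typename: "CanadianAddress";
  postalCode: string; // assumption: PostalCode scalar serialized as a string
  city: string;
  province: string;
}

type Address = AmericanAddress | CanadianAddress;

// No null checks are needed: narrowing on __typename tells the compiler
// exactly which fields exist in each branch.
function formatAddress(address: Address): string {
  switch (address.__typename) {
    case "AmericanAddress":
      return `${address.city}, ${address.state} ${address.zipcode}`;
    case "CanadianAddress":
      return `${address.city}, ${address.province} ${address.postalCode}`;
  }
}
```

With the single nullable-everything Address type, the same function would need a null check per field and a fallback for combinations that should never occur.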
Union types require the user to specify a case for every possible type. Interfaces allow a user to select fields in common if they don't care about the fields that differ.
Don't overuse interfaces – if a field does not have the same meaning between two types, it shouldn't be a common field on the interface, even if it shares a name. However, when there is a reasonable inheritance-esque relationship, use an interface instead of a union type.
Bad:
union Animal = Dog | Cat
type Dog {
numLegs: Int!
barksPerMinute: Int!
name: String!
}
type Cat {
numLegs: Int!
meowsPerMinute: Int!
name: String!
}
Good:
interface Animal {
name: String!
numLegs: Int!
}
type Dog implements Animal {
numLegs: Int!
barksPerMinute: Int!
name: String!
}
type Cat implements Animal {
numLegs: Int!
meowsPerMinute: Int!
name: String!
}
If there is a relationship between two types, expose the relationship in the graph – one should be a child of the other. They can also be children of one another.
In other words, avoid having fields on types that are simply IDs of other types. This allows the client to avoid complicated joins in most cases, and only fetch the necessary data.
Bad:
type Cat {
age: Int
breed: String!
name: String!
nemesisId: String
houseId: String
}
type Dog {
age: Int
breed: String!
name: String!
houseId: String
}
type House {
id: String!
name: String!
}
type Query {
cats: [Cat!]!
dogs: [Dog!]!
houses: [House!]!
}
# this schema gives you all the information you need,
# but requires you to join after making database queries.
# Additionally, it makes pagination nearly impossible since you
# have to load all of the data to figure out what maps to what.
Good:
type Cat {
age: Int
breed: String!
name: String!
nemesis: Dog
house: House
}
type Dog {
age: Int
breed: String!
name: String!
house: House
}
type House {
name: String!
cats: [Cat!]!
dogs: [Dog!]!
}
type Query {
houses: [House!]!
# and maybe, if we want to get them across all houses
cats: [Cat!]!
dogs: [Dog!]!
}
# now we can query using idiomatic graphql. Want to get a house with all its cats? That's just
query {
houses {
name
cats {
name
}
}
}
There are two types of errors that we might encounter when serving a request.
- Logic errors – errors caused by a user doing something illegal given our business logic. For example, uploading a disallowed file type, insufficient funds for purchase, etc.
- Generic server errors – errors that have nothing to do with the business logic but nonetheless caused a request to fail. For example, network errors, rate limiting, etc.
Logic errors fall into a few different subtypes:
- VantaAuthenticationErrors - user is unauthenticated. This is an ApolloError extending the AuthenticationError type.
- VantaForbiddenErrors - user is unauthorized. This is an ApolloError extending the ForbiddenError type.
- ResourceNotFoundErrors - a resource we tried to look up doesn't exist. Note that this should also be used for resources that do exist but which the user cannot access. This is a custom ApolloError type.
- InvalidInputErrors - something about the shape of the input was wrong, for example, SLA value was a negative number, startDate value was after endDate value, etc. Use this for cases where the client could have validated that the input was incorrect. This is an ApolloError extending the UserInputError type.
- other expected user errors - user tried to do something illegal given our business logic, for example, uploading a disallowed file type. These errors are considered part of our API and should explicitly be part of the schema as types extending BaseUserError.
Queries should only ever use ApolloErrors, never UserErrors. Queries that take list input and return list output should not throw resource not found, but should return null values in the list for any missing resources.
Mutations may use ApolloErrors or UserErrors as appropriate.
When there is an expected failure case that we want to expose to the API consumer, we prefer to return a union type that contains all possible errors. This forces the client to decide how to handle the error and jibes with our "make impossible states impossible" philosophy.
The error's __typename should be used as the machine-readable error code. The message should be a message that can be shown to the user.
If an error type must expose fields other than message, then you can define your own type which implements UserError.
type Mutation {
register(input: RegisterInput!): RegisterPayload
}
union RegisterPayload =
RegisterSuccess |
PasswordTooWeakError
interface UserError {
message: String!
}
type PasswordTooWeakError implements UserError {
message: String!
passwordRules: [String!]!
}
...
All error types should implement the UserError interface so the client does not have to exhaustively check every possible error type.
For example, a client might make the following request:
mutation Register($input: RegisterInput!) {
register(input: $input) {
... on RegisterSuccess {
id
}
# Get message and code from all errors
... on UserError {
__typename
message
}
# Get additional specific info about PasswordTooWeak errors
... on PasswordTooWeakError {
passwordRules
}
}
}
This way, adding new error types to the union doesn't break clients.
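A sketch of how a client might consume that response follows. The interfaces are hand-written stand-ins for the selected fields (not generated types), and `describeRegisterResult` is a hypothetical helper. The last branch is the important one: it handles any UserError, including types added to the union after the client shipped.

```typescript
// Hand-written stand-ins for what the client receives from the register mutation.
interface RegisterSuccess {
  __typename: "RegisterSuccess";
  id: string;
}

interface PasswordTooWeakError {
  __typename: "PasswordTooWeakError";
  message: string;
  passwordRules: string[];
}

// Any other UserError, known or not yet known to this client.
interface OtherUserError {
  __typename: string;
  message: string;
}

type RegisterResult = RegisterSuccess | PasswordTooWeakError | OtherUserError;

function describeRegisterResult(result: RegisterResult): string {
  if ("id" in result) {
    return `registered ${result.id}`;
  }
  if ("passwordRules" in result) {
    // Error-specific fields are available once the type is narrowed.
    return `${result.message} (${result.passwordRules.join("; ")})`;
  }
  // Generic fallback: __typename acts as the machine-readable code,
  // message is safe to show to the user.
  return `${result.__typename}: ${result.message}`;
}
```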
Relevant lint rules:
- errors-implement-usererror (A type should extend the UserError interface iff its name ends with Error)
GraphQL has a special errors key that allows us to return errors to the client. When a request is successful, it returns a data key that contains the requested data. However, if the request fails, it returns an errors key that contains a list of errors.
When we use one of the Apollo error types defined above, this information is returned in the errors key.
Unexpected errors should be reported as generic ServerErrors in the errors key.
Example developer error for a malformed request:
Query:
query q {
domain {
idd
}
}
Response:
{
  "errors": [
    {
      "message": "Cannot query field \"idd\" on type \"domain\". Did you mean \"id\"?",
      "locations": [
        {
          "line": 3,
          "column": 5
        }
      ],
      "extensions": {
        "code": "GRAPHQL_VALIDATION_FAILED"
      }
    }
  ]
}
Example developer error for a generic server error – basically equivalent to a 5xx:
{
"errors": [
{
"message": "Server error",
"extensions": {
"code": "ServerError"
}
}
],
"data": null
}
Bad:
type Mutation {
post: PostMutations
}
type PostMutations {
like(id: ID!): LikePostPayload
unlike(id: ID!): UnlikePostPayload
create: CreatePostPayload
}
Good:
type Mutation {
likePost(id: ID!): LikePostPayload
unlikePost(id: ID!): UnlikePostPayload
createPost: CreatePostPayload
}
Rationale: The GraphQL spec guarantees that top-level mutation fields run serially, top-to-bottom. This means that the following mutation will be executed serially on the server, and the post is guaranteed to be unliked after it is executed:
mutation likeThenUnlikePost($id: ID!) {
  # this runs first
  likePost(id: $id)
  # then this runs
  unlikePost(id: $id)
}
It may be tempting to namespace mutations to group related actions. However, the serial guarantee only applies to top-level mutations!
mutation brokenLikeThenUnlikePost($id: ID!) {
  post {
    # This runs in parallel
    like(id: $id)
    # with this
    unlike(id: $id)
  }
}
So it's impossible to predict the outcome of multiple namespaced mutations in a single request.
A user can force these to be executed serially by doing something like
mutation likeThenUnlikePost($id: ID!) {
  # this runs first
  like: post { like(id: $id) }
  # then this
  unlike: post { unlike(id: $id) }
}
but we can't enforce that client code does this, so we don't allow it.
Mutations should all return union types named with the suffix Payload.
A mutation payload must contain a unique success object type with the suffix Success, the base error type BaseUserError, and zero or more specific error types that adhere to the error guidelines above.
This section is still being considered. Depending on implementation complexity, we may want to reevaluate whether every payload needs to be a union type and, if so, whether they should all include a BaseUserError.
type Mutation {
register(input: RegisterInput!): RegisterPayload
}
union RegisterPayload =
RegisterSuccess |
BaseUserError |
InvalidEmailError |
PasswordTooWeakError
type RegisterSuccess {
user: User!
}
type BaseUserError implements UserError {
message: String!
}
type InvalidEmailError implements UserError {
...
}
type PasswordTooWeakError implements UserError {
...
}
Having a unique input type for each mutation makes it easier for clients to pass objects in as input, and also allows us to add arguments without making a breaking change to the schema.
Currently, there is no way to have a union input type. There is a GraphQL proposal to add this feature, but it is not currently part of the spec.
In the spirit of the "make impossible states impossible" rule above, mutations (or queries) that require one of several arguments should be split into two mutations.
Bad:
type Mutation {
createUser(input: CreateUserInput!): CreateUserPayload
}
input CreateUserInput {
  name: String!
  # exactly one of email and phoneNumber is required
  email: String
  phoneNumber: String
}
union CreateUserPayload =
CreateUserSuccess |
BaseUserError |
InvalidEmailError |
InvalidPhoneNumberError
Good:
type Mutation {
createUserByEmail(input: CreateUserByEmailInput!): CreateUserByEmailPayload
createUserByPhoneNumber(input: CreateUserByPhoneNumberInput!): CreateUserByPhoneNumberPayload
}
input CreateUserByEmailInput {
  name: String!
  email: String!
}
input CreateUserByPhoneNumberInput {
  name: String!
  phoneNumber: String!
}
union CreateUserByEmailPayload =
CreateUserSuccess |
BaseUserError |
InvalidEmailError
union CreateUserByPhoneNumberPayload =
CreateUserSuccess |
BaseUserError |
InvalidPhoneNumberError
Mutations that are accessible only to internal Vanta users should be prefixed with a _ character.
See the equivalent mutation rule above.
If a query takes a boolean argument, there's a good chance that it could be better rewritten as two queries.
Bad:
type Query {
getUsers(active: Boolean!): [User!]!
}
Good:
type Query {
getActiveUsers: [User!]!
getInactiveUsers: [User!]!
}
We want our API to be self-documenting when possible. Instead of providing a nullable parameter that requires documentation, provide a default value when it makes sense to do so.
Bad:
type Query {
"""
Default sort order is DESC
"""
allUsers(sort: Order): [User!]!
}
Good:
type Query {
allUsers(sort: Order = DESC): [User!]!
}
Usually, when there is a need for a query that selects a single object by some identifier, it is just as useful to have a more powerful query that selects multiple objects at a time.
Default to implementing a plural version of most fields. Dealing with a list can be annoying, so it's ok to also define a singular version when necessary.
Bad (no plural form):
type Query {
getUserById(id: ID!): User
}
Good (plural form):
type Query {
# List of users corresponds to the list of IDs. If a user is not found,
# the value at that id's index is null.
getUsersById(ids: [ID!]!): [User]!
}
Also OK (both, for really common queries)
type Query {
getUsersById(ids: [ID!]!): [User]!
getUserById(id: ID!): User
}
Queries that take a list as a parameter must return a list of the same length, where the index of each result is the same as the index of the corresponding parameter.
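Databases rarely return rows in the order the IDs were requested, and they silently skip missing rows, so a resolver usually needs a small alignment step to honor this rule. A minimal sketch, where `alignToIds` is a hypothetical helper name:

```typescript
// Reorders rows fetched in arbitrary order so the result lines up with `ids`:
// same length, same order, and null wherever no row was found.
function alignToIds<T extends { id: string }>(
  ids: readonly string[],
  rows: readonly T[],
): (T | null)[] {
  const byId = new Map<string, T>();
  rows.forEach((row) => byId.set(row.id, row));
  return ids.map((id) => byId.get(id) ?? null);
}
```

Duplicate IDs in the input produce duplicate entries in the output, which keeps the index correspondence intact.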
An API consumer might ask for all of the users in their domain or all resources of some type. As our customers become more complex, these queries may return thousands or even hundreds of thousands of results. To guarantee reasonable performance invariants, we must offer paginated fields for all queries that return lists of results. We should not offer non-paginated alternatives.
See the pagination section for details about pagination.
Relevant lint rules:
- all-lists-in-connections (List types must be parts of connections or be whitelisted as constant length)
We use Relay-style cursor pagination. The Relay cursor connections spec is the source of truth for the details, so it is not reproduced here.
Relevant lint rules:
- connections-are-relay-compliant (Any type that ends in Connection must be a Relay connection.)
- edges-are-relay-compliant (Any type that ends in Edge must be a Relay edge.)
The Relay spec allows for a first or last argument to specify how many results to return. Queries requesting more than 100 results should be rejected.
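One way a resolver could enforce this limit is sketched below. `requestedPageSize` and `PageArgs` are hypothetical names, and `RangeError` stands in for whichever error type we actually return for invalid input.

```typescript
const MAX_PAGE_SIZE = 100; // the limit stated in this guide

interface PageArgs {
  first?: number | null;
  last?: number | null;
}

// Returns the requested page size, or throws if it is missing or out of range.
function requestedPageSize(args: PageArgs): number {
  const size = args.first ?? args.last;
  if (size == null) {
    throw new RangeError("Provide `first` or `last`");
  }
  if (!Number.isInteger(size) || size < 0) {
    throw new RangeError("Page size must be a non-negative integer");
  }
  if (size > MAX_PAGE_SIZE) {
    throw new RangeError(`Page size must not exceed ${MAX_PAGE_SIZE}`);
  }
  return size;
}
```

Rejecting oversized requests, rather than silently clamping them, keeps the behavior visible to the API consumer.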
A connection as described in the cursor spec looks like this:
{
user {
# type: User
id
name
friends(first: 10, after: "opaqueCursor") {
edges {
cursor
node {
# also type: User
id
name
}
}
pageInfo {
hasNextPage
}
}
}
}
Information about the relationship between the user and their friends should live on the edges field, not on the user type:
{
user {
id
name
friends(first: 10, after: "opaqueCursor") {
edges {
cursor
howMet # The relationship between a user and a friend belongs on the edge, not on either user type
node {
id
name
}
}
pageInfo {
hasNextPage
}
}
}
}
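For a sense of how such a connection could be assembled on the server, here is a simplified sketch over an in-memory list. All names are hypothetical, forward pagination only is shown, and the cursor is simply the friend's id, whereas real cursors should be opaque to clients.

```typescript
interface User {
  id: string;
  name: string;
}

// One row per relationship: the friend plus data about the relationship itself.
interface Friendship {
  friend: User;
  howMet: string;
}

interface FriendEdge {
  cursor: string;
  howMet: string; // relationship data lives on the edge
  node: User;
}

interface FriendConnection {
  edges: FriendEdge[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

function friendsConnection(
  friendships: readonly Friendship[],
  first: number,
  after?: string,
): FriendConnection {
  // In this sketch an unknown cursor restarts from the beginning (findIndex returns -1).
  const start =
    after === undefined ? 0 : friendships.findIndex((f) => f.friend.id === after) + 1;
  const edges = friendships.slice(start, start + first).map((f) => ({
    cursor: f.friend.id,
    howMet: f.howMet,
    node: f.friend,
  }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + first < friendships.length,
      endCursor: edges[edges.length - 1]?.cursor ?? null,
    },
  };
}
```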
Our GraphQL schema is unversioned. Instead, we continuously evolve our API. While we do our best to plan for the future, we sometimes need to make breaking changes. To do this, we use deprecation:
- Add a @deprecated directive to the deprecated field or type, along with a reason and a planned removal date.
- Monitor clients to ensure that the deprecated field is no longer being used.
- Remove the field from the API
Some examples of breaking changes:
- Removing a field
- Changing a field from non-nullable to nullable
Resolvers should do the meat of their logic as close to each “leaf” (scalar) as possible. If a user doesn't query for a field, we shouldn't do any extra work to compute it. Only when a user explicitly asks for a field should we compute and return it.
An example: consider the following GraphQL schema:
type Cat {
  age: Int
  breed: String!
  name: String!
  nemesis: Dog
}
type Dog {
  age: Int
  breed: String!
  name: String!
  favoriteFoods: [String!]!
}
type Query {
  cats: [Cat!]!
  dogs: [Dog!]!
}
And its corresponding database schema:
cats
  id: mongoId
  age: optionalNumber
  breed: string
  name: string
  nemesisId: string (dogId)
dogs
  id: mongoId
  age: optionalNumber
  breed: string
  name: string
  favoriteFoodsId: foodListId
foodList
  id: mongoId
  foods: string[]
Resolver definitions:
Bad:
catsResolver: // query resolver
  allCats = cats.find()
  // why do the work to get nemeses if we don't always query for it?
  nemeses = dogs.find(id in allCats.map(c => c.nemesisId))
  allCatsWithNemeses = allCats.map(c => c with nemesis)
  // still need to get favorite food of the dogs here...
  return allCatsWithNemeses
dogsResolver: // query resolver
  allDogs = dogs.find()
  dogsWithFoods = []
  for (dog in allDogs):
    favoriteFoods = foodList.find(id: dog.favoriteFoodsId)
    dogsWithFoods.push({...dog, favoriteFoods})
  return dogsWithFoods
CatResolver: // type resolver
  // doesn't exist, since catsResolver already does this
DogResolver: // type resolver
  // doesn't exist, since dogsResolver already gets all its fields
Good:
catsResolver: // query resolver
  return cats.find()
dogsResolver: // query resolver
  return dogs.find()
CatResolver: // type resolver, parent is of type mongoCat
{
  // can technically omit, but explicit is better than implicit!
  age: (parent) => parent.age
  breed: (parent) => parent.breed
  name: (parent) => parent.name
  nemesis: (parent, args, req) => GetDogByIdDataloader(req).load(parent.nemesisId)
}
DogResolver: // type resolver, parent is of type mongoDog
{
  age: (parent) => parent.age
  breed: (parent) => parent.breed
  name: (parent) => parent.name
  favoriteFoods: (parent, args, req) => GetFavoriteFoodByIdDataLoader(req).load(parent.favoriteFoodsId)
}
With the "good" formulation, we could add another query, dogsWhoAreTenYearsOldOrYounger, with essentially a single line of code. With the "bad" formulation, we'd have to rewrite the dogsResolver.
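The laziness principle can be made observable with a small, self-contained sketch. In-memory arrays stand in for the cats and dogs collections, the resolvers are synchronous for brevity, and every name here is hypothetical. A counter records how often a dog is looked up.

```typescript
// In-memory arrays stand in for the database collections (assumption for this sketch).
interface CatRow {
  id: string;
  name: string;
  nemesisId: string | null;
}

interface DogRow {
  id: string;
  name: string;
}

const catsTable: CatRow[] = [
  { id: "c1", name: "Tom", nemesisId: "d1" },
  { id: "c2", name: "Felix", nemesisId: null },
];
const dogsTable: DogRow[] = [{ id: "d1", name: "Spike" }];

// Counts dog lookups so the laziness is visible.
let dogLookups = 0;

// Query resolver: one cheap call, rows passed down unmodified.
const queryResolvers = {
  cats: (): CatRow[] => catsTable,
};

// Type resolver: each field does its own work, and only when it is selected.
const catResolvers = {
  name: (parent: CatRow): string => parent.name,
  nemesis: (parent: CatRow): DogRow | null => {
    if (parent.nemesisId === null) return null;
    dogLookups += 1;
    return dogsTable.find((dog) => dog.id === parent.nemesisId) ?? null;
  },
};

// Simulates `query { cats { name } }`: the nemesis resolver never runs,
// so no dog lookup happens.
const catNames = queryResolvers.cats().map((cat) => catResolvers.name(cat));
```

Only a query that actually selects nemesis pays for the dog lookup.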
Since clients can construct their own queries, they often query for the same bit of data multiple times in a graph. For example, consider the following query:
query q {
users {
id
name
friends {
id
name
}
}
}
A naive implementation would be to do a database call to get the list of users, then another database call per (user, friend) pair to get the friend's name. This means that we need to make up to n^2 database calls to get the data we need!
We're doing a lot of redundant work here, though – we've already fetched all of the friends' names in the initial user list call. But we don't want to rewrite our logic to do extra work in the user list call since that would violate our principle of laziness.
Instead, we use a DataLoader. A DataLoader is a caching mechanism that allows you to request a single bit of data at each point in the graph, but it batches all of the requests together so redundant requests are only made once.
In most cases where you're making a database call within a resolver, you should be using a DataLoader since a client can request arbitrarily redundant information even if there isn't the n^2 relationship described above:
query q {
users1: getUsers(ids: [1, 2, 3]) {
id
name
}
users2: getUsers(ids: [2, 3, 4]) {
id
name
}
}
The DataLoader README does a more thorough job of explaining how and why to use DataLoaders.
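To make the batching and caching idea concrete, here is a deliberately tiny stand-in for a loader. It is not the DataLoader library and omits most of its behavior; `TinyLoader` and its `dispatch` method are hypothetical names for this sketch. Resolver code should use the real DataLoader.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
}

// Collects the keys requested during one tick, de-duplicates them,
// and fetches them with a single call to batchFn.
class TinyLoader<K, V> {
  private readonly batchFn: BatchFn<K, V>;
  private readonly cache = new Map<K, Promise<V>>();
  private queue: Pending<K, V>[] = [];
  private scheduled = false;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    // Repeated key: reuse the pending or finished promise instead of refetching.
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const promise = new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);

    if (!this.scheduled) {
      this.scheduled = true;
      // Run the batch once the current synchronous code has finished queuing keys.
      Promise.resolve().then(() => this.dispatch());
    }
    return promise;
  }

  // Sends all queued keys to batchFn in one call. batchFn must return values
  // in the same order as the keys it was given.
  dispatch(): void {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    if (batch.length === 0) return;
    this.batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i] as V)),
      (error) => batch.forEach((item) => item.reject(error)),
    );
  }
}
```

In the users/friends query above, each friend resolver would call `load(friendId)`, and all of those calls would collapse into one batched fetch per tick. Note that the batch function relies on the same index-aligned contract described for list queries.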
Defining every field is a bit of extra boilerplate, but guarantees type safety and allows us to add new fields extremely easily.
We autogenerate default resolvers for all queries, so this is as simple as importing <type>DefaultResolvers from gqltypes.ts.
export const user: UserResolvers = {
// autofill all of the default resolvers, like user.id
...userDefaultResolvers,
// override or create resolvers for fields that need to be created on the fly
displayName: (user) => user.firstname + " " + user.lastname,
};
Resolvers for non-scalar types should almost always take an unmodified version of the corresponding database type as a parent. That means that any query can do a simple database call to filter the things it needs, then pass those straight down into a type resolver.
Resolvers should be as thin and light as possible and do a predictable amount of work.
We will introduce strict performance guidelines in the future, but for now, just remember:
- Try to ensure that any given resolver executes in well under 100ms
- Use DataLoaders!
- Resolvers should never do an unbounded amount of work – the amount of work done should be a small polynomial in the size of the input (linear in almost all cases) and the size of the input must be capped.
All queries and mutations must have an explicit authorization rule defined.
Individual types and fields can also define authorization rules, but it's less common that this is necessary.
To prevent graph traversal vulnerabilities, ensure that the graph is structured in such a way that sensitive fields are never accessible from non-sensitive fields. See the @public directive lint rule for more information.
We haven't designed our GraphQL API to work well with inter-service communication. If you need two services to communicate, use a queue or something like gRPC, depending on your requirements.
We discuss how errors are exposed to clients in the Guidelines for Errors section. On the server side, we should never return an information-free ServerError unless something has really gone wrong.
Resolvers should handle errors that they know about and return non-sensitive ApolloErrors to the frontend when possible. Only unexpected bugs should be exposed to the user as ServerError.
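The split between known, non-sensitive errors and unexpected bugs could be sketched as follows. `ClientSafeError` is a stand-in for our Apollo-based error classes, and the code strings passed to it are made up for illustration. Only the generic "ServerError" code and "Server error" message mirror the example response shown earlier.

```typescript
// Stand-in for errors that are explicitly safe to show to clients.
class ClientSafeError extends Error {
  readonly isClientSafe = true;
  readonly code: string;

  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

interface ClientError {
  code: string;
  message: string;
}

function isClientSafe(error: unknown): error is ClientSafeError {
  return error instanceof Error && (error as { isClientSafe?: unknown }).isClientSafe === true;
}

// Known errors pass through with their details; anything else is masked so
// internal information (hostnames, stack traces, queries) never reaches the client.
function toClientError(error: unknown): ClientError {
  if (isClientSafe(error)) {
    return { code: error.code, message: error.message };
  }
  return { code: "ServerError", message: "Server error" };
}
```

Masking by default means a newly introduced bug leaks nothing until someone deliberately marks its error as safe.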