diff --git a/develop-docs/ab-testing/index.mdx b/develop-docs/ab-testing/index.mdx
new file mode 100644
index 0000000000000..14b3d2ef1acac
--- /dev/null
+++ b/develop-docs/ab-testing/index.mdx
@@ -0,0 +1,127 @@
# A/B Tests with Amplitude

## Why A/B testing?

1. It's a direct, easy-to-understand way to validate your hypothesis or product change
2. Even when you know your change is an improvement, it allows you to quantify the impact

[Learn more](https://www.optimizely.com/optimization-glossary/ab-testing/)

## Example

- Adding a "signup with GitHub" button to the signup page
- Hypothesis: the signup rate will increase because there's an easier way to sign up

We showed half the visitors the new signup page and the other half the old page. It turned out there was no effect on the signup rate: people do use GitHub to sign up, but if it isn't available they use the regular form instead.

## Details

Our A/B testing framework is built on [Facebook's planout library](https://github.com/facebookarchive/planout).

Let's say you want to optimize the clickthrough rate for Sentry links and decide to test whether to show Plugins in the Integration Directory.

For any experiment, there are two events of interest:

- Exposure: when the user is subjected to the experiment. In this case, it is when a link is shown.
- Action: the behavior we're trying to observe, which should always be tied to a metric. In this case, the action is a click and the metric is the clickthrough rate.

## Adding an A/B Test

To launch the A/B test you will need to do the following:

1. Set up an A/B test in Getsentry to place users/organizations into different test groups.
2. Add the experiment to the list of experiments in Sentry for the TypeScript definitions.
3. Add the logic that controls the in-app behavior.

### Changes to `getsentry/getsentry`

In **getsentry/getsentry**, locate the `getsentry/experiments/config.py` file. In this file, you will define an experiment class that inherits from either `OrgExperiment` or `UserExperiment`, depending on whether you want to assign the experiment per user or per organization.

For this example, we will use `OrgExperiment`; this way, everyone in the organization will be given the same experiment assignment.

```python
# In getsentry/experiments/config.py

from __future__ import absolute_import

from django.conf import settings

from .base import OrgExperiment
from planout.ops.random import WeightedChoice

WEIGHT_VARIANT = 0.5

class ShowPluginsExperiment(OrgExperiment):
    def default_value(self):
        return "0"

    def assign(self, params, org, actor):
        if settings.IS_TEST or org.slug == "sentry":
            params.variant = "1"
        else:
            params.variant = WeightedChoice(
                choices=["0", "1"], weights=[1 - WEIGHT_VARIANT, WEIGHT_VARIANT], unit=org.id
            )
```

- You can set the assignment value to whatever is necessary for your experiment.
- Assignment logic: the `assign` function on the experiment class determines which group an entity falls into. In the simplest case you have just two groups, a control group and a test group, and you can use `BernoulliTrial` to generate the test value (see the sketch below). If you have three or more groups, you can use `UniformChoice`. You can also insert whatever custom logic you want, such as enabling an experiment if a particular feature flag is set.
- You can also use planout's other random assignment operators.
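
For instance, a two-group experiment could use `BernoulliTrial` in its `assign` method. This is an illustrative sketch only (the class name and probability are hypothetical; the imports mirror the example above):

```python
from planout.ops.random import BernoulliTrial

from .base import OrgExperiment

class ShowPluginsBernoulliExperiment(OrgExperiment):
    # Hypothetical two-group variant of the example above.
    def default_value(self):
        return "0"

    def assign(self, params, org, actor):
        # BernoulliTrial yields 0 or 1; p is the probability of the test group.
        params.exposed = BernoulliTrial(p=0.5, unit=org.id)
```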

In `ShowPluginsExperiment` above, we set the values to "0" or "1" and give an organization a 50% chance of seeing the plugins in the Integration Directory.

The experiment is initialized in `getsentry/web/apps.py`. If you add your experiment to `ACTIVE_EXPERIMENTS` in `getsentry/experiments/config.py`, it will be initialized automatically by this piece of code, so no further edits are needed:

```python
class Config(AppConfig):
    for experiment in ACTIVE_EXPERIMENTS:
        expt_manager.add(experiment, experiment.param_name)
```

### Changes to `getsentry/sentry`

In `static/app/data/experimentConfig.tsx`:

```tsx
export const experimentList = [
  {
    key: 'ShowPluginsExperiment',
    type: ExperimentType.Organization,
    parameter: 'exposed',
    assignments: [0, 1],
  },
] as const;
```

The experiment is now available in the `organization` prop. You can access it with `organization?.experiments?.ShowPluginsExperiment`.

From there you can build conditional logic to show whichever view you want, depending on whether the user is in the experiment group or the control group.

### Adding Logic

To use the experiment to drive in-app logic on the frontend, use the `withExperiment` HoC; here's an [example](https://github.com/getsentry/sentry/blob/76acba78e8bbd730873bb46e04f7930bb748d371/src/sentry/static/sentry/app/views/organizationIntegrations/integrationListDirectory.tsx#L491-L495). It adds a prop called `experimentAssignment` which you can use to determine which experience to show.

### Logging the Experiment

In order to measure the result of the experiment, we need to log it. Logging generates a `Triggered Experiment (Deduped)` event in Amplitude, which you can use as the first step in your funnel (example [here](https://analytics.amplitude.com/sentry/chart/u5gt3z8)). If you use `withExperiment`, it logs the experiment automatically when the component mounts. If you want to log the experiment only under certain conditions (for example, when the org shouldn't qualify for the experiment), set `injectLogExperiment` to `true`; the child of the HoC is then responsible for calling `logExperiment`.

### Backend A/B Testing

You can also run backend-only A/B tests, where the experiment values are read differently than on the frontend:

```python
from sentry.experiments import manager as expt_manager

# gets the value of assigned for ExampleMultiChoiceExperiment
exposed = expt_manager.get("ExampleMultiChoiceExperiment", org=org, actor=user)
```

You can then add checks on the value of `exposed` to drive your in-app logic.

### Logging the Experiment on the Backend

Logging the experiment on the backend can be tricky because it creates an analytics event each time it's triggered, which may be too many events depending on how often it's called. Refer to the implementation of the [log experiment endpoint](https://github.com/getsentry/getsentry/blob/master/getsentry/web/experiment.py) to see how to log the experiment on the backend.
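
One common way to keep that event volume down is to deduplicate before recording. The sketch below is illustrative only (the event type and helper function are hypothetical; see the endpoint linked above for the real implementation). It uses Django's cache to log at most one exposure per organization per day:

```python
from django.core.cache import cache

from sentry import analytics

def log_experiment_once(org, experiment_name, variant):
    # cache.add only succeeds if the key is absent, so repeated calls
    # within the TTL are dropped instead of flooding analytics.
    key = f"expt-logged:{experiment_name}:{org.id}"
    if cache.add(key, True, timeout=86400):
        analytics.record(
            "experiment.triggered",  # hypothetical event type
            experiment=experiment_name,
            variant=variant,
            organization_id=org.id,
        )
```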

diff --git a/develop-docs/analytics.mdx b/develop-docs/analytics.mdx
new file mode 100644
index 0000000000000..dfc72eca4c8bc
--- /dev/null
+++ b/develop-docs/analytics.mdx
@@ -0,0 +1,316 @@
---
title: 'Analytics'
---

This guide steps you through instrumenting your code with Sentry's third-party analytics infrastructure.

## Big Query

[BigQuery](https://cloud.google.com/bigquery/) is a Google data warehouse where much of our data lives. This includes all our analytics data and some (not all) production data that is of interest when joins are necessary for answering richer, more complex questions. From sentry/getsentry, our data goes through [reload](https://github.com/getsentry/reload), our ETL for BigQuery.

As a general rule, do not add an analytics event if it could be triggered over 1,000,000 times/day; exceeding this threshold may crash the analytics service. If you are unsure how frequently the event will be triggered, add it as a Datadog [metric](https://develop.sentry.dev/analytics/#metrics) first.

## Amplitude

[Amplitude](https://amplitude.com/) is a product analytics tool that we use to track user behavior, such as when a user views a page, clicks a button, or performs a search.

## Backend Analytics

### Step 1: Create your Analytics Events!

Conventionally, analytics events are stored in a file named `analytics.py` within the folder of the code they track, e.g. `sentry/integrations/analytics.py` for Integration events and `sentry/api/analytics.py` for API events.

The Event classes look like this:

```python
from __future__ import absolute_import, print_function

from sentry import analytics

class ExampleTutorialCreatedEvent(analytics.Event):
    type = 'example_tutorial.created'

    attributes = (
        analytics.Attribute('id'),
        analytics.Attribute('user_id'),
    )

class ExampleTutorialDeletedEvent(analytics.Event):
    type = 'example_tutorial.deleted'

    attributes = (
        analytics.Attribute('id'),
        analytics.Attribute('user_id'),
    )

analytics.register(ExampleTutorialCreatedEvent)
analytics.register(ExampleTutorialDeletedEvent)
```

Your event classes inherit from [`analytics.Event`](https://github.com/getsentry/sentry/blob/master/src/sentry/analytics/event.py) as shown above. All events have a `type` and `attributes`.

- `type`: Describes what the event is; this name should be unique across all analytics event classes.

- `attributes`: The parameters you would like to track, for example the `user_id` of the user performing the action. All `attributes` must be `Attribute` objects as shown above. Note that you cannot create an attribute named `'type'`.

Finally, register your event classes so that the analytics [event_manager](https://github.com/getsentry/sentry/blob/master/src/sentry/analytics/event_manager.py) will pick them up.

### If you are creating the `analytics.py` file for the first time:

If you are creating a new analytics file for the first time, you will need to add an import to the package's `__init__.py`.

If the Event classes are defined in `sentry/examples/analytics.py`, then the import below would be added to `sentry/examples/__init__.py`:

```python
from __future__ import absolute_import

from .analytics import *  # NOQA
```

Here, alongside the usual `absolute_import`, you import every class in your `analytics.py` and add `# NOQA` to avoid linter complaints.

### Step 2: Add it to the code you want to track

You'll be adding code in some user-facing area like an API endpoint.

```python
from sentry import analytics

class ExampleEndpoint(Endpoint):

    def post(self, request):
        example = Example.objects.create(...)
        analytics.record(
            'example_tutorial.created',
            id=example.id,
            user_id=request.user.id,
        )
        return Response(serialize(example, request.user))
```

Do whatever you would normally do with the endpoint, then use the `analytics.record` method to gather the information you want. Note that it takes input of the form:

```python
analytics.record(
    'event_type_as_string',
    first_keyword_argument=first_value,
    ...
    last_keyword_argument=last_value,
)
```

Run the tests that touch the endpoint to ensure everything works as expected.
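
A quick way to verify the call is to assert on it directly in a test. This sketch is illustrative (the URL, payload, and test name are hypothetical) and assumes the standard `unittest.mock` patching approach:

```python
from unittest import mock

def test_example_create_records_analytics(client):
    # Patch where the record function is looked up and assert the event fires.
    with mock.patch("sentry.analytics.record") as record:
        client.post("/api/0/examples/", data={"name": "foo"})

    record.assert_called_once_with(
        "example_tutorial.created",
        id=mock.ANY,
        user_id=mock.ANY,
    )
```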

### Step 3: Add it to ETL

To also send the analytics event to Amplitude, add the event to this file: [https://github.com/getsentry/etl/blob/master/etl/operators/analytics_events_schema.py](https://github.com/getsentry/etl/blob/master/etl/operators/analytics_events_schema.py)

```jsx
'my_backend_event.sent': {        // Needs to match the `type` field in the class you created in step 1
  'name': 'My Backend Event',     // The event name in Amplitude
  'uid_field': 'target_user_id',  // Optional. Field name that represents the user id of the event.
},
```

Note that in the future this will change so that, by default, all events go to Amplitude.

## Route-Based Frontend Analytics

All in-app page loads generate analytics events by default (both Reload and Amplitude). All analytics related to page loads should be handled by the route-based analytics system instead of manually instrumented frontend events. The default names are based on the parameterized path and are prefixed with `Page View`. Here is an example event name:

```
Page View: :OrgId :ProjectId Releases
```

Any field with a `:` at the front is a parameterized field, whose value is the value of the corresponding parameter. For example, the `:OrgId` field will be the value of the `orgId` parameter (really the `slug`).

### Customization

Route-based analytics can be customized to have different event names and parameters. For functional components, the following hooks can be used:

- `useDisableRouteAnalytics`: Disables route-based analytics for the component.
- `useRouteAnalyticsEventNames`: Customizes the event name. The first argument is the Reload `eventKey` and the second argument is the Amplitude `eventName`.
- `useRouteAnalyticsParams`: Customizes the parameters.

Example:

```tsx
export default function SpanDetailsContentWrapper(props: Props) {
  const {location, project} = props;

  // customize the route analytics event we send
  useRouteAnalyticsEventNames(
    'performance_views.span_summary.view',
    'Performance Views: Span Summary page viewed'
  );
  useRouteAnalyticsParams({
    project_platforms: project ? getSelectedProjectPlatforms(location, [project]) : '',
  });
  // ...
}
```

The main purpose of customizing the event names is to create more readable event names that won't change if the underlying route changes.

For class-based components, the following functions can be used:

- `setDisableRouteAnalytics`: Disables route-based analytics for the component.
- `setEventNames`: Customizes the event name. The first argument is the Reload `eventKey` and the second argument is the Amplitude `eventName`.
- `setRouteAnalyticsParams`: Customizes the parameters.

You will need to import both `withRouteAnalytics` and `WithRouteAnalyticsProps`. `WithRouteAnalyticsProps` needs to be added to the Props of the class. The functions can then be accessed through the props. `withRouteAnalytics` is called while exporting the class.

Example:

```tsx
class Monitors extends AsyncView {
  componentDidMount() {
    this.props.setEventNames('monitors.page_viewed', 'Monitors: Page Viewed');
  }
  // ...
}

export default withRouteAnalytics(Monitors);
```

## Button Click Analytics

All in-app button clicks generate analytics events by default for Reload. As of now, they are not automatically sent to Amplitude, but you can set things up so that the events are sent there.

Out of the box, each click on a button tracks the `text` (aria-label), `priority`, `href`, and `parameterized_path` of the button. You might encounter an issue where some of the data is an empty string, which is a known bug. The event name is based on the parameterized path, similar to route-based analytics (see above for more information), but prefixed with `button_click`.

### Customization

There are a few ways to customize the button click analytics:

- `analyticsEventName`: Allows the event to be sent to Amplitude with the given event name. If this is not set, the event will not be sent to Amplitude.
- `analyticsEventKey`: Overrides the default parameterized-path event key and replaces it with the given event key. Unless `analyticsEventName` is also set, this will continue to only send events to Reload, but using the new event key. The parameterized path will still be sent as part of the event, but will not be the event key.
- `analyticsParams`: Adds additional parameters that will be tracked with each button click.

Example (the key, name, and params here are illustrative):

```tsx
<Button
  analyticsEventKey="releases.quickstart_clicked"
  analyticsEventName="Releases: Quickstart Clicked"
  analyticsParams={{tour: 'releases'}}
>
  Start Tour
</Button>
```

In this example, clicking the button sends an event to Amplitude with the `analyticsEventName`, sends an event to Reload with the new `analyticsEventKey`, and adds `tour` as a parameter sent with each event.

## Frontend Events

For analytics events that happen in Sentry, use `trackAnalytics`. For getsentry, use `trackGetsentryAnalytics`.

### Step 1: Add the TypeScript Definition

First, add the TypeScript definition of the event to an analytics event file inside the `analytics` directory, like [issueAnalyticsEvents.tsx](https://github.com/getsentry/sentry/blob/master/static/app/utils/analytics/issueAnalyticsEvents.tsx). There are two parts to this:

1. Define the event parameters that get exported:

```tsx
export type ExampleTutorialEventParameters = {
  'example_tutorial.created': {
    source: string;
  };
  'example_tutorial.viewed': {
    source: string;
  };
};
```

2. Define the Reload-to-Amplitude name mapping. If the value is `null`, the event will not be sent to Amplitude. This is usually only done for very high volume events.

```tsx
export const exampleTutorialEventMap: Record<
  keyof ExampleTutorialEventParameters,
  string | null
> = {
  'example_tutorial.created': 'Example Tutorial Created',
  'example_tutorial.viewed': null, // don't send to Amplitude
};
```

### Step 2: Add the Event in Code

Now you can use the event in code with `trackAnalytics` or `trackGetsentryAnalytics`. Both functions take the following arguments:

- `eventKey`: This describes the key used in Sentry's own analytics system (Reload). It maps to an Amplitude name as determined in step 1.

- `analyticsParams`: This object holds all the information you're interested in tracking. Generally, you always pass in an `Organization` object unless the event is not tied to a specific organization. In getsentry, you should pass the `Subscription` as well.
Certain fields, like the role and plan, will be pulled out of those entities and added to the event payloads.

- `options`: This field allows passing the following optional fields:
  - `mapValuesFn`: An arbitrary function to map the parameters to new parameters.
  - `startSession`: If true, starts an analytics session. Sessions can be used to construct funnels; the start of the funnel should have `startSession` set to `true`.
  - `time`: Optional unix timestamp.

### Typing and Mapping

All events should be typed, which specifies what the payload should be. We also define a mapping from the Reload event name to the Amplitude event name.

### Naming Convention

Our current naming convention for Reload events is `descriptor.action`, e.g. what we have above with `example_tutorial.created` and `example_tutorial.deleted`. You want these to be specific enough to capture the action of interest, but not so specific that you end up with a million distinctly named events carrying information that could be captured in the data object. For example, if you can create your example tutorial from multiple places, fight the urge to put the source in the descriptor (i.e. `example_tutorial_onboarding.created` and `example_tutorial_settings.created`). Your future self running analytics will thank you. Amplitude event names should be similar to the Reload event name, except capitalized and with spaces instead of underscores.

### Testing your Analytics Event

When developing locally, analytics events are not sent to Reload or Amplitude. To check that your event sends as expected with the correct data, set `DEBUG_ANALYTICS` to `"1"` in local storage in your browser (e.g. run `localStorage.setItem('DEBUG_ANALYTICS', '1')` in the console). Whenever an analytics event would have been sent, its data is logged to the console instead, allowing you to check your analytics locally.

**getsentry**

```jsx
import React from 'react';

import { trackAnalytics } from 'getsentry/utils/analytics';

class ExampleComponent extends React.Component {
  componentDidMount() {
    const { organization, subscription } = this.props;
    trackAnalytics('example_tutorial.created', {
      organization,
      subscription,
      source: 'wakanda',
    });
  }

  render() {
    return <div>HI!</div>;
  }
}
```

**sentry**

All you actually need is to import `trackAnalytics` from utils and call it wherever you need. Keep in mind the effect of React lifecycles on your data. In this case, we only want to send one event when the component mounts, so we place the call in `componentDidMount`.

```jsx
import React from 'react';

import { trackAnalytics } from 'sentry/utils/analytics';

class ExampleComponent extends React.Component {
  componentDidMount() {
    const { organization } = this.props;
    trackAnalytics('example_tutorial.deleted', {
      organization,
      source: 'wakanda',
    });
  }

  render() {
    return <div>HI!</div>;
  }
}
```

## Metrics

Track aggregate stats with [Metrics](/services/metrics/). For example, metrics can track aggregate response codes for an endpoint.

Import the metrics library and use the `metrics.incr` function. The key needs to be unique.

```python
from sentry.utils import metrics

metrics.incr(
    "codeowners.create.http_response",  # needs to be unique
    sample_rate=1.0,
    tags={"status": status},
)
```

If you don't set a sample rate, you get 1 in 10 events. If the service is expected to have low traffic, we can start with a sample rate of 1.0.

diff --git a/develop-docs/api/basics.mdx b/develop-docs/api/basics.mdx
new file mode 100644
index 0000000000000..ab27b37f3e66c
--- /dev/null
+++ b/develop-docs/api/basics.mdx
@@ -0,0 +1,17 @@
---
title: "REST API Basics"
sidebar_order: 1
---
This section includes common terms and resources to learn more about API design. If you're new to API design, this is a good place to start.

## Common Terms
- **Resource** is the object you're performing the action on with your endpoint. For example, in `ProjectEndpoint` the resource is `Project`.
- **Resource Identifier** can be an ID, like an event ID, or a slug, like an organization slug. Note that it must be unique. For example, Sentry's project resource identifier is `{org_slug}/{project_slug}`, because projects are only unique within their organization. You can find more information about this in the slug vs. ID section.
- **Method** is what you do with the resource. Each endpoint can have multiple methods. For example, `ProjectEndpoint` can have a GET method that returns the details of a specific project.
- **Collection** is essentially an object type. You can map collections to a Django object type, like an Organization, or to a text object, like an error.

## Extra Resources
The following resources are helpful for learning about general API design best practices:

- **[The Design of Web APIs](https://g.co/kgs/R7rXEk)** is an example-packed guide to designing APIs.
- **[API Design Patterns](https://g.co/kgs/Vfnpe5)** is a comprehensive guide to key API patterns.

diff --git a/develop-docs/api/checklist.mdx b/develop-docs/api/checklist.mdx
new file mode 100644
index 0000000000000..60b77f3a820b1
--- /dev/null
+++ b/develop-docs/api/checklist.mdx
@@ -0,0 +1,17 @@
---
title: "Public API Checklist"
sidebar_order: 5
---

Below is a checklist that developers should go through before making APIs public.

1. APIs must return JSON. The exceptions to this rule are APIs where the user specifically requests non-JSON data, for example a CSV file or a debug file.
1. The API should be available if requested with a bearer token from the integration platform.
1. There are scopes that can be selected in the integration platform that enable or disable access to the resource exposed by the API.
1. If the API call is resource intensive on Sentry's infrastructure, there is a rate limit in place that throttles excessive calls.
1. APIs returning a variable number of elements as a list should be paginated, using the Link header standard.
1. End users should not easily be able to trigger 5xx errors in our application by manipulating request data or sending invalid/malformed data.
1. There are tests associated with the API which check for correctness. There should also be tests that check the response format to ensure the API does not deviate from its contract.
1. The API should return the same format for a given status code, irrespective of the input provided. The content can be different, but the response format should be the same.
1. When an API is only available when certain feature flags are enabled (perhaps through a pricing plan), the API should check for that. If a bearer token of an organization that does not have access to the feature is used, the API should return a 403 Forbidden with an error explaining which feature flag is required.
1. APIs should use camelCase for query parameters and request/response bodies.

diff --git a/develop-docs/api/concepts.mdx b/develop-docs/api/concepts.mdx
new file mode 100644
index 0000000000000..5bd5f715d1758
--- /dev/null
+++ b/develop-docs/api/concepts.mdx
@@ -0,0 +1,142 @@
---
title: "API Concepts"
sidebar_order: 3
---
This document covers API concepts that endpoints should follow, and explains why these concepts exist so that developers can apply them at their own discretion.

## Expanding responses
Expanding responses allows us to include relational information on a resource without loading it by default.

In general, endpoints should expose the fewest fields that make the API usable in the general scenario. One SQL request per API request is a good rule of thumb. To return information on a bounded relationship, endpoints should rely on the `expand` parameter. An unbounded relationship should be its own endpoint.

To take an example, let's talk about the project endpoint. A project belongs to an organization but could be on multiple teams.

By default, here's what the project endpoint should look like:

```json
GET /api/0/projects/{project_slug}/
{
  "id": 5,
  "name": "foo",
  ...
}
```

To display information about a bounded relationship, a user should be able to use the `expand` parameter. This is generally only true for 1:1 relationships.

```json
GET /api/0/projects/{project_slug}/?expand=organization
{
  "id": 5,
  "name": "foo",
  "organization": {
    "slug": "bar",
    "isEarlyAdopter": false,
    ...
  }
  ...
}
```

For unbounded relationships, make a separate query. This allows the query to be paginated and reduces the risk of an arbitrarily large payload.

```json
GET /api/0/projects/{project_slug}/teams
[
  {
    "id": 1,
    "name": "Team 1",
    "slug": "team1",
  },
  {
    "id": 2,
    "name": "Team 2",
    "slug": "team2",
  }
]
```

## Collapsing responses
Similar to expanding responses, an API endpoint can also collapse responses. When the `collapse` parameter is passed, the API should not return attributes that have been collapsed.

To take an example, let's look at the project endpoint again. A project receives events and hence has a `stats` component, which conveys how many events were received for the project. Let's say we made the stats part of the endpoint public, along with the rest of the endpoint.

```json
GET /api/0/projects/{project_slug}/
{
  "id": 5,
  "name": "foo",
  "stats": {
    "24h": [
      [
        1629064800,
        27
      ],
      [
        1629068400,
        24
      ],
      ...
    ]
  }
}
```

The `collapse` parameter can be passed to omit the stats information.

```json
GET /api/0/projects/{project_slug}/?collapse=stats
{
  "id": 5,
  "name": "foo",
  ...
}
```

This is typically only needed if the endpoint is already public and we do not want to introduce a breaking change.
Remember, if the endpoint is public and we remove an attribute, it is a breaking change. If you are iterating on an undocumented endpoint, return the minimal set of attributes and rely on the `expand` parameter to provide more detailed information.

## Paginating responses

> APIs often need to provide collections of data, most commonly in the `List` standard method. However, collections can be arbitrarily sized, and tend to grow over time, increasing lookup time as well as the size of the responses being sent over the wire. This is why it's important for collections to be paginated.

Paginating responses is a [standard practice for APIs](https://google.aip.dev/158), which Sentry follows. We've seen an example of a `List` endpoint above; these endpoints have two tell-tale signs:

```json
GET /api/0/projects/{project_slug}/teams
[
  {
    "id": 1,
    "name": "Team 1",
    "slug": "team1",
  },
  {
    "id": 2,
    "name": "Team 2",
    "slug": "team2",
  }
]
```

1. The endpoint returns an array of objects instead of just one.
2. The endpoint can sometimes end in a plural (s), but more importantly, it does __not__ end in an identifier (`*_slug` or `*_id`).

To paginate a response at Sentry, you can leverage the [`self.paginate`](https://github.com/getsentry/sentry/blob/24.2.0/src/sentry/api/base.py#L463-L476) method as part of your endpoint. `self.paginate` is the standardized way we paginate at Sentry, and it helps us unify logging and monitoring. You can find multiple [examples of this](https://github.com/getsentry/sentry/blob/24.2.0/src/sentry/api/endpoints/api_applications.py#L22-L33) in the code base. They'll look something like:

```python
def get(self, request: Request) -> Response:
    queryset = ApiApplication.objects.filter(
        owner_id=request.user.id, status=ApiApplicationStatus.active
    )

    return self.paginate(
        request=request,
        queryset=queryset,
        order_by="name",
        paginator_cls=OffsetPaginator,
        on_results=lambda x: serialize(x, request.user),
    )
```

The example above uses an offset-type paginator, but feel free to use whatever paginator type suits your endpoint's needs. There are some [existing types](https://github.com/getsentry/sentry/blob/24.2.0/src/sentry/api/paginator.py) that you can leverage out of the box, or you can extend the base class [`BasePaginator`](https://github.com/getsentry/sentry/blob/24.2.0/src/sentry/api/paginator.py#L48) and implement your own.
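
On the consuming side, clients walk paginated responses via the `Link` response header. Here is a hedged client-side sketch (the endpoint and token are placeholders; it assumes the `requests` library and Sentry's `results` hint on the `next` relation):

```python
import requests

url = "https://sentry.io/api/0/organizations/{organization_slug}/projects/"
headers = {"Authorization": "Bearer <token>"}

while url:
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    for item in resp.json():
        print(item["slug"])
    # `requests` parses the Link header into resp.links
    nxt = resp.links.get("next", {})
    url = nxt.get("url") if nxt.get("results") == "true" else None
```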

diff --git a/develop-docs/api/design.mdx b/develop-docs/api/design.mdx
new file mode 100644
index 0000000000000..8582c0e100e9e
--- /dev/null
+++ b/develop-docs/api/design.mdx
@@ -0,0 +1,89 @@
---
title: "Designing a New API"
sidebar_order: 2
---
[Django REST framework](https://www.django-rest-framework.org/) (DRF) is a powerful and flexible toolkit for building Web APIs. Sentry APIs are built using DRF. Here are some considerations to make when designing APIs at Sentry.

## URL Patterns
The API's URL is what developers use to call the endpoint, so it's important that it is meaningful and clear.

### Don't Exceed 3-Level Nesting
A nested resource format like `api/0/organizations/{org}/projects/` is recommended over `api/0/projects/` for readability, because it gives the user an understanding of the resource hierarchy. However, nesting can make URLs too long and hard to use, so Sentry uses 3-level nesting as a hybrid solution.

Here are some possible URLs for values with the resource hierarchy organization -> project -> tag -> value:
- 👍 `/projects/{organization_slug}/{project_slug}/tags/{tag_id}/values`
- 👎 `/organizations/{organization_slug}/projects/{project_slug}/tags/{tag_id}/values/`
- 👎 `/values/`

In the above example we flattened `projects`. The table below shows the existing flattened collections, which work with our existing APIs.

| First collection in URL | When to use | Parent | Identifier | Example |
| --- | --------- | ----- | --- | --- |
| organizations | When the resource cannot be attached to any other collection below parent, like Project | N/A - always comes as first collection | {organization_slug} | [Create a New Team](https://docs.sentry.io/api/teams/create-a-new-team/) |
| teams | When the resource is under a specific team in the hierarchy | organizations | {organization_slug}/ {team_slug} | [Retrieve Team Stats](https://docs.sentry.io/api/teams/retrieve-event-counts-for-a-team/) |
| projects | When the resource is under a specific project in the hierarchy but not under an issue | organizations | {organization_slug}/ {project_slug} | [Create a New Client Key](https://docs.sentry.io/api/projects/create-a-new-client-key/) |
| issues | When the resource is under a specific issue in the hierarchy | projects | {issue_id} | [List an Issue's Events](https://docs.sentry.io/api/events/list-an-issues-events/) |
| sentry-app-installations | When the resource is mapped to a specific integration | organizations | {integration_slug} | [Delete an External Issue](https://docs.sentry.io/api/integration/delete-an-external-issue/) |

Here are some additional examples:
- 👍 `/organizations/{organization_slug}/projects/`
- 👍 `/projects/{organization_slug}/{project_slug}/issues/`
- 👎 `/projects/`

Hierarchy here does not necessarily mean that one collection belongs to a parent collection. For example:
- `projects/{project_identifier}/teams/` refers to the **teams** that have been added to a specific project
- `teams/{team_identifier}/projects/` refers to the **projects** a specific team has been added to

### Naming Guidance
- Collection names should be lowercase and hyphenated, e.g. `commit-files`.
- Collection names must be plural. Avoid uncountable words, because the user can't know whether the GET returns one item or a list.
- Query params and body params should be `camelCase`, e.g. `userId` or `dateCreated`.
- For sorting and filtering, stick with the common param names: `sortBy` (e.g. `sortBy=-dateCreated`), `orderBy` (either `asc` or `desc`), `groupBy`, `limit`.
- Path params should be `snake_case`.

## Functionality
Each API should be **stateless**, have a clear purpose, and do one specific thing. To achieve that, stick with the standard methods listed below. If your API needs to be more complicated, work with [owners-api](https://github.com/orgs/getsentry/teams/owners-api) on how to create it.

- 👍 An API that updates project settings: `PATCH` for updating a field or `PUT` for updating settings
- 👎 An API that creates a project, creates a team, and creates alerts for that team about that project

| Functionality | HTTP Method | Response Object | Example |
| --- | ---- | ---- | ---- |
| Create | POST | Serialized created resource | [Create a Project](https://github.com/getsentry/sentry/blob/756bda4419cfaf28b2e351278a5c4c1665082eba/src/sentry/api/endpoints/team_projects.py#L156) |
| Update | PUT or PATCH | Serialized updated resource | [Update Project Settings](https://github.com/getsentry/sentry/blob/756bda4419cfaf28b2e351278a5c4c1665082eba/src/sentry/api/endpoints/project_details.py#L474) |
| Get | GET | Serialized single resource | [Retrieve a Project](https://github.com/getsentry/sentry/blob/756bda4419cfaf28b2e351278a5c4c1665082eba/src/sentry/api/endpoints/project_details.py#L415) |
| Delete | DELETE | None | [Delete a Project](https://github.com/getsentry/sentry/blob/756bda4419cfaf28b2e351278a5c4c1665082eba/src/sentry/api/endpoints/project_details.py#L840) |
| List | GET | List of multiple serialized resources | [List All the Projects in an Organization](https://github.com/getsentry/sentry/blob/756bda4419cfaf28b2e351278a5c4c1665082eba/src/sentry/api/endpoints/organization_projects.py#L49) |
| Batch Get | GET | List of serialized resources | Get project details for specific project ids |
| Batch Create | POST | List of serialized created resources | Create multiple projects with the same settings |
| Batch Update | PUT | List of serialized updated resources | [Update a list of issues](https://github.com/getsentry/sentry/blob/ea14f740c78b8df68281ffad6a3bf3709ed3d5b5/src/sentry/api/endpoints/organization_group_index.py#L379) |
| Batch Delete | DELETE | None | [Delete a list of issues](https://github.com/getsentry/sentry/blob/ea14f740c78b8df68281ffad6a3bf3709ed3d5b5/src/sentry/api/endpoints/organization_group_index.py#L467) |

Here are some examples of how to use standard methods to represent complex tasks:
- Get the count of a resource
  - Count is part of the `List` API and is provided in the `X-Total-Count` response header.
- Get the latest of a resource
  - Ordering and filtering should happen through list API query parameters. Here's a [good read](https://www.moesif.com/blog/technical/api-design/REST-API-Design-Filtering-Sorting-and-Pagination/).
  - 👎 `/api/0/issues/{issue_id}/events/latest/`
  - 👍 `/api/0/issues/{issue_id}/events/?orderBy=-date&limit=1` (`-` for descending)

### Batch vs. Single Resource Methods
Here are some notes that can help you decide between similar methods. We use *Get* here as an example, but the same applies to all the other methods in parentheses.
- **Get (Update, Delete)**: Use GET on the `{resource}DetailsEndpoint` to retrieve a single resource, for example `ProjectDetailsEndpoint`.
- **List (Create, Batch Create, Batch Update, Batch Delete)**: Use GET on the `{resource-parent}{resource}Endpoint` to retrieve all resources that belong to that parent, for example `TeamProjectsEndpoint`.
- **Batch Get (Batch Create, Batch Update, Batch Delete)**: Use GET on the `{root-parent}{resource}Endpoint`. The difference between `Batch` and `List` is that batch usually takes a list of `ids` as a query parameter and returns details about those ids, which do not necessarily belong to one parent. For example, we can't retrieve two projects that belong to two different teams through `TeamProjectsEndpoint`; in that case we use the GET method on the root resource, in this case `OrganizationProjectsEndpoint` (see the sketch below).
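
To make the Batch Get shape concrete, here is a sketch of a root-parent endpoint filtering by an `id` query parameter. The class name is hypothetical and the module paths are illustrative, not the exact Sentry implementation:

```python
from sentry.api.bases.organization import OrganizationEndpoint
from sentry.api.paginator import OffsetPaginator
from sentry.api.serializers import serialize
from sentry.models.project import Project

class OrganizationProjectsBatchEndpoint(OrganizationEndpoint):
    def get(self, request, organization):
        # Batch Get: /organizations/{organization_slug}/projects/?id=1&id=2
        ids = request.GET.getlist("id")
        queryset = Project.objects.filter(organization=organization)
        if ids:
            queryset = queryset.filter(id__in=ids)

        return self.paginate(
            request=request,
            queryset=queryset,
            order_by="slug",
            paginator_cls=OffsetPaginator,
            on_results=lambda x: serialize(x, request.user),
        )
```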

## Response Object
Each response object returned from an API should be a serialized version of the Django model associated with the resource. You can see all the existing serializers [here](https://github.com/getsentry/sentry/tree/master/src/sentry/api/serializers/models).

Some models might have different serializers based on use case. For example, `Project` can be serialized into `DetailedProjectSerializer` or `ProjectSerializer`. Decide which one to use based on your use case and API scope, but **DO NOT RETURN CUSTOM OBJECTS** like {`slug: project_slug, platform: project_platform`}. We want API responses to be uniform and usable in multiple automations without adding extra complication to external developers' code.

diff --git a/develop-docs/api/index.mdx b/develop-docs/api/index.mdx
new file mode 100644
index 0000000000000..23e25ff06e558
--- /dev/null
+++ b/develop-docs/api/index.mdx
@@ -0,0 +1,10 @@
---
title: API
---

As a developer-facing company, it's critical for us to have simple, intuitive, and consistent APIs that our users can call from their dev environment to accomplish key tasks without going to the UI. If you're creating or editing an endpoint, this doc will help you achieve Sentry standards.
- [REST API Basics](/api/basics/)
- [Designing a New API](/api/design/)
- [API Concepts](/api/concepts/)
- [Publishing an API](/api/public/)
- [Public API Checklist](/api/checklist/)

diff --git a/develop-docs/api/public.mdx b/develop-docs/api/public.mdx
new file mode 100644
index 0000000000000..3a9bbe4c14493
--- /dev/null
+++ b/develop-docs/api/public.mdx
@@ -0,0 +1,449 @@
---
title: 'Publishing an API'
sidebar_order: 4
---

## Introduction

In this document, we explain when and how to publish API documentation to the public Sentry docs.

## API Versioning

Currently, Sentry's API is **`v0`** and considered to be in a draft phase. While we don't expect our public endpoints to change greatly, keep in mind that our API is still under development.

## Public APIs

A public API is an API that is stable and well documented at [https://docs.sentry.io/api/](https://docs.sentry.io/api/). When you make an endpoint public, you're committing to always returning at least the set of defined attributes when that endpoint is called. As a result, attributes **cannot be removed** from an endpoint once it is public. Additionally, the type of an attribute **cannot be changed** once the attribute is public. The attribute and its type cannot be changed because both external and managed clients rely on the public attribute being returned from that endpoint.

You **can** add attributes to a public endpoint.

## When to make an API public?

Should I publish my API? In most cases, **yes**. As a developer-facing company, it is critical to make Sentry features available for developers to use in their various automations. Do not try to predict whether an automation will find the API useful; we can't predict all the automations our clients might want to build. With that said, there are three exceptions where we don't publish APIs:

1. **Experimental APIs:** APIs that are either in development or undergoing frequent iterations, which may result in breaking changes.
These are expected to eventually be published or removed.
2. **Private APIs:** APIs that control UI features in Sentry, such as the visibility of a button.
3. **APIs in getsentry:** If you're a Sentry engineer building APIs in the getsentry repo, they will not be published. This is the expected behavior for all getsentry APIs, but if you have a case you think should be published, let [owners-api](https://github.com/orgs/getsentry/teams/owners-api) know and we can help with that.

Always keep in mind that your APIs may still be used even if they're `private`. All the APIs in the Sentry repo are accessible due to our open-source nature.

### Before Publishing

Remember, making an API public means that you can only add to it: you cannot make _any_ breaking changes.

As a guide, use these questions:

1. Is the feature for which you're making the public endpoint stable?
2. Will the API change substantially in the future?

If your answers are Yes and No, you're in business: make the endpoint public. Head over to the [public API checklist](/api/checklist/) and ensure that your endpoint conforms to the checklist.

## Publishing an API

We use [OpenAPI 3](https://swagger.io/docs/specification/about/) and the [drf-spectacular](https://drf-spectacular.readthedocs.io/en/latest/readme.html) library to document our APIs.

**API documentation consists of**:

1. Owner
2. Publish Status
3. Sidebar Tab
4. Endpoint Description
5. Method Decorator
   - Title
   - Path and Query Parameters
   - Body Parameters
   - Success Responses
   - Error Responses
   - Sample Response Body

We use drf-spectacular's [`extend_schema`](https://drf-spectacular.readthedocs.io/en/latest/drf_spectacular.html#drf_spectacular.utils.extend_schema) decorator to generate the documentation. The sections below outline each step of using this decorator.

### 1. Owner

Specify an owner for the endpoint: the team at Sentry responsible for maintaining and owning it. You can see the full list of teams [here](https://github.com/getsentry/sentry/blob/master/src/sentry/api/api_owners.py).

```python
class OrganizationTeamsEndpoint(...):
    owner = ApiOwner.ENTERPRISE
```

### 2. Publish Status

Declare endpoint methods public by setting their `publish_status` to `PUBLIC`.

```python
class OrganizationTeamsEndpoint(...):
    owner = ApiOwner.ENTERPRISE
    publish_status = {
        'GET': ApiPublishStatus.PUBLIC,
        'POST': ApiPublishStatus.PUBLIC,
        'PUT': ApiPublishStatus.EXPERIMENTAL,
    }
```

### 3. Sidebar Tab

Specify the endpoint's sidebar tab by using the `extend_schema` decorator on the endpoint class. You can see the current list of tags, or add tags, [here](https://github.com/getsentry/sentry/blob/master/src/sentry/apidocs/build.py). In the example below, the [endpoint](https://docs.sentry.io/api/teams/create-a-new-team/) is tagged in the `Teams` sidebar tab.

```python
from drf_spectacular.utils import extend_schema

@extend_schema(tags=["Teams"])
class OrganizationTeamsEndpoint(...):
    owner = ApiOwner.ENTERPRISE
    publish_status = {
        'GET': ApiPublishStatus.PUBLIC,
        'POST': ApiPublishStatus.PUBLIC,
        'PUT': ApiPublishStatus.EXPERIMENTAL,
    }
```

### 4. Endpoint Description

Specify the endpoint description in the documentation using the endpoint's docstring. Please include a description for all resources if they are not immediately clear.
+For example, `team` and `organization` don't require a description, but other vague terms like +`external-issue` or `integration` would require a description. + +```python +def post(self, request, organization, **kwargs): + """ + Create a new team bound to an organization. + """ +``` +### 5. Method Decorator + +We utilize another `@extend_schema` decorator on the endpoint method to perform the majority of +the documentation. The code below provides an example of a fully documented POST API. + +```python +@extend_schema(tags=["Teams"]) +class OrganizationTeamsEndpoint(...): + owner = ApiOwner.ENTERPRISE + publish_status = { + 'GET': ApiPublishStatus.PUBLIC, + 'POST': ApiPublishStatus.PUBLIC, + 'PUT': ApiPublishStatus.EXPERIMENTAL, + } + + @extend_schema( + operation_id="Create a New Team", + parameters=[GlobalParams.ORG_SLUG], + request=TeamPostSerializer, + responses={ + 201: TeamSerializer, + 400: RESPONSE_BAD_REQUEST, + 403: RESPONSE_FORBIDDEN, + 404: OpenApiResponse(description="A team with this slug already exists."), + }, + examples=TeamExamples.CREATE_TEAM, + ) + def post(self, request, organization, **kwargs): + """ + Create a new team bound to an organization. + """ +``` + +Here's description of each argument in the decorator: + +- **operation_id:** will be shown as the title of the endpoint's page in the documentation. + +- **parameters:** is a list of path and query parameters. + - You can find existing parameters in this + [file](https://github.com/getsentry/sentry/blob/master/src/sentry/apidocs/parameters.py). Note + that the `description` field in `OpenApiParameter` populates the parameter's description in the + documentation. + - DRF serializers conveniently translate into parameters. See + [here](https://github.com/getsentry/sentry/blob/0b9b6563bc4e232334cfc71f136a49117171d13f/src/sentry/api/endpoints/organization_stats_v2.py#L145) + for an example of this. + - Note that the ordering of the list determines the order of the parameters in the documentation. + Required parameters will automatically be placed at the top of the section. + +- **request:** is a serializer that generates body parameters. + - It can generally just be the DRF serializer of the endpoint itself. + - The `help_text` argument of each serializer field populates the body + parameter's description in the documentation. + - For nested serializers, use the class docstring instead to populate the parameter's description. + See [here](https://github.com/getsentry/sentry/blob/338916947b1a316bf908d048b9feb27dceced351/src/sentry/api/serializers/rest_framework/project_key.py#L9-L24) + for an example of this. + - To exclude certain body parameters, you can pass `exclude_fields` to the + [extend_schema_serializer](https://drf-spectacular.readthedocs.io/en/latest/drf_spectacular.html#drf_spectacular.utils.extend_schema_serializer) + decorator. See [here](https://github.com/getsentry/sentry/blob/5573be76bcbbf8cc8f5d36c72bea57c6ca207122/src/sentry/api/endpoints/organization_teams.py#L51) + for an example of this. + - The ordering of the serializer fields determines the order of the body + parameters. Required parameters will automatically be placed at the top of + the section. + - Note that for Model serializers, the ordering is determined by the ordering + of `fields` array within the Meta class. See [here](https://github.com/getsentry/sentry/pull/57884) + for an example of this. 
  - If you need more customization, or the endpoint doesn't use a serializer, you can use an [inline_serializer](https://drf-spectacular.readthedocs.io/en/latest/drf_spectacular.html#drf_spectacular.utils.inline_serializer) to create a one-off serializer. See [here](https://github.com/getsentry/sentry/blob/6cc2769e13a09414175005bc967bd0e1db8c81bc/src/sentry/api/endpoints/project_key_details.py#L75-L98) for an example of this.
  - For custom serializer fields, you must explicitly type them using the [extend_schema_field](https://drf-spectacular.readthedocs.io/en/latest/drf_spectacular.html#drf_spectacular.utils.extend_schema_field) decorator. See [here](https://github.com/getsentry/sentry/blob/236cc90daa1db5fcedade1db2b0bd7bb1ce9bdcd/src/sentry/api/serializers/rest_framework/project.py#L9-L10) for an example of this.
  - If there is no request body, you can omit the `request` field entirely.

- **responses:** include all possible success and failure HTTP response cases. You can learn more about them in the next section.

- **examples:** specify the endpoint's sample response. We keep all our examples in this [folder](https://github.com/getsentry/sentry/tree/master/src/sentry/apidocs/examples), sorted by sidebar tag.
  - Note that the statement `response_only=True` is required for all examples.
  - We currently only show one response example per endpoint, but plan to support multiple in the future, so feel free to add more examples here.

  ```python
  @extend_schema(
      ...
      examples=TeamExamples.CREATE_TEAM,
  )

  from drf_spectacular.utils import OpenApiExample

  class TeamExamples:
      CREATE_TEAM = [
          OpenApiExample(
              # description of example, not used for anything
              "Create a new team",
              # actual response body
              value={"slug": "my-team", "name": "My Team"},
              # the status code(s) this example applies to
              status_codes=["201"],
              # You MUST INCLUDE this for all examples
              response_only=True,
          )
      ]
  ```

#### Success Responses

Specify the return type of success responses, which is used to generate the response schema and validate any included example responses. There are three ways to do this:

1. If the success response is a single object instead of a list, you can pass a DRF serializer as the response. In order for this serializer to generate a schema, its `serialize` method must be typed to return a [TypedDict](https://peps.python.org/pep-0589/)`*`.

   For example, this [sample code](https://github.com/getsentry/sentry/blob/0b9b6563bc4e232334cfc71f136a49117171d13f/src/sentry/scim/endpoints/members.py#L202) has the `200` status code returning an `OrganizationMemberSCIMSerializer`. Its [`serialize`](https://github.com/getsentry/sentry/blob/0b9b6563bc4e232334cfc71f136a49117171d13f/src/sentry/api/serializers/models/organization_member/scim.py#L14-L16) method is typed to return an [`OrganizationMemberSCIMSerializerResponse`](https://github.com/getsentry/sentry/blob/0b9b6563bc4e232334cfc71f136a49117171d13f/src/sentry/api/serializers/models/organization_member/response.py#L60) TypedDict, which specifies the typing of the response body.

   In order to indicate optional fields that may or may not be set in the response, you must use [totality](https://peps.python.org/pep-0589/#totality) and separate your TypedDict into two classes. The first class should only contain optional fields and inherit from a TypedDict with `total=False` in its class header.
The second class should only contain required fields, + and inherit from the first class. See + [here](https://github.com/getsentry/sentry/blob/4404deda8376bf1a011b76964b45b58671c2a33b/src/sentry/monitors/serializers.py#L35-L48) + for an example of this. + + For fields that may be null, use `Optional` followed by the field's type. Note + that a field may be both optional and null. + ```python + class ExampleResponse(TypedDict): + potentiallyNullStringField: Optional[str] + ``` + + `*`Similar to the request serializer, use the + [extend_schema_serializer](https://drf-spectacular.readthedocs.io/en/latest/drf_spectacular.html#drf_spectacular.utils.extend_schema_serializer) + directly on the TypedDict to exclude any private fields from the response schema. + See [here](https://github.com/getsentry/sentry/blob/4828c60fd7f9fce5e32769054b9a7b10dc795aaa/src/sentry/replays/post_process.py#L43-L44) + for an example of this. + +2. To return a list of objects or for more customization, use the + [inline_sentry_response_serializer](https://github.com/getsentry/sentry/blob/aa61724035370a47fdc112c14d1467be2609d9a2/src/sentry/apidocs/utils.py#L24). + Note that the name of each one must be unique or the docs will fail testing. + + ```python + from sentry.apidocs.utils import inline_sentry_response_serializer + + @extend_schema( + responses={ + 200: inline_sentry_response_serializer( + "ListOrgTeamResponse", List[TeamSerializerResponse] + ), + } + ) + ``` + + Note that we HIGHLY discourage returning a single TypedDict object if your + endpoint utilizes an existing serializer and returns a single object. Instead, + please use the first method as this ensures the documentation stays up-to-date as the + endpoint evolves. + +3. You can also provide OpenAPI JSON if you are running into issues, although we recommend avoiding + this if possible. + +#### Error Responses + +Specify error responses using the existing `OpenApiResponse` constants in this +[file](https://github.com/getsentry/sentry/blob/master/src/sentry/apidocs/constants.py). You can +also define your own for more detailed messages like the example below. + +```python +responses={ + 201: TeamSerializer, + 400: RESPONSE_BAD_REQUEST, + 403: RESPONSE_FORBIDDEN, + 404: OpenApiResponse(description="A team with this slug already exists."), +}, +``` + +### Private Attributes in a Public Endpoint + +You can have private attributes within a public endpoint. + +As an example: +[https://docs.sentry.io/api/teams/retrieve-a-team/](https://docs.sentry.io/api/teams/retrieve-a-team/). +The response has multiple attributes: + +```json +{ + "id": "2", + "slug": "the-interstellar-jurisdiction" + ... +} +``` + +Let's say we also return a `nickname` for a team. If the data returned is not documented as a +response in our public documentation, that attribute is a private attribute within a public +endpoint. + + +### Caveats + +- **If the endpoint you're modifying had previous JSON documentation, you must delete the +old documentation path in [this +file](https://github.com/getsentry/sentry/blob/master/api-docs/openapi.json) +and its corresponding JSON build in [this +folder](https://github.com/getsentry/sentry/blob/master/api-docs/paths).** + +- **Additionally if there are multiple request types in the same endpoint using the old JSON +documentation, you must update both of them in the same PR. 
Updating only one request and +deleting the old documentation will cause all other requests to disappear from the docs.** + +### Building and Testing Locally + +**Commands**: + +- `make test-api-docs` builds the OpenAPI JSON, validates the schema for all examples, and runs all + API docs tests. +- `make build-api-docs` builds the OpenAPI JSON. The build will fail if there are any warnings. +- `make diff-api-docs` produces a diff of your local OpenAPI JSON with production. +- `make watch-api-docs` automatically rebuilds the OpenAPI JSON on change. + +**To see your changes in the docs locally**: + +In `sentry`: + +1. Use `make watch-api-docs` to continuously build the intermediate asset + `tests/apidocs/openapi-derefed.json` locally. +2. Copy the full path to `{YOUR_SYSTEM_FOLDER}/tests/apidocs/openapi-derefed.json`, + + e.g. `/Users/yourname/code/sentry/tests/apidocs/openapi-derefed.json`. + +In `sentry-docs`: + +1. Run `OPENAPI_LOCAL_PATH= DISABLE_THUMBNAILS=1 yarn dev` and substitute + `` with the path to your local openapi-derefed.json. + + Unfortunately changes do not automatically reflect in your local server, so you will need to + rerun this command on every change. We hope to add this feature in the future. + +See [here](https://docs.sentry.io/contributing/environment/) for detailed doc build instructions. + +When you open the pull request, please add a screenshot of the page or pages you're adding. + +### Build Process + +The openapi-diff test will fail when CI runs on your pull request, this is expected and meant to +highlight the diff. It is not required to merge. + +Once you make changes to an endpoint and merge the change into Sentry, a series of GitHub Actions +will be triggered to make your changes automatically go live: + +1. The [openapi workflow in + sentry](https://github.com/getsentry/sentry/blob/master/.github/workflows/openapi.yml) updates + the schema in [sentry-api-schema](https://github.com/getsentry/sentry-api-schema) with the + OpenAPI build artifact. +2. The [cascade-to-sentry-docs workflow in + sentry-api-schema](https://github.com/getsentry/sentry-api-schema/blob/main/.github/workflows/cascade-to-sentry-docs.yml) + reacts to the push to `main` in (1) by triggering the [bump-api-schema-sha workflow in + sentry-docs](https://github.com/getsentry/sentry-docs/blob/master/.github/workflows/bump-api-schema-sha.yml). +3. The [bump-api-schema-sha workflow in + sentry-docs](https://github.com/getsentry/sentry-docs/blob/master/.github/workflows/bump-api-schema-sha.yml) + fetches the latest commit SHA from `sentry-api-schema` and writes it into the correct file, then + makes and merges a PR in `sentry-docs`, which kicks off a deploy via Vercel to + https://docs.sentry.io/api/. + +## Requesting an API to be public + +Are you a Sentry user who wants an endpoint to be public? + +Look at the [issues on sentry with the `Component: API` +label](https://github.com/getsentry/sentry/issues?q=is%3Aopen+is%3Aissue+label%3A%22Component%3A+API%22). +If a request has already been made to make the endpoint public, give it a thumbs up. If not, create +a [feature request](https://github.com/getsentry/sentry/issues/new?template=feature.yml) on Sentry +and add the `Component: API` label. + +The team responsible for the endpoint will review the stability of the endpoint. If the endpoint +will not have breaking changes in future, they can determine whether to make it public. 
### FAQs

**When should an attribute be `required`?**

An attribute is `required` if it will always be returned by the API.

**What does it mean when a response doesn't have a schema?**

Some endpoints have no response schema. This means that while the endpoint is public, the attributes
within that endpoint can change at any time. This is a relic from migrating the documentation from
our prior approach. Note that, going forward, we recommend new endpoints always provide a response
schema.

**Can customers use private endpoints?**

Yes, if they wish. However, private endpoints are liable to change without notice at any time,
potentially breaking the customer's code. Please be careful using them.

**I have a question and it has not been answered.**

No problem. Send us an [email](mailto:partners@sentry.io) so we can answer your question.

diff --git a/develop-docs/architecture.mdx b/develop-docs/architecture.mdx
new file mode 100644
index 0000000000000..df88ba006d2cb
--- /dev/null
+++ b/develop-docs/architecture.mdx
@@ -0,0 +1,85 @@
---
title: Architecture
---

## High level overview

Edges represent service dependencies.

![](https://mermaid.ink/svg/pako:eNqFU01PwzAM_StRTiDGeu8BCbQbcKFc0DwhN_XWqs2H0kQwtv13smSj1SrgFj8_v9gvzo4LXRHP-caiqdnrAhQas3zT3rJ7Y7pGoGu0WrHb2zu2LxaPe9aVu92Txoo9YIdKkD0cQHVlYgDvSTm7ndMnStPRXGiZoWkygOom6522lAHfM0sdbv8uO9IS-v5B5RJ4EQN2FaJr4CtQ_VaWOraobdQZ6KDiDRFtcd3iGLBUNX0o_2GnYuVLnKCSpEBRUzXJGN27jaWp0Fn-qDduYAAmDBGmaGvtewIV2RH90rolMmSnvZ5CbVuyY2sicHJnzPltliF5YcqQmM455MYvAApUnCM5yYDXzpk-z7JN42pfxlXYkEsqWbKbB1oR6QstvAypuG_Az1rp1f7RStt01HqJ9AstPuOSrMSmCou-A8UC0dUkCXgejhWt0XfueOchUNE7XWyV4LmznmbcmwodLRoMX0TyfI1dH9DgVpj5OX2e-IcO3z0YL2M)
[diagram source](https://mermaid.live/edit#pako:eNqFU01PwzAM_StRTiDGeu8BCbQbcKFc0DwhN_XWqs2H0kQwtv13smSj1SrgFj8_v9gvzo4LXRHP-caiqdnrAhQas3zT3rJ7Y7pGoGu0WrHb2zu2LxaPe9aVu92Txoo9YIdKkD0cQHVlYgDvSTm7ndMnStPRXGiZoWkygOom6522lAHfM0sdbv8uO9IS-v5B5RJ4EQN2FaJr4CtQ_VaWOraobdQZ6KDiDRFtcd3iGLBUNX0o_2GnYuVLnKCSpEBRUzXJGN27jaWp0Fn-qDduYAAmDBGmaGvtewIV2RH90rolMmSnvZ5CbVuyY2sicHJnzPltliF5YcqQmM455MYvAApUnCM5yYDXzpk-z7JN42pfxlXYkEsqWbKbB1oR6QstvAypuG_Az1rp1f7RStt01HqJ9AstPuOSrMSmCou-A8UC0dUkCXgejhWt0XfueOchUNE7XWyV4LmznmbcmwodLRoMX0TyfI1dH9DgVpj5OX2e-IcO3z0YL2M)

## Event pipeline

How an event gets saved. Edges represent data flow through the system.

This graph is extremely simplified, mostly due to layout constraints. Missing from this graph:

* How Relay fetches project configs. Answer: from sentry-web
* How Relay caches project configs. Answer: in memory, and in Redis
* How Relay counts events and keeps track of quotas. Answer: more Redis
* Symbolicator as an auxiliary service to symbolicate-event
* How alerting is triggered. Answer: postprocess-event, a Celery task which is responsible for alerting (spawned by a Kafka consumer in Sentry reading from eventstream)
* Possibly more

For more information read [Path of an event through Relay](https://getsentry.github.io/relay/relay_server/index.html#path-of-an-event-through-relay) and [Event Ingestion Pipeline](https://getsentry.github.io/event-ingestion-graph/).
![](https://mermaid.ink/svg/pako:eNp9UsFugzAM_ZUop1Uq4o6mXtbLNO3UXSbSgwluQUCC4qQaovz7QqArbdWdEtvPz_azey51jjzhRwNtwb62QgkFbZt-a2eY_9SlBFtqtWdRtGFnQpUTkwaoQDqzOut7dSzVzzAIVWcTJoa2jFVMVhuMz8xgDZ1Q4QmACg4VpC_v6ohk2cdorfZCBXcAlCESSa3INWhSwXeorOnmALsEBPdpd-BA0BpsjZZIFOHJp45DkcumGSXWaLor6WQzC1RRYGQP-YGU4IQXuieQu6LPiLom00HXBd-D8xnh_21drSmoXAbRLLifeDQnxV8zs3kJQLIGoVkJPi7Br_dGrECQzpmzOgvSRZHltkKZi4Nu0m72JP3AVaEd4djd25-16IWvuUc3UOb-SPuRR3BbYOMxif_meABXW8GFGjwUnNW7TkmeWONwzV2be0G3JfhpGp4coCbvxbz0t_k5HX64_-EXvdQVKA)
[diagram source](https://mermaid.live/edit#pako:eNp9UsFugzAM_ZUop1Uq4o6mXtbLNO3UXSbSgwluQUCC4qQaovz7QqArbdWdEtvPz_azey51jjzhRwNtwb62QgkFbZt-a2eY_9SlBFtqtWdRtGFnQpUTkwaoQDqzOut7dSzVzzAIVWcTJoa2jFVMVhuMz8xgDZ1Q4QmACg4VpC_v6ohk2cdorfZCBXcAlCESSa3INWhSwXeorOnmALsEBPdpd-BA0BpsjZZIFOHJp45DkcumGSXWaLor6WQzC1RRYGQP-YGU4IQXuieQu6LPiLom00HXBd-D8xnh_21drSmoXAbRLLifeDQnxV8zs3kJQLIGoVkJPi7Br_dGrECQzpmzOgvSRZHltkKZi4Nu0m72JP3AVaEd4djd25-16IWvuUc3UOb-SPuRR3BbYOMxif_meABXW8GFGjwUnNW7TkmeWONwzV2be0G3JfhpGp4coCbvxbz0t_k5HX64_-EXvdQVKA)

## Multi-Region

Sentry's SaaS offering is a multi-region deployment of Sentry that is built from the same source code as self-hosted and single-tenant deployments.

### The goals and constraints of our multi-region design

Our multi-region architecture was driven by several design goals. Several of these design goals have not been implemented yet, but could be in the future without requiring extensive rework.

The high-level design goals were:

- Create a solution that allowed for many isolated regions. Isolated regions are important because they allow us to address data-residency requirements in the EU and in other regions in the future as those requirements become relevant. Regions can contain as few as one tenant or hundreds of thousands. Regions may be scaled independently from one another.
- Our solution should afford the ability for customers to provide their own hardware for a region in the future.
- Create a solution that allows single-tenants to join up with SaaS to help reduce operational overhead and increase our ability to deliver updates to single-tenants.
- Our solution would need to ‘scale down’ to self-hosted and development environments. It should offer the same functionality in all deployment environments.

In addition to the goals we had several constraints:

- SaaS customers will continue to log in and sign up at `sentry.io`. They will not need to log into each organization individually.
- Existing SaaS customers should not be required to update their DSN values. If a customer migrates to a new region, it is ok if they also need to update their DSNs.
- Generally API URLs should not be required to change. Existing customers can use `region.sentry.io` style URLs to improve latency and ensure data residency requirements are met, but they don’t need to. Customers in new regions may be required to use `region.sentry.io` style URLs.
- A single Sentry Organization has all of their data located within a single Storage Region. This requirement comes from the need to do cross-project operations in Business plans.
- The solution should support ~1000 regions. While this is more regions than we are likely to need, it serves as a useful stress test for the solution.


### Silo Modes

In order to offer customers data residency we needed to co-locate the bulk of an organization’s data together. The data our customers collect and send to Sentry needs to be located within the customer’s chosen region.
While customer events need to be stored in the preferred region, there is also data in Sentry that is shared between organizations, and thus between regions. This data needs to be centrally stored outside of any region.

These constraints led our design to having multiple 'silo modes'.


☝ The term ‘silo’ was chosen as other applicable terms (region, zone, partition, cluster) are all taken.


Our architecture includes two silo modes:

1. **Control Silo** Contains all of the globally shared data and services. This data is shared between organizations and thus between regions.
2. **Region Silo** Region silos contain organization data. Each region has independent infrastructure, and holds at least one organization.

In addition to the siloed modes, there also exists a **Monolith** mode. In monolith mode both control and region silo modes can be accessed. Models, endpoints and tasks are silo-aware, and should a siloed resource be used in the wrong silo context, an error is raised.

### Multi-region architecture

![multi-region architecture](https://mermaid.ink/svg/pako:eNp1UctqwzAQ_JVFpxSSH_ChUOoUeiiEuD5JOajS2hbIktGjJYT8e1dumjotvS2zs_NgT0x5jaxinfUfapAhwWstnHDZ8KfgXUKnoX0-wGZzDzmGsphnFa8j5lAuhIv5rQ9yGgoReNvAHnvj3UE4IEhOpmANuhSO8LC7ik49X9Fi52PqA8a7H_6FgJqvti1s3-kUaplkoVCwhaOKwB8pbvAWGmP9rKGK5RL9Y66K9zdjGeBWnRoCpwSLPpiLOGG_JTEXTVrc9pn5F8J_fdiajRhGaTQ95FSuBEsDjihYRaPGTmabBBPuTFSZk2-OTrEqhYxrlictE9ZGUuaRVZ20kVDUJvnw8vXk-dfnT1dDqnk)
[diagram source](https://mermaid.live/edit#pako:eNp1UctqwzAQ_JVFpxSSH_ChUOoUeiiEuD5JOajS2hbIktGjJYT8e1dumjotvS2zs_NgT0x5jaxinfUfapAhwWstnHDZ8KfgXUKnoX0-wGZzDzmGsphnFa8j5lAuhIv5rQ9yGgoReNvAHnvj3UE4IEhOpmANuhSO8LC7ik49X9Fi52PqA8a7H_6FgJqvti1s3-kUaplkoVCwhaOKwB8pbvAWGmP9rKGK5RL9Y66K9zdjGeBWnRoCpwSLPpiLOGG_JTEXTVrc9pn5F8J_fdiajRhGaTQ95FSuBEsDjihYRaPGTmabBBPuTFSZk2-OTrEqhYxrlictE9ZGUuaRVZ20kVDUJvnw8vXk-dfnT1dDqnk)

Each region silo can be scaled independently, and is isolated from other regions. Within each region exists separate, dedicated infrastructure and applications as outlined in the [application overview](/architecture/#high-level-overview).

### Silo Configuration

Silo modes are defined as environment variables (and Django settings). The relevant settings are:

- `SILO_MODE` either `CONTROL`, `REGION`, `MONOLITH` or `None`.
- `SENTRY_REGION` The name of the region the application is in (e.g. `us` or `de`) or `None`.
- `SENTRY_REGION_CONFIG` A list of the regions that the application has. This list defines the name, URL, and category (multiregion, singletenant) that each region operates as.
- `SENTRY_MONOLITH_REGION` The region name of the former ‘monolith’ region. For [sentry.io](http://sentry.io) this is the `us` region. The monolith region has special behavior that is required for backwards compatibility.

diff --git a/develop-docs/backend/control-silo.mdx b/develop-docs/backend/control-silo.mdx
new file mode 100644
index 0000000000000..6015305bda0e1
--- /dev/null
+++ b/develop-docs/backend/control-silo.mdx
@@ -0,0 +1,136 @@
---
title: "Control Silo"
---

Within the Control Silo are features that allow us to provide backwards compatibility for both customer API usage and integrations.

# API Gateway

Prior to our multi-region offering, all API traffic from customers would be sent to [sentry.io](http://sentry.io). We need to continue supporting these URLs despite some customers residing in the `us` region and others in the `de` region. The API Gateway functionality of control silo will introspect requests and proxy them to the appropriate region based on path parameters in the request. Requests can be routed via:

- Organization slug or id
- Sentry App Installation UUID
- Sentry App Slug
- DSN host
- A static list of URL names (region pin list)


💡 Requests going to sentry.io do not meet data residency requirements. Customers will need to use region domains in order to not have requests go to the US.


The API Gateway is implemented as a middleware that only takes action when Sentry is running as Control Silo. After matching URLs, the middleware checks the silo mode of the matched endpoint. If the matched endpoint is a **region** endpoint, the API Gateway attempts to route the request to a region based on a few heuristics.

### Slug, id, and UUID based routing

For requests that use `organization_slug`, `organization_id_or_slug`, `sentry_app_slug` or `uuid` parameters, the API Gateway can resolve the organization’s region using data present in control silo. Once the region is resolved, requests are synchronously proxied to the appropriate region.
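As a rough, self-contained sketch of the routing idea (all names and URLs here are illustrative, not Sentry's actual middleware code):

```python
# A simplified sketch of slug-based routing in the API Gateway. In Sentry,
# the organization -> region mapping lives in the sentry_organizationmapping
# table; here it is a plain dict for illustration.
REGION_URLS = {"us": "https://us.internal.example", "de": "https://de.internal.example"}
ORG_REGIONS = {"acme": "us", "beispiel-gmbh": "de"}

def resolve_region(path_params: dict[str, str]) -> str | None:
    slug = path_params.get("organization_slug")
    if slug is None:
        return None  # fall through to other heuristics (DSN host, pin list)
    return ORG_REGIONS.get(slug)

def proxy_base_url(path_params: dict[str, str]) -> str | None:
    # The caller would synchronously forward the request to this base URL.
    region = resolve_region(path_params)
    return REGION_URLS.get(region) if region else None
```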
### DSN host

The legacy error-embed page requires specialized handling, and we extract the
destination from the `dsn` query string parameter.

### Region Pin List

We have a handful of API endpoints that lack useful hints to route requests between regions. When a request is made to one of these paths, we assume the intent is to send a request to the `us` region.

### Direct Location Header

When a request is handled by the API Gateway, the response delivered to the client will contain an `X-Sentry-Proxy-Url` header which contains the URL that clients should use to avoid the additional latency incurred by the API Gateway.

# Integration Credential Proxy

Integrations can be shared by multiple organizations. For example, a single MS Teams workspace can be connected to multiple organizations, and those organizations can live in multiple regions. A few of our integrations leverage refresh tokens and use short-lived access tokens. To avoid racy updates when tokens become invalidated and need to be refreshed, we send outbound requests to control silo, which acts as a proxy for the integration service. By funnelling all outbound traffic for integrations that require refresh tokens through control silo, we can more easily coordinate token replacements.

![credential-proxy-flowchart](https://mermaid.ink/svg/pako:eNqdk01uwjAQha8y9RoukAWbdsumdOnNKH4Bq_GY2hNBhbh7DUnURJFA7c4ave_Nry-mjg6mMhlfHaTGm-d94mDFCtcaE71j76PsfBvHyGsUTbGdhrb5AxzyjfrV03qzmeAVZYgjn3MH4hZJF-KJMy3lpJFC1j7RVLpE28gFFUXpRUsGClB2rPwMZDfn6gQHUc9tvqVPtyllXboMAxirlqi-8XXvMUB08nqYGloZqfWyEI2fEML56Avw8ihhQpOQDz3x0LPIjlEyng2hO5ZRgXLZLNzo-5eO750KTv-vaX46M51ZmYAU2LtythcrRNboAQHWVOXp0HDXqjVWrkXKncbdt9Sm0tRhZfrehis3VVP2UKJwvjS77b_C_UdcfwDQFR3U)
[diagram source](https://mermaid.live/edit#pako:eNqdk01uwjAQha8y9RoukAWbdsumdOnNKH4Bq_GY2hNBhbh7DUnURJFA7c4ave_Nry-mjg6mMhlfHaTGm-d94mDFCtcaE71j76PsfBvHyGsUTbGdhrb5AxzyjfrV03qzmeAVZYgjn3MH4hZJF-KJMy3lpJFC1j7RVLpE28gFFUXpRUsGClB2rPwMZDfn6gQHUc9tvqVPtyllXboMAxirlqi-8XXvMUB08nqYGloZqfWyEI2fEML56Avw8ihhQpOQDz3x0LPIjlEyng2hO5ZRgXLZLNzo-5eO750KTv-vaX46M51ZmYAU2LtythcrRNboAQHWVOXp0HDXqjVWrkXKncbdt9Sm0tRhZfrehis3VVP2UKJwvjS77b_C_UdcfwDQFR3U)

The integration proxy is implemented as a class that integrations requiring refresh tokens can sub-class.
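As a hypothetical sketch of the hop a proxied request takes (the control silo URL, integration host, and helper below are illustrative; the headers are described in the next section):

```python
# A region silo forwards the request to control silo instead of calling the
# integration's API directly; control silo verifies the signature, attaches
# fresh credentials, and forwards the request on our behalf.
import hashlib
import hmac

import requests

SHARED_SECRET = b"..."  # shared between region and control silos

def proxy_integration_request(org_integration_id: int, path: str, body: bytes) -> requests.Response:
    signature = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    headers = {
        "X-Sentry-Subnet-Organization-Integration": str(org_integration_id),
        "X-Sentry-Subnet-Base-Url": "https://graph.microsoft.com",  # integration host
        "X-Sentry-Subnet-Path": path,
        "X-Sentry-Subnet-Signature": signature,
    }
    return requests.post("https://control.sentry.example/proxy", data=body, headers=headers)
```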
Currently the following integrations use the integration credential proxy:

- GitHub
- GitLab
- MSTeams
- Azure DevOps Services (Visual Studio Team Services)

### Integration Credential Proxy Headers

The integration credential proxy leverages several specialized headers to perform its role:

- `X-Sentry-Subnet-Organization-Integration` The id of the `OrganizationIntegration` that credentials should be added from.
- `X-Sentry-Subnet-Base-Url` defines the integration host and base URL that the request should be forwarded to.
- `X-Sentry-Subnet-Signature` An HMAC based request signature that is verified in control silo.
- `X-Sentry-Subnet-Path` The path that should be used in the forwarded request.
- `X-Sentry-Subnet-Keyid` For integrations that use a keyring, the id of the key to use.

# Integration Webhook Forwarding

Many of the 3rd-party services we integrate with only offer a single endpoint that they will submit webhooks to. Because all of our existing webhook traffic is being sent to `sentry.io`, we need to receive webhook requests there. For webhooks that need to be handled in a region silo, we store the webhooks in postgres and then asynchronously deliver captured webhooks to the relevant regions.

Capturing and replaying webhooks lets us solve a few availability issues:

1. Many webhook deliveries have short timeouts and need rapid responses.
2. Integrations can be shared by organizations in multiple regions, and synchronous forwarding risks hitting timeouts.
3. Synchronous forwarding of webhooks could lead to RPC worker exhaustion.

### Webhook Storage

Webhook payloads are stored in postgres. Webhook payloads are sorted into ‘mailboxes’ and messages within a mailbox are delivered in the order they are received. Generally mailboxes map to a single integration, but for higher volume integration customers we further partition messages into smaller grained shards to increase delivery throughput.

The storage model for webhooks is similar to outboxes, as webhooks were originally stored *in* outboxes. However, they needed to be broken out into separate storage as the transactional boundaries that outboxes require are not compatible with the longer request durations that webhook delivery entails.

```sql
create table hybridcloud_webhookpayload (
    id bigserial primary key,

    -- mailbox_name would contain values like 'jira:12364'
    mailbox_name varchar not null,
    region_name varchar not null,
    schedule_for timestamptz,
    attempts int default 0,

    -- payload attributes
    request_method varchar,
    request_path varchar,
    request_headers text,
    request_body text,

    date_added timestamptz
);

create index on hybridcloud_webhookpayload (mailbox_name);
create index on hybridcloud_webhookpayload (schedule_for);
```

### High-volume integration mailboxes

For integration providers that have multiple high-volume customers, we partition messages into more finely grained mailboxes. The partitioning strategy varies by integration, but the objective is to partition messages by remote resources such that operations to a single resource all resolve to the same mailbox. Some examples of good partitioning keys are GitLab project ids or Jira issue ids.

### Webhook Scheduling

With this storage model we use a pair of queries in a celery task that is spawned every 10 seconds to find the ‘next’ webhook in each mailbox.

```sql
-- Get the 'first' message in each mailbox
select min(id), mailbox_name
from hybridcloud_webhookpayload
group by mailbox_name;

-- Get any mailboxes with messages that can be delivered
select * from hybridcloud_webhookpayload
where id in (...)
  and schedule_for < current_timestamp;
```

For each mailbox with undelivered messages, we select a block of messages and update their next schedule time to be a few minutes in the future. Doing this helps prevent the next scheduling task from attempting to deliver the same messages again.
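A sketch of what that rescheduling update could look like (the exact interval is illustrative):

```sql
-- Push the selected block a few minutes into the future so the next
-- scheduling pass skips messages that are already being delivered.
update hybridcloud_webhookpayload
set schedule_for = current_timestamp + interval '3 minutes'
where id in (...);
```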
![mailbox-delivery-flowchart](https://mermaid.ink/svg/pako:eNqNUTtyAzEIvQpD7VxgCzdJm2pbNXjF2owl2EjIn_H47lFiJ5OkcTrgfWB4F5wsMg5Y-a2xTvwitC2UgwZdqLhMspA6PJt6sQSjJPvAfvbwtF7_IsAAs2iETJI2duIKR_EdZK6Vtlz_o07OBaz5X4-5eSsMkZMcuJwfW7UlkjMon_xbBS6ZYbYCvrtDm2TT_rHbuNBRwanuwQ1iIVGgrxuD4gozl97G_tFLUICAfUXmgEMvI8_UkgcMeu1Uam7jWSccvDRe4e3UewA4zJRqn3IUt_J6S-kzrOs7BGObcw)
[diagram source](https://mermaid.live/edit#pako:eNqNUTtyAzEIvQpD7VxgCzdJm2pbNXjF2owl2EjIn_H47lFiJ5OkcTrgfWB4F5wsMg5Y-a2xTvwitC2UgwZdqLhMspA6PJt6sQSjJPvAfvbwtF7_IsAAs2iETJI2duIKR_EdZK6Vtlz_o07OBaz5X4-5eSsMkZMcuJwfW7UlkjMon_xbBS6ZYbYCvrtDm2TT_rHbuNBRwanuwQ1iIVGgrxuD4gozl97G_tFLUICAfUXmgEMvI8_UkgcMeu1Uam7jWSccvDRe4e3UewA4zJRqn3IUt_J6S-kzrOs7BGObcw)

### Webhook Delivery

Scheduled webhooks are delivered by celery tasks. Delivery tasks attempt to forward requests to the relevant region silo. If delivery succeeds, the payload record is deleted from postgres. If the request fails and `attempts` is lower than the maximum number of attempts (10), the message is left alone; if `attempts` is equal to or higher than the maximum, a log message is emitted and the webhook is discarded.

Draining a mailbox involves sending as many messages from a given mailbox as we can. Should we hit a networking error, the current head of line will be rescheduled and have its attempt number increased.

![delivery flow chart](https://mermaid.ink/svg/pako:eNqNUrFuAjEM_RUrM1SoapcbWIq6dYE1i0l8XETOviYOokL8e8NBK5CQYIkS-eW9Z_sdjBNPpjGZvguxo0XATcLe8oBJgwsDssKHsCaJsApRbitL2gThS8HyNRCm8_nNT2jgk9R10BF6WEdxW5AWegxxLXvLUWSAVhJhxfSUM27IMsAj0tBCJ7KFkAHXsqPKuAdUpX7QDJ4iKb3cJbpyX3kWFMOO0khWm1GMCrk4V52cnnADn94x8jqbTd9m-zP4keuzr38xipmAUpL0nNb7SQfgiemwS9RT3dVlJIDsIVF2HfkSaRRnb3k8LJuJ6SnVpfiaisNJwxrtKoM1Tb16arFEtcbysUKxqKx-2JlGU6GJKYNH_QuRaVqsbU0M-aCSvs5JGwN3_AUkRNw7)

[diagram source](https://mermaid.live/edit#pako:eNqNUrFuAjEM_RUrM1SoapcbWIq6dYE1i0l8XETOviYOokL8e8NBK5CQYIkS-eW9Z_sdjBNPpjGZvguxo0XATcLe8oBJgwsDssKHsCaJsApRbitL2gThS8HyNRCm8_nNT2jgk9R10BF6WEdxW5AWegxxLXvLUWSAVhJhxfSUM27IMsAj0tBCJ7KFkAHXsqPKuAdUpX7QDJ4iKb3cJbpyX3kWFMOO0khWm1GMCrk4V52cnnADn94x8jqbTd9m-zP4keuzr38xipmAUpL0nNb7SQfgiemwS9RT3dVlJIDsIVF2HfkSaRRnb3k8LJuJ6SnVpfiaisNJwxrtKoM1Tb16arFEtcbysUKxqKx-2JlGU6GJKYNH_QuRaVqsbU0M-aCSvs5JGwN3_AUkRNw7)

Notably, most 4xx errors are considered 'successful' deliveries. The reason for this is that re-attempting delivery of a webhook that initially had a 4xx response will generally not result in a 200.

diff --git a/develop-docs/backend/cross-region-replication.mdx b/develop-docs/backend/cross-region-replication.mdx
new file mode 100644
index 0000000000000..b394ca1ecd928
--- /dev/null
+++ b/develop-docs/backend/cross-region-replication.mdx
@@ -0,0 +1,117 @@
---
title: "Cross Region Replication"
---

Our data-model includes many relations between users and the objects they create or interact with. Eventually users are deleted, and they need to be detached from the records they created, or those records need to be destroyed. Before Sentry became a multi-region application, we relied on a mixture of Django callbacks and postgres constraints to cascade deletions.
However, in a multi-region state we aren't able to rely on these anymore, as Users are in [Control Silo](/architecture/#silo-modes) and many of the objects they interact with are in the various Region Silos.

## HybridCloud Foreign Keys

Like `ForeignKey`, the `HybridCloudForeignKey` field provides relational cascade semantics to the application. Usage of `HybridCloudForeignKey` is very similar to the standard `ForeignKey`:

```python
@region_silo_model
class GroupHistory(Model):
    organization = FlexibleForeignKey("sentry.Organization", db_constraint=False)
    group = FlexibleForeignKey("sentry.Group", db_constraint=False)
    project = FlexibleForeignKey("sentry.Project", db_constraint=False)
    release = FlexibleForeignKey("sentry.Release", null=True, db_constraint=False)

    user_id = HybridCloudForeignKey(settings.AUTH_USER_MODEL, null=True, on_delete="SET_NULL")
```

`HybridCloudForeignKey` supports the same `on_delete` and `db_index` options that
`FlexibleForeignKey` does; however, cascade operations are **eventually consistent**.

## Tombstones

Tracking the absence of a record is complicated. It is far simpler to track the presence of another object. When a cross-region resource (like a user) is deleted, we create and propagate a marker of what once was, aka a *Tombstone*. The presence of a tombstone is used to clean up data that was previously related to the now-deceased record.

Tombstones have a few properties:

- `id` a monotonic integer that increases as new tombstones are received.
- `table_name` The table the tombstone is for.
- `object_identifier` The record that has been removed.

With these properties we can reconcile a record's removal with data removals in another region.

### Tombstone Workflow

User, Organization, and Membership deletions are the most common form of cross-region deletions that leverage tombstones. Our implementation of tombstones is powered by [Outboxes](/backend/outboxes/) to reliably and eventually propagate changes to other regions.

The flow for removing a user is:

![tombstone workflow diagram](https://mermaid.ink/svg/pako:eNplkTFrwzAQhf-K0BYcD203Q7O0ayHEdNMiSxdHIN2l0qk0hPz3ntMUYme703vf49A7a0cedKcLfFVAB-_Bjtkmg7YyYU0DZIOyOaasPsu0HW3m4MLRIqs3Qs4UVR8izZUdjIHwJhicUNVuNs0MUZ3yEIFB1UkP_vXp-cXgzCLQktlBou8b06hiZabKA_2oBKXYEZYRktFerxd4OD3Kq-b-XjFt6xBDOTzE3ruuoXOsn05hSkORz1va29XCvQNH6EKcIXqtE-Rkg5dWzgaVMpoPkMDoTkYPe1sjG23wItappf6ETnecK6x1PXrL_yXqbm9jkVfwQer7-Gv6WvjlF-fzsOs)
[diagram source](https://mermaid.live/edit#pako:eNplkTFrwzAQhf-K0BYcD203Q7O0ayHEdNMiSxdHIN2l0qk0hPz3ntMUYme703vf49A7a0cedKcLfFVAB-_Bjtkmg7YyYU0DZIOyOaasPsu0HW3m4MLRIqs3Qs4UVR8izZUdjIHwJhicUNVuNs0MUZ3yEIFB1UkP_vXp-cXgzCLQktlBou8b06hiZabKA_2oBKXYEZYRktFerxd4OD3Kq-b-XjFt6xBDOTzE3ruuoXOsn05hSkORz1va29XCvQNH6EKcIXqtE-Rkg5dWzgaVMpoPkMDoTkYPe1sjG23wItappf6ETnecK6x1PXrL_yXqbm9jkVfwQer7-Gv6WvjlF-fzsOs)

In steps 5 and 6 of the above diagram we reconcile the tombstone changes with the rest of the data in the region. Tombstones need to be reconciled for each relation that the removed record had. For example, removing a user will:

- Delete the user’s saved searches, dashboards, and more using the `CASCADE` action.
- Null out the user’s id in alert rules, releases and more using the `SET NULL` action.

### Tombstone Reconciliation

For each model that has a `HybridCloudForeignKey` we have an arbitrary amount of data to process. Furthermore, due to eventual consistency it is possible for us to receive new records referencing a deleted entity after that entity has been deleted.
To account for this inherent eventual consistency we maintain a set of ‘watermarks’ for each relation. Watermarks can be incremented in batches and progress can be made on each relation in isolation.

For each relationship that the application has to a tombstoned model we do the following:

1. Get the current watermark for the relationship. A watermark consists of the last fully processed record plus a transaction identifier.
2. If the current tombstone still has rows referencing it, process another batch of deletions. Deletions can take the form of additional scheduled background deletions, or bulk update queries for the `SET NULL` action.
3. If the current tombstone has been fully processed, advance the tombstone watermark so that subsequent tombstones are processed.
4. Repeat steps 1-3 until all tombstones have been processed.

### Adding new HybridCloudForeignKey usage

If you're making a foreign key relationship to an existing model like `User`, you don't need to do anything additional to have foreign key actions taken. If you're creating a new model that will be used in `HybridCloudForeignKey` relations, there are several steps that must be completed.

1. Add a new `OutboxCategory` for your model.
2. Your model's `delete` method must save outbox messages in the same
   transaction as the `delete` is performed.
3. You need to implement an 'outbox receiver' that uses `maybe_process_tombstone` to generate a tombstone record.

With these tasks completed, the tombstone system will be able to cascade
delete operations to related models in other regions.

## Replicated Models

Replicated models allow models to be read locally in all regions where they are used. While reads can be done in all regions, writes must be done in the 'owning' region. Replicated models incur additional complexity and operational overhead and should be used sparingly. Replicated models are a good fit for:

- Records with high read throughput
- Read operations where the overhead of RPC would cause unacceptable performance overhead.
- Records where we can accept **eventual consistency**, and don't have requirements for 'read after write' semantics.

Replicated models are powered by [outboxes](/backend/outboxes/) and use outboxes to push state changes from the owning region into regions that are interested in the changes in record state. For organization-scoped resources, this is often only the region that the Organization is located in. For user-scoped resources, this is generally the regions that a user has membership in.

### Adding a new replicated model

Adding a replicated model requires a number of changes to the 'source' model and creating a 'replica' model. The high-level process to add a replicated model is:

1. Define the 'replica' model. This model should have similar or the same schema as the source model. Take care to preserve the source model id as a separate field in the schema so that updates and deletes can be performed on replicated data.
2. Add a new `OutboxCategory` for model updates, and add that `category` to the model class.
3. Add an `RpcModel` subclass for your model. The `RpcModel` is used to serialize the state between regions.
4. Add a new RPC method to `region_replica_service` or `control_replica_service` to handle replication for your model. Your replication RPC method should accept your `RpcModel` as a parameter and idempotently update the replica model.
5. Update the source model to inherit from `ReplicatedControlModel` or `ReplicatedRegionModel`.
   Choose a base class based on the silo mode of the 'source' model.
6. Implement the `payload_for_update`, `outboxes_for_update`, `handle_async_replication`, and `handle_async_deletion` methods as required.

For an existing example of replicated models see `OrgAuthToken` and `OrgAuthTokenReplica`.

### Backfilling replicated models

After creating a replicated model and getting replication working, you'll need a way to backfill all existing records from the source region into the regions where replicas are. The replicated model machinery includes tooling for running backfills, and re-running those backfills as the schema evolves. Under the hood, replica backfills are completed by generating and processing outbox messages for each record that needs to be replicated.

Before backfills can be performed on a replicated model, the following needs to be done:

1. Define an option following the pattern `outbox_replication.{table_name}.replication_version` in `sentry/hybridcloud/options.py`. Set the default value to `0`.
2. Set the option you created in step 1 to the default value with options-automator.

With these steps completed you're ready to begin a backfill. Backfills are controlled by incrementing the option value to be equal to the `replication_version` defined on the model class. By default `replication_version` is set to `1`. Backfills will incrementally make progress in batches, so as not to overwhelm outbox delivery.

### Running backfills as schema evolves

As the schema of your model and its replica evolve, you may need to re-run
backfills to synchronize state without having to wait for user action.

1. Increment the `replication_version` attribute on the 'source' model.
2. Update the matching `outbox_replication` option value to start another
   backfill.

diff --git a/develop-docs/backend/cross-region-rpc.mdx b/develop-docs/backend/cross-region-rpc.mdx
new file mode 100644
index 0000000000000..bf101c8390bf8
--- /dev/null
+++ b/develop-docs/backend/cross-region-rpc.mdx
@@ -0,0 +1,159 @@
---
title: "Cross Region RPC"
---

When Sentry is deployed in a multi-region deployment (like sentry.io) there are many scenarios and workflows where a region silo requires information that is stored in Control Silo. Similarly, there are flows where Control Silo operations need to read or mutate state stored in regions.

When data needs to be read or mutated synchronously, we use Remote Procedure Calls (RPC).

## Design Context

Before we get into specifics, it is useful to understand the requirements that the system was designed under. We needed the following attributes in our solution:

- Needs to have consistent parameter and return types in local development, CI and multi-region production.
- Serialization and transport concerns should be hidden from most application logic, as the transport depends on how the application is deployed.
- Strong support for Python typing
- Solid support for schema generation and validation tooling.

## Components of the RPC framework

The cross-region RPC framework is composed of several RPC services and RPC models. Each RPC service provides a bundle of methods for different product domains. For example, `organizations`, `users` and `integrations` are separate RPC services.

Each RPC service is composed of a service interface and local implementation.
```python
from sentry.services.hybrid_cloud.region import ByOrganizationSlug
from sentry.services.hybrid_cloud.rpc import RpcService, regional_rpc_method
from sentry.silo.base import SiloMode

class OrganizationService(RpcService):
    key = "organization"
    local_mode = SiloMode.REGION

    @classmethod
    def get_local_implementation(cls) -> RpcService:
        from sentry.services.hybrid_cloud.organization.impl import (
            DatabaseBackedOrganizationService
        )

        return DatabaseBackedOrganizationService()

    @regional_rpc_method(
        resolve=ByOrganizationSlug(),
        return_none_if_mapping_not_found=True
    )
    @abstractmethod
    def get_org_by_slug(
        self,
        *,
        slug: str,
        user_id: int | None = None,
    ) -> RpcOrganizationSummary | None:
        ...
```

Service classes act as stubs that define the interface a service will have. In the above example, we see a few important pieces of the RPC machinery:

- `key` defines the service name that is used in URLs and as a key when building method indexes.
- `local_mode` defines the silo mode in which this service uses its ‘local implementation’. Services can have local implementations in either silo mode.
- `get_local_implementation` is used by the RPC machinery to find the implementation service when the silo mode matches or is `MONOLITH`.

RPC methods like `get_org_by_slug` must be defined as `abstractmethod` and must have either `rpc_method` or `regional_rpc_method` applied. If a method has `local_mode = REGION`, it should use `regional_rpc_method` with a ‘resolver’ supplied via the `resolve` parameter. There are several resolvers that accommodate a variety of method call signatures:

- `ByOrganizationSlug` will extract the `organization_slug` parameter and use it to locate the region using `sentry_organizationmapping`.
- `ByOrganizationId` will extract the `organization_id` parameter and use it to locate the organization’s region using `sentry_organizationmapping`.
- `ByRegionName` uses the `region_name` parameter to choose a destination region.
- `ByOrganizationIdAttribute(param)` will extract `organization_id` from `param` to resolve the region.

The implementation for `get_org_by_slug` looks like:

```python
from sentry.services.hybrid_cloud.organization.service import OrganizationService
from sentry.services.hybrid_cloud.organization.model import RpcOrganizationSummary
from sentry.models.organization import Organization, OrganizationStatus

class DatabaseBackedOrganizationService(OrganizationService):

    def get_org_by_slug(
        self,
        *,
        slug: str,
        user_id: int | None = None,
    ) -> RpcOrganizationSummary | None:
        query = Organization.objects.filter(slug=slug)
        if user_id is not None:
            query = query.filter(
                status=OrganizationStatus.ACTIVE,
                member_set__user_id=user_id,
            )
        try:
            return serialize_organization_summary(query.get())
        except Organization.DoesNotExist:
            return None
```

RPC method implementations are simple Python methods, but there are a few rules that should be followed:

1. All parameters and return values must either be simple values or `RpcModel` instances. This ensures that requests and responses can be serialized.
2. Response values should be scalar values or `RpcModel` instances. Avoid tuples or lists that are not sequences of objects.

### RPC Authentication

All cross-region RPC requests go to a single endpoint, `/api/0/internal/rpc/:service/:method`. The RPC endpoint uses HMAC-based signature comparison to validate authenticity. Before a request is made, the request body has an HMAC signature created using a secret that is shared by region and control silo instances.

When an RPC message is received, the signature header is compared with a locally generated HMAC using the receiver’s secret. The request is only processed if the checksums match.
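Conceptually, the signing and verification steps look like the following minimal sketch (the secret handling and header plumbing in Sentry differ in detail):

```python
import hashlib
import hmac

RPC_SHARED_SECRET = b"..."  # shared by control and region silo instances

def sign_rpc_body(body: bytes) -> str:
    return hmac.new(RPC_SHARED_SECRET, body, hashlib.sha256).hexdigest()

def verify_rpc_body(body: bytes, signature_header: str) -> bool:
    expected = sign_rpc_body(body)
    # Constant-time comparison avoids leaking signature prefixes.
    return hmac.compare_digest(expected, signature_header)
```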
### RPC serialization and transport

Cross-region RPC uses pydantic and openapi to provide type-hinted parameter and response values, and to fulfill our requirements for schema generation and validation. How RPC methods behave varies based on how the application is configured.

#### In monolith mode

- RPC methods are invoked synchronously on the implementation service.

#### In tests

- During tests we emulate as much of the RPC stack as we can.
- Request bodies and responses are serialized as JSON.
- Requests are made with `django.test.Client` to the RPC endpoint.

#### In siloed mode

- If a service is in the local silo mode, no serialization or network requests
  are made.
- Request bodies and responses are serialized as JSON.
- Requests are made using `requests` and the region configuration.

## Adding and changing RPC methods

RPC methods must maintain backwards compatibility between deploys. Because deploys are not atomic, and updating all regions takes time, making changes to RPC methods requires additional care.

### Adding RPC methods

You **cannot add an RPC method and call it in the same pull request or deploy**. Doing so can result in callers being updated before the receiving side has been updated. Instead do the following:

1. Deploy the new RPC method to all regions.
2. Deploy callers to the new RPC method.

### Adding parameters to RPC methods

You **cannot add new required parameters to existing RPC methods**. To add a new parameter to an RPC method, do the following:

1. Deploy the new parameter with a default value.
2. Update all callsites to provide a value for the parameter. Deploy those changes.
3. Make the parameter required by removing the default value.

### Removing an RPC method

1. Remove all callers of the RPC method. Deploy those changes.
2. Once there are no callers, the RPC method is safe to remove.

### Removing parameters from an RPC method

1. Make the parameter optional. Deploy that change.
2. Remove all usage of the parameter. Deploy those changes.
3. Remove the parameter.

### Renaming RPC methods or parameters

Renaming methods and parameters requires adding a new method, shifting callers to the new method and removing the old method.

diff --git a/develop-docs/backend/development-server.mdx b/develop-docs/backend/development-server.mdx
new file mode 100644
index 0000000000000..a2e22790b49a6
--- /dev/null
+++ b/develop-docs/backend/development-server.mdx
@@ -0,0 +1,29 @@
---
title: Backend Development Server
---

(*see also: [Frontend Development Server](/frontend/development-server/)*)

It's possible to run Sentry locally without ever having to build the frontend. It requires circumventing `sentry devserver`. Instead, you configure:

```python
# ~/.sentry/sentry.conf.py
STATIC_URL = "https://sentry.io/_static/{version}"
STATIC_FRONTEND_APP_URL = "https://sentry.io/_static/dist/"
```

...and run:

```shell
sentry run web
```

When you browse `localhost:9001`, the browser will load JavaScript from production instead of your local static folder.

This will do literally nothing except bring up the web workers.
You are now responsible for manually:

* Starting Relay if you need it: `sentry devservices attach relay`
* Starting relevant Kafka consumers if you need them (e.g. for ingestion). This highly depends on which dataset you're working on.
* Starting Celery workers using `sentry run worker` to run most kinds of background jobs.

Use `sentry run --help` to see what you might be missing by not running devserver.

diff --git a/develop-docs/backend/kafka.mdx b/develop-docs/backend/kafka.mdx
new file mode 100644
index 0000000000000..003beb9619942
--- /dev/null
+++ b/develop-docs/backend/kafka.mdx
@@ -0,0 +1,14 @@
# Kafka consumers

## Creating a new consumer in Sentry

WIP! This is currently just a checklist.

1. Add a new entry in the `KAFKA_CONSUMERS` key in
   `src/sentry/consumers/__init__.py`.

2. Create an ops PR for adding your consumer to
   `k8s/services/getsentry/consumer-deployment.yaml`.

3. In the Sentry Consumers metrics dashboard, add a new saved view for your
   consumer.

diff --git a/develop-docs/backend/outboxes.mdx b/develop-docs/backend/outboxes.mdx
new file mode 100644
index 0000000000000..dec16a5264efc
--- /dev/null
+++ b/develop-docs/backend/outboxes.mdx
@@ -0,0 +1,169 @@
---
title: Outboxes
---

Outboxes are database-backed, deferred units of work that drive a large portion of our system's eventual consistency workflows. They were designed with a couple of key features in mind:

- **Durability:** Outboxes are stored as database rows that are typically committed transactionally alongside associated model updates. They are retained until the processing code signals that work has been completed successfully.
- **Retriability:** If an outbox processor fails for whatever reason, it will be reprocessed indefinitely until it succeeds.
  - _Note:_ We do not support deadlettering, which means a head-of-line blocking outbox will continue to backlog work until the processing code is fixed.

## Outbox Properties

An outbox consists of the following parts:

`shard_scope`

: The operational group the outbox belongs to. This tends to be aligned with the models or domains the outbox applies to.

`shard_identifier`

: The shard, or grouping identifier.

`object_identifier`

: The ID of the impacted model object (if applicable).

*Note:* Not every outbox has an explicit model target, so this can be set to an arbitrary unique value if necessary. Just be aware that this identifier is used in tandem with the `shard_identifier` for coalescing purposes.

`category`

: The specific operation the outbox maps to, for example `USER_UPDATE` and `PROVISION_ORGANIZATION`.

`payload`

: An arbitrary JSON blob of additional data required by the outbox processing code.

`scheduled_for`

: The datetime when the outbox should be run next.

`scheduled_from`

: The datetime of when the `scheduled_for` date was last set.

## Outbox Sharding

Outbox shards are groups of outboxes representing an interdependent chunk of work that must be processed *in order.* An outbox’s shard is determined via the combination of its `shard_scope` and `shard_identifier` columns, and selecting appropriate values for both is essential.

`ControlOutbox` instances have one additional sharding column to consider: `region`, which ensures that each region processes its own shard of work independent of the others.

Some pragmatic examples:

- Organization-scoped outboxes use the organization’s ID as the shard identifier, meaning Organization, Project, and Org Auth Token updates (and more!) for a single organization are all processed in order. If any of these different outboxes get stuck, the entire shard will begin to backlog, so be mindful of failure points.
- Audit log event outboxes live in their own scope and use *distinct* shard identifiers for every outbox to ensure they are processed individually without the threat of being blocked by other outboxes.

## Synchronous vs Asynchronous outboxes

When creating a new outbox message, you can either attempt to immediately process it synchronously, or defer the work for later. Deciding which approach to take depends entirely on the use case and the source of the outbox's creation.

Some quick heuristics for deciding:

1. Was the outbox created by an API request that needs to report the status of the operation to the requestor? (process it synchronously)
2. Was the outbox a result of a task, automatic process, or webhook request? (process it asynchronously)

Thankfully, choosing the desired behavior is simple:

```python
from django.db import router, transaction

from sentry.models.outbox import RegionOutbox, outbox_context

# Synchronous outbox processing
with outbox_context(transaction.atomic(using=router.db_for_write(RegionOutbox))):
    RegionOutbox(...).save()

# Asynchronous outbox processing
with outbox_context(flush=False):
    RegionOutbox(...).save()
```

Both examples require the usage of the `outbox_context` context manager, but the key difference is in the arguments supplied.

### Synchronous outboxes

Supplying a transaction to the `outbox_context` signals our intent to immediately process the outbox *after* the provided transaction has been committed. This is handled via Django’s `on_commit` handlers, which are automatically generated by the context manager.
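Conceptually, the context manager arranges something like this simplified sketch (not the actual implementation; `flush_outbox` stands in for the internal flush entry point):

```python
from django.db import router, transaction

from sentry.models.outbox import RegionOutbox

def flush_outbox(outbox: RegionOutbox) -> None:
    # Stand-in for the internal processing entry point.
    outbox.drain_shard()

def save_with_sync_flush(outbox: RegionOutbox) -> None:
    using = router.db_for_write(RegionOutbox)
    with transaction.atomic(using=using):
        outbox.save()
        # Only runs if the surrounding transaction commits; on rollback the
        # handler is discarded along with the uncommitted outbox row.
        transaction.on_commit(lambda: flush_outbox(outbox), using=using)
```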
The context manager will attempt to flush all outboxes generated within the context manager, unless their creation operations are wrapped in a nested asynchronous context manager:

```python
with outbox_context(transaction.atomic(using=router.db_for_write(RegionOutbox))):
    sync_outbox = RegionOutbox(...)  # flushed once the transaction commits
    sync_outbox.save()

    with outbox_context(flush=False):
        async_outbox = RegionOutbox(...)  # left for asynchronous processing
        async_outbox.save()
```

Because processing occurs *after the transaction has been committed,* any outboxes that cannot be processed are treated as asynchronous outboxes after their initial flush attempts.

### Asynchronous outboxes

Supplying the `outbox_context` with `flush=False` instead of a transaction skips generating the `on_commit` handlers entirely, meaning any outboxes created within the context manager will not be processed in the current thread. Instead, they will be picked up by a future periodic celery task that queries all outstanding outbox shards and attempts to process them.

## Choosing an outbox’s silo and database

Outboxes should live in the same silo and database as the models or processes that produce them.

For example, `Organization` model changes generate `OrganizationUpdate` outboxes. Because the `Organization` is a region-silo model that lives in the `sentry` database, the `OrganizationUpdate` outbox is a `RegionOutbox` that also lives in the `sentry` database.

Having both models aligned to the same database ensures that both the model change and the outbox message creation can be committed in the same database transaction for consistency.

## Coalescing

Outboxes are coalesced in order to prevent backpressure from toppling our systems after a blocked outbox shard is cleared. We accomplish this by assuming the last message in a coalescing group is the source of truth, ignoring any preceding messages in the group.

An outbox’s coalescing group is determined by the combination of its sharding, `category` and `object_identifier` columns.

This coalescing strategy means that any outbox payloads that are stateful and order-dependent in the same coalescing group *will* result in data loss when the group is processed. If you want to bypass coalescing entirely, you can set an arbitrary *unique* object identifier to ensure messages are run individually and in order; however, this can cause severe bottlenecks, so be cautious when doing so.

## Processing Outboxes

Outbox processors are implemented as Django signal receivers, with the sender of the receiver set to the outbox category. Here’s an example of a receiver that handles project update outboxes:

```python
from typing import Any

from django.dispatch import receiver

from sentry.models.outbox import OutboxCategory, process_region_outbox

@receiver(process_region_outbox, sender=OutboxCategory.PROJECT_UPDATE)
def process_project_updates(object_identifier: int, shard_identifier: int, payload: Any, **kwargs):
    pass
```

Each receiver is passed the following outbox properties as arguments:

- `object_identifier`
- `shard_identifier`
- `payload`
- `region_name` [ControlOutbox only]

If the receiver raises an exception, the shard’s processing will be halted and the entire shard will be scheduled for future processing. If the receiver returns gracefully, the outbox’s coalescing group will be deleted and the next outbox message in the shard will be processed.

Because any outbox message can be retried multiple times in these exceptional cases, it’s crucial to make these processing receivers idempotent.
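For example, an idempotent receiver converges on the same end state no matter how many times it runs for the same outbox. A sketch, where `replicate_project_state` is a hypothetical idempotent helper:

```python
from typing import Any

from django.dispatch import receiver

from sentry.models.outbox import OutboxCategory, process_region_outbox
from sentry.models.project import Project

@receiver(process_region_outbox, sender=OutboxCategory.PROJECT_UPDATE)
def process_project_updates(object_identifier: int, shard_identifier: int, payload: Any, **kwargs):
    project = Project.objects.filter(id=object_identifier).first()
    if project is None:
        # The project is gone; returning normally marks the coalescing
        # group as processed instead of retrying forever.
        return
    # replicate_project_state is a hypothetical idempotent write: running
    # it twice with the same input leaves the same end state.
    replicate_project_state(project)
```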
## Steps for adding a new Outbox

1. Add a category enum
2. Choose an outbox scope
3. Choose a shard identifier and object identifier
4. Create a receiver to process the new outbox type
5. Update your code to save new outboxes within an outbox context

## Possible issues to be mindful of

### Deadlocking

If an outbox processor issues an RPC request to another silo, which in turn generates a synchronously processed outbox, this can result in a deadlock if any of the outboxes in the newly generated outbox’s shard also generates or processes an outbox targeting the originating silo.

This is a fairly rare situation, but one we’ve encountered in production before. Make sure to choose your outbox scopes and synchronous outbox processing use-cases carefully, minimizing cross-silo outbox interdependency whenever possible.

### Data Inconsistency Between Silos

There’s no guarantee that an outbox will be processed in a timely manner. This can cause data between our silos to drift, so consider this when writing code that depends on outboxes to sync data.

## When to use Outboxes

- The operation requires retriability and persistence.
- The operation can be deferred and made eventually consistent.
- A higher latency than a typical API or RPC request (<10s) is acceptable.

## When not to use Outboxes

- The work can take more than 10 seconds to process. This is a current limitation due to the way we acquire locks on outbox shards when processing them.
- Data is ordered and has low cardinality, meaning it cannot be sharded efficiently.
- Dependent code requires strong cross-silo consistency guarantees.
\ No newline at end of file

diff --git a/develop-docs/code-review.mdx b/develop-docs/code-review.mdx
new file mode 100644
index 0000000000000..f2aae0786e504
--- /dev/null
+++ b/develop-docs/code-review.mdx
@@ -0,0 +1,216 @@
---
title: "Code Review"
---

Code review is mandatory at Sentry. This adds overhead to each change, but ensures that simple, often overlooked problems are more easily avoided. Code review helps build shared context and collective ownership. It is also an opportunity to collaborate with other teams. Finally, code review can identify several classes of problem before customers are exposed to them.

Code review is managed via GitHub’s Pull Requests (see below for rationale). Templates may exist on repositories, but if they do not, consider [creating one](https://help.github.com/articles/creating-a-pull-request-template-for-your-repository/).

## Who should be reviewing code?

All engineers should be reviewing code. As you gain more experience and context on the products we build and technologies we use, you can provide valuable feedback to other engineers who may be working in areas that are new to them but familiar to you.

Code review is an opportunity to improve your mentoring and communication skills. Code review can have the important function of teaching engineers about the languages, frameworks and technologies we use, in a collaborative environment that is about the changes being made.

When creating a pull request, reference any tickets or Sentry issues which are being addressed. Additionally, **@mention** an appropriate team (or teams) for review.

## Why Pull Requests

Because Sentry is an open source project maintained via GitHub, we want to ensure that the barrier to entry for external contributions is minimal. By using GitHub features when possible, we make it easy for developers familiar with other projects on GitHub.
While GitHub’s tools [aren’t always the best](http://cra.mr/2014/05/03/on-pull-requests), they work well enough.

### GitHub Teams

The following teams are defined in GitHub and can be used when creating Pull Requests:

- [@getsentry/app-backend](https://github.com/orgs/getsentry/teams/app-backend)
- [@getsentry/app-frontend](https://github.com/orgs/getsentry/teams/app-frontend)
- [@getsentry/infrastructure](https://github.com/orgs/getsentry/teams/infrastructure)
- [@getsentry/ops](https://github.com/orgs/getsentry/teams/ops)
- [@getsentry/app](https://github.com/orgs/getsentry/teams/app) – the entire product team, use sparingly

Sentry has more than 200 public repositories. A more comprehensive list of teams and repositories can be found in the [Sentry Structure](https://open.sentry.io/structure/) overview.

## Commit Guidelines

See the Commit Messages guide.

## Code Reviews are for …

### Identifying problematic code

Above all, a code review should try to identify potential bugs that could cause the application to break – either now, or in the future.

- Uncaught runtime exceptions (e.g. potential for an index to be out of bounds)
- Obvious performance bottlenecks (e.g. `O(n^2)` where `n` is unbounded)
- Code alters behavior elsewhere in an unanticipated way
- API changes are not backwards compatible (e.g. renaming or removing a key)
- Complex ORM interactions that may have unexpected query generation/performance
- Security vulnerabilities
- Missing or incorrect permissions or access control

### Improving Design

When reviewing code, consider if the interactions of the various pieces in the change make sense together. If you are familiar with the project, do the changes conflict with other requirements or goals? Could any of the methods being added be promoted to module-level methods? Are methods being passed properties of objects when they could be passed the entire object?

### Tests Included

Look for tests. There should be functional tests, integration tests or end-to-end tests covering the changes. If not, ask for them. At Sentry we rely on our test suite to maintain a high quality bar and ship rapidly.

When reviewing tests, double-check that the tests cover the requirements of the project or that they cover the defect being fixed. Tests should avoid branching and looping as much as possible to prevent bugs in the test code from gaining a foothold.

Functional tests that simulate how a user would call our APIs or use our UX are key to preventing regressions and avoiding brittle tests that are coupled to the internals of the products we ship.

Tests are also the ideal place to ensure that the changes have considered permissions and access control logic.

### Assessing and approving long-term impact

If you’re making significant architectural, schema, or build changes that will have long-term ramifications to the software or data, it is necessary to solicit a senior engineer’s acknowledgment and blessing.

- Large refactors
- Database schema changes, like adding or removing columns and tables
- API changes (including JSON schema changes)
- Adopting new frameworks, libraries, or tools
- New product behavior that may permanently alter performance characteristics moving forward

### Double-checking expected behavior

The reviewer should make a genuine attempt to double-check that the goals of the PR appear to be satisfied by the code submitted.
This requires the submitter to write a good description of the expected behavior, and why. *See also: Guidelines for submitters below*.

### Information sharing and professional development

Code reviews are an opportunity for more people to understand forthcoming code changes, so that they might in turn teach others down the road, and be in a position to fix something if/when the original author is not available.

Additionally, code reviews are an opportunity to learn about new techniques or approaches, and be exposed to code you might otherwise not have an opportunity to browse.

### Reducing code complexity

Research shows that LOC is correlated with a higher bug count. If reviewers see an easy opportunity to significantly reduce the amount of code that is submitted, they should suggest a different approach.

For example, if a submitter has written a `for` loop to find an item in an array:

```
for (let i = 0; i < arr.length; i++) {
  if (arr[i] === 'thingiwant') return arr[i];
}
return undefined;
```

It’s fair game to suggest they instead use:

```
return arr.find(x => x === 'thingiwant');
```

This is a mostly objective improvement: there are fewer variables, fewer statements, and fewer branches, and the method name `find` communicates intent. Suggesting these types of uncontroversial improvements is encouraged.

Be careful though – it’s easy to go down a rabbit hole of re-writing code to be as small as possible, and in the end winding up with something ultimately more complicated. Be pragmatic and strive to reach a good balance. *See also: “Code reviews are not for getting it perfect” below.*

### Enforcing coding standards

As much as possible, we use automation to enforce code style and test coverage, but there are exceptions that cannot necessarily be automated (or perhaps more accurately, we haven’t automated them *yet*):

- Variable, file, metric, and logger names are sensible, readable, and consistent with existing code
- Migrations have a deployment plan
- Unused or superfluous code isn’t committed accidentally

## Code Reviews are not for …

### Passing responsibility onto the reviewer

It is not the responsibility of the reviewer that your code is correct, is bug free, or achieves its goals. Reviewers are there to help you, but if something is wrong, it’s your responsibility to correct it.

### Boasting about your programming knowledge

As a reviewer, try to stick to objective improvements and make a best-intent assumption that the submitter has done their homework. Sentry is a No Flex Zone™.

### Introducing long-term architectural changes for the first time

While code reviews are great for discussion, they’re not the place to introduce large, long-term changes for the first time. Before dropping a PR that implements those changes, you should write a proposal and reach out to the relevant parties beforehand.

### Getting it perfect

Code reviews are expensive. Every time you request a change, you’re probably delaying that PR by 24 hours or more. This can severely inhibit our ability to move fast.

The goal of code reviews is to **reduce risk**, not to produce perfect code. It’s okay to ship code in stages, and to commit to improving something later. If you’re thinking – *if we don’t get it correct up-front, we’ll never come back to it* – consider that if it never needs coming back to, perhaps those changes were never necessary in the first place.

> Perfect is the enemy of the good. – Voltaire, probably
+### Enforcing coding standards
+
+As much as possible, we use automation to enforce code style and test coverage, but there are exceptions that cannot necessarily be automated (or perhaps more accurately, we haven’t automated them *yet*):
+
+- Variable, file, metric, and logger names are sensible, readable, and consistent with existing code
+- Migrations have a deployment plan
+- Unused or superfluous code isn’t committed accidentally
+
+## Code Reviews are not for …
+
+### Passing responsibility onto the reviewer
+
+It is not the reviewer’s responsibility to ensure that your code is correct, bug free, or achieves its goals. Reviewers are there to help you, but if something is wrong, it’s your responsibility to correct it.
+
+### Boasting about your programming knowledge
+
+As a reviewer, try to stick to objective improvements and make a best-intent assumption that the submitter has done their homework. Sentry is a No Flex Zone™.
+
+### Introducing long-term architectural changes for the first time
+
+While code reviews are great for discussion, they’re not the place to introduce large, long-term changes for the first time. Before opening a PR that implements those changes, you should write a proposal and reach out to the relevant parties.
+
+### Getting it perfect
+
+Code reviews are expensive. Every time you request a change, you’re probably delaying that PR by 24 hours or more. This can severely inhibit our ability to move fast.
+
+The goal of code reviews is to **reduce risk**, not to produce perfect code. It’s okay to ship code in stages, and to commit to improving something later. If you’re thinking – *if we don’t get it correct up-front, we’ll never come back to it* – consider that if it never needs coming back to, perhaps those changes were never necessary in the first place.
+
+> Perfect is the enemy of the good. – Voltaire, probably
+
+Please be pragmatic, and consider the cost of each incremental request for changes.
+
+## Guidelines for *Submitters*
+
+### Try to organize your work in a way that makes it conducive to review
+
+- Ideally, a pull request is limited to only a single feature or behavior change.
+- This might feel like more work up-front, but it can make code review faster, reduce risk by letting you ship in stages, and ultimately save time.
+
+### Describe what your PR does in a few sentences in the description field
+
+- Additionally explain **_why_** we’re making these changes.
+- If applicable, explain why other approaches were explored but not settled on.
+- This gives the reviewer context, and prevents them from going down the same rabbit holes that the submitter may have already explored when creating the code.
+
+### Annotate specific lines in your PR
+
+- If you can, give context to specific lines of code that could use elaboration.
+- Example: [getsentry/sentry#6330](https://github.com/getsentry/sentry/pull/6330#discussion_r144441272)
+
+### Where appropriate, label in-progress PRs as WIP (work in progress) for early feedback
+
+- Labeling your work as WIP helps set expectations about the state of the PR.
+- WIP PRs are good for having someone check in to make sure you’re on the right path.
+- Additionally, this is an opportunity to verify CI passes before involving a reviewer.
+
+### Be your own first reviewer
+
+- After you’ve put up your PR on GitHub, walk through the code yourself, before assigning an external reviewer.
+- You’ll often catch code mistakes you didn’t see when writing it.
+- This is also a good time to leave comments and refresh your memory in order to write a more helpful description.
+
+### Assign no more than 1-3 reviewers
+
+- It’s tempting to want to involve as many people as possible, but it can often be distracting, and create a situation where nobody’s clear on who should actually perform the review.
+- If your work spans multiple teams (and thus, many reviewers), consider breaking up your PR into multiple compatible patches (e.g. a back-end change and a front-end change).
+- Note: if you don’t know who from a team to assign, you can assign teams like @workflow and @platform.
+
+### Avoid rebasing unnecessarily
+
+- After a rebase, previous review comments will be orphaned from their now non-existent parent commits, making review more difficult.
+- Rewriting history makes it difficult for reviewers to isolate the scope of their review.
+
+### Let reviewers know that you’ve made changes
+
+- Request review again via the “Reviewers” dropdown (there should be a yellow dot next to their name again).
+- Don’t rely on reviewers’ mind-reading skills to know that you’re ready to have them look things over again.
+- If you resolve a point of actionable feedback, it’s helpful to leave a comment to let the reviewer know that it was addressed, ideally with a reference to the commit that addressed it, e.g. [getsentry/sentry#6683 (comment)](https://github.com/getsentry/sentry/pull/6683#discussion_r155121800)
+
+## Guidelines for *Reviewers*
+
+### Be polite and empathetic
+
+- Avoid accusatory and/or judgmental comments like: “You should have done X”
+
+### Provide _actionable_ feedback
+
+- Instead of *“This is bad”*, try *“I feel this could be clearer. What if you renamed variable X to Y?”*
+### Distinguish between “requires changes” and “nitpicks”
+
+- Consider marking a PR as approved if the only requested changes are minor nits, so as not to block the author in another asynchronous review cycle.
+
+### Respond promptly to code review requests
+
+- We’re a team – *[we ride together, we die together](https://www.youtube.com/watch?v=1d5Q0vXbODs)* – and you need to unblock other developers so that we can all move forward.
+- We recommend checking for open code reviews at the start and end of every day.
+- [GitHub’s Review Requests tab](https://github.com/pulls/review-requested) can be a helpful place to keep track of these.
+
+### Example of a *pretty good* code review
+
+[getsentry/sentry#5849](https://github.com/getsentry/sentry/pull/5849)
+
+- Submitted early as WIP to solicit feedback early
+- Reviewers respond same-day (once the WIP label is removed)
+- Reviewers are polite and complimentary
+- Feedback is objective and actionable
+- Submitter responds to feedback and makes changes promptly
+- PR is approved and merged within 48 hours
diff --git a/develop-docs/commit-messages.mdx b/develop-docs/commit-messages.mdx
new file mode 100644
index 0000000000000..d2f1dcdf3f089
--- /dev/null
+++ b/develop-docs/commit-messages.mdx
@@ -0,0 +1,186 @@
+---
+title: "Commit Messages"
+---
+
+We have very precise rules about how our git commit messages must be formatted. This leads to more readable messages that are easy to follow when looking through the project history.
+
+### General Rules
+
+1. Separate subject from body with a blank line
+2. Limit the subject line to 70 characters
+3. Capitalize the subject line
+4. Do not end the subject line with a period
+5. Use the imperative mood in the subject line
+6. Use the body to explain what and why vs. how
+7. Each commit should be a single, stable change
+
+### Merge vs Rebase
+
+Sentry uses a rebase workflow. That means that every commit on its own should be a clear, functional, and stable change. This means that when you’re building a new feature, you should try to pare it down into functional steps; when that’s not reasonable, the end patch should be a single commit. This is counter to having a Pull Request that includes commits like “fix [unmerged] behavior”. Those commits should be squashed, and the final patch should be rebased when it lands.
+
+Remember: each commit should follow the commit message format and be stable (green build).
+
+#### Rebase and Merge
+
+The GitHub UI exposes a “Rebase and Merge” option which, if your commits already follow the commit guidelines, is a great way to bring your change into the codebase.
+
+#### Squashing
+
+When you are squashing your branch, it’s important to make sure you update the commit message. If you’re using GitHub’s UI, it will by default create a new commit message which is a combination of all commits and **does not follow the commit guidelines**.
+
+If you’re working locally, it often can be useful to `--amend` a commit, or utilize `rebase -i` to reorder, squash, and reword your commits.
+
+### Commit Message Format
+
+Each commit message consists of a **header**, a **body**, and an optional **footer**.
+
+The header has a special format that includes a type, a scope and a subject:
+
+```
+<type>(<scope>): <subject>
+<BLANK LINE>
+<body>
+<BLANK LINE>
+<footer>