
Commit

Nail docs navigation (#5186)
* begin refactoring sidebar

* completely remove using posthog section

* add icons to product sections

* remove integrate section altogether

* fix merge issue

* remove left-over conflict marker

* slight tweaks to sidebar labels

* add placeholder landing pages

* create tutorials component for docs landing pages

* mockup of product-analytics landing page

* break apart existing session recordings page

* break up feature flags section

* break up experiments section

* move library comparison page to reference section

* mock up feature flags landing page

* temp: comment out wrapPageElement in gatsby-browser

* fix build

* mockup landing pages

* fix image positioning

* remove duplicates

* add clean script

* update docs homepage link grid

* move deploy options to important links

* Tutorial - How to set up a React app heatmap (#5454)

* react heatmap

* wider images

* fix: various self hosted clean-ups (#5473)

* remove references to /signup/self-host/deploy

* merge self host instructions into landing page

* fix wrong bracket

* redirect /contact to /contact-sales

* change intro message and update guidance on memory size

* remove /docs/self-host/open-source/deployment

* remove unused DeploymentOption component

* remove trailing whitespace

* fetch docs homepage info with query

* exclude build pages

* finalize docs landing page

* update path to search hog image

* tweak image width and make all landing pages less wide

* update links

* mobile style for getting started section

* fix dark mode sidebar menu

* update tutorials component

* update docs submenu links

* update data start here link

* update quick links on all pages

* regen mdx global components

* update gatsby-browser/ssr

* fix product icons

* use product icons

* moved data-related docs to new subsection, outside of product analytics

* text changes

* reworked product analytics index page to match chapters

* standardized naming for main product manual pages

* add key

* linked Sampling page

* icon strings

* polish product analytics getting started page

* dark mode fixes

* move graphic to top

* line up children with icons

* fix icon, experiments labeling

* dark mode fix

* styling docs → product index pages

* added getting started images

* getting started image mobile fix

* start-here images

* remove product analytics url

* merged identify users content, redirected old page

* moved User properties page into Getting started

* added User properties to Getting started index

* next steps text

* update quick links / add session recording & data

* text/image fixes

* add product analytics URL

* fixing links, moving nav items around

* fixed broken links, polished /docs styling

* redirected old /docs/integrate page

* spacing

---------

Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Eli Kinsey <[email protected]>
Co-authored-by: Cory Watilo <[email protected]>
4 people authored Mar 13, 2023
1 parent 4cdd9aa commit a0eba18
Showing 56 changed files with 3,028 additions and 1,907 deletions.
2 changes: 1 addition & 1 deletion contents/blog/posthog-vs-amplitude.md
@@ -295,7 +295,7 @@ Both Amplitude and PostHog integrate with a large number of data sources. The ta
<tr>
<td>Zendesk</td>
<td className="text-center"><span className="text-green text-lg">✔</span></td>
<td className="text-center"><span className="text-green text-lg">✔</span></td>
<td className="text-center"><span className="text-green text-lg">✔</span></td>
</tr>
<tr>
<td>API</td>

Large diffs are not rendered by default.

28 changes: 28 additions & 0 deletions contents/docs/experiments/significance.mdx
@@ -0,0 +1,28 @@
---
title: Statistical significance
---

For your results and conclusions to be valid, an experiment must have sufficient exposure. For instance, if you test a product change and only one user sees it, you can't extrapolate from that single user that the change will be beneficial or detrimental for your entire user base. This is true of any simple randomized controlled experiment (the same principle applies when testing new drugs or vaccines).

Furthermore, even a large sample size (e.g. approx. 10,000 participants) can produce ambiguous results. If, for example, the difference in conversion rate between the variants is less than 1%, it's hard to say whether one variant is truly better than the other. For results to be significant, there must be enough difference between the conversion rates, given the exposure size.

PostHog computes this significance for you automatically - we let you know whether your experiment has reached significant results or not. Once it does, it's safe to use those results to reach a conclusion and terminate the experiment. You can read more about how we do this in the sections below.

## Calculating exposure

Since count data can reflect a total count rather than the number of unique users, we use a proxy metric to measure exposure: the number of times the `$feature_flag_called` event returns `control` or `test` is the exposure for the respective variant. This event is sent automatically when you call `posthog.getFeatureFlag()`.

A variant showing a lower count can still have a higher probability of being the best if its exposure is much smaller as well.
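
For example, with our JavaScript library, reading the flag value looks like this (`experiment-flag` is a placeholder key). Each call captures a `$feature_flag_called` event, which is what counts towards the variant's exposure:

```js
// Reading the flag value automatically captures a `$feature_flag_called` event,
// which is used to measure exposure. 'experiment-flag' is a placeholder key.
const variant = posthog.getFeatureFlag('experiment-flag')

if (variant === 'test') {
    // render the test experience
} else {
    // render the control experience
}
```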

## How we determine significance

In the early days of an experiment, data can vary wildly, and sometimes one variant can seem overwhelmingly better. In this case, our significance calculations might report significance even though they shouldn't, simply because we don't have enough data yet.

Thus, until each variant in an experiment has at least 100 participants, we default to the results being not significant. Likewise, if the probability of the winning variant is less than 90%, we default to the results being not significant.

So, you only see the green significance banner when all three conditions are met:

1. Each variant has more than 100 unique users.
2. Our significance calculations declare the results significant.
3. The probability of the winning variant being the best is greater than 90%.
105 changes: 105 additions & 0 deletions contents/docs/experiments/under-the-hood.mdx
@@ -0,0 +1,105 @@
---
title: Experiments under the hood
---

Below are all formulas and calculations we go through when recommending sample sizes and determining significance.

## How we ensure distribution of people

For every experiment, we use PostHog's multivariate feature flags. There is one control group and up to three test groups. Based on their distinct ID, each user is randomly assigned to one of these groups. This assignment is stable, so the same user stays in the same group even when they revisit your page.

We do this by hashing the feature flag key together with the distinct ID. It's worth noting that with low data volumes (&lt;1,000 users per variant), the difference in variant exposure can be up to 20% - for example, a test variant could have only 800 people while control has 1,000.

All our calculations take this exposure into account.
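
As an illustration only (not PostHog's exact implementation), deterministic bucketing from a hash of the flag key and distinct ID could look something like this sketch:

```js
// Illustrative sketch of deterministic bucketing - not PostHog's exact algorithm.
// Hashing the flag key and distinct ID to a number in [0, 1) means a given user
// always lands in the same variant for a given flag.
import { createHash } from 'crypto'

function variantForUser(flagKey, distinctId, variants) {
    const hash = createHash('sha1').update(`${flagKey}.${distinctId}`).digest('hex')
    // Map the first 15 hex characters to a float in [0, 1)
    const hashValue = parseInt(hash.slice(0, 15), 16) / 0xfffffffffffffff
    let cumulative = 0
    for (const variant of variants) {
        cumulative += variant.rolloutPercentage / 100
        if (hashValue < cumulative) return variant.key
    }
    return variants[variants.length - 1].key
}

// The same user gets the same variant on every call
console.log(variantForUser('my-experiment', 'user-123', [
    { key: 'control', rolloutPercentage: 50 },
    { key: 'test', rolloutPercentage: 50 },
]))
```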

## Recommendations for sample size and running time

When you're creating an experiment, we show you recommended running times and sample sizes based on parameters you've chosen.

For trend experiments, we use Lehr's equation [as explained here](http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/styled-4/code-12/#poisson-distributed-or-count-data) to determine sample sizes.

```
exposure = 4 / (sqrt(lambda1) - sqrt(lambda2))^2
```

where `lambda1` is the baseline count data we've seen for the past two weeks, and `lambda2` is `baseline count + mde*(baseline count)`.

`mde` is the minimum acceptable improvement you choose in the UI.

For funnel experiments, we use the general [Sample Size Determination](https://en.wikipedia.org/wiki/Sample_size_determination) formula, with 80% power and 5% significance. The formula then becomes:

```
sample size per variant = 16 * conversion_rate * (1 - conversion_rate) / (mde)^2
```

where `mde` is again the minimum detectable effect chosen in the UI.

We give these values as an estimate of how long to run the experiment. It's possible to end an experiment early if you see an outsized effect.

Note how the recommended sample size in each case is inversely related to the minimum acceptable improvement. This makes sense, since the smaller the `mde`, the more sensitive your experiment is, and the more data you need to judge significance.
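
As a quick worked example of the funnel formula above (with made-up numbers), note how halving the `mde` quadruples the required sample size:

```js
// Worked example of the funnel sample-size formula above, with made-up numbers.
function sampleSizePerVariant(conversionRate, mde) {
    return (16 * conversionRate * (1 - conversionRate)) / Math.pow(mde, 2)
}

console.log(sampleSizePerVariant(0.1, 0.05))  // 576 users per variant
console.log(sampleSizePerVariant(0.1, 0.025)) // 2304 - halving the mde quadruples the sample size
```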

## Bayesian A/B testing

We follow a mostly Bayesian approach to A/B testing. While an experiment is running, we calculate two things: (1) the probability of each variant being the best, and (2) whether the results are significant or not.

Below are calculations for each kind of experiment.

## Trend experiment calculations

Trend experiments capture count data. For example, if you want to measure the change in total count of clicks, you'd use this kind of experiment.

We use Monte Carlo simulations to determine the probability of each variant being the best. Every variant can be simulated as a gamma distribution with shape parameter = trend count, and exposure = relative exposure for this variant.

Then, for each variant, we can sample from their distributions and get a count value for each of them.

The probability of a variant being the best is given by:

![Probability](../images/docs/user-guides/experimentation/count_data_probability.png)

For calculating significance, we currently measure p-values using a Poisson means test. [Here's a good primer on the formula](https://www.evanmiller.org/statistical-formulas-for-programmers.html#count_test). Results are significant when the p-value is less than 0.05.
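
To make this concrete, here is a minimal, self-contained sketch of the probability-of-best simulation described above. The counts and exposures are made-up numbers, and this is an illustration of the approach rather than PostHog's actual implementation:

```js
// Illustrative Monte Carlo sketch: probability of each variant being the best,
// modelling each variant as Gamma(shape = count, rate = exposure) as described above.
// The counts and exposures below are made-up example numbers.

// Sample Gamma(shape, rate) for integer shape as a sum of exponentials.
function sampleGamma(shape, rate) {
    let sum = 0
    for (let i = 0; i < shape; i++) {
        sum += -Math.log(1 - Math.random())
    }
    return sum / rate
}

const variants = [
    { key: 'control', count: 120, exposure: 1.0 },
    { key: 'test', count: 150, exposure: 0.9 },
]

const simulations = 100000
const wins = variants.map(() => 0)

for (let i = 0; i < simulations; i++) {
    const samples = variants.map((v) => sampleGamma(v.count, v.exposure))
    wins[samples.indexOf(Math.max(...samples))] += 1
}

variants.forEach((v, i) => {
    console.log(`${v.key}: P(best) ≈ ${(wins[i] / simulations).toFixed(3)}`)
})
```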

## Trend experiment exposure

Since count data can reflect a total count rather than the number of unique users, we use a proxy metric to measure exposure: the number of times the `$feature_flag_called` event returns `control` or `test` is the exposure for the respective variant. This event is sent automatically when you call `posthog.getFeatureFlag()`.

A variant showing a lower count can still have a higher probability of being the best if its exposure is much smaller as well.

## Funnel experiment calculations

Funnel experiments capture conversion rates. For example, if you want to measure the change in conversion rate for buying a subscription to your site, you'd use this kind of experiment.

We use Monte Carlo simulations to determine the probability of each variant being the best. Every variant can be simulated as a beta distribution with alpha parameter = number of conversions, and beta parameter = number of failures, for this variant.

Then, for each variant, we can sample from their distributions and get a conversion rate for each of them.

The probability of a variant being the best is given by:

![Probability](../images/docs/user-guides/experimentation/conversion_probability.png)

To calculate significance, we compute the expected loss, as described in [VWO's SmartStats whitepaper](https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf).

To do this, we again run a Monte Carlo simulation and calculate the loss as:

![Loss](../images/docs/user-guides/experimentation/loss-calculation.png)

This represents the expected loss in conversion rate if you chose any other variant. If this is below 1%, we declare results as significant.
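
Here is a similar minimal sketch for a funnel experiment, covering both the probability of being the best and the expected loss. The conversion numbers are made up, and this illustrates the approach rather than PostHog's actual implementation:

```js
// Illustrative Monte Carlo sketch for a funnel experiment, modelling each variant
// as Beta(alpha = conversions, beta = failures) as described above.
// The conversion numbers below are made-up examples.

// Sample Gamma(shape, 1) for integer shape as a sum of exponentials.
function sampleGamma(shape) {
    let sum = 0
    for (let i = 0; i < shape; i++) {
        sum += -Math.log(1 - Math.random())
    }
    return sum
}

// Sample Beta(a, b) via two gamma draws.
function sampleBeta(a, b) {
    const x = sampleGamma(a)
    const y = sampleGamma(b)
    return x / (x + y)
}

const variants = [
    { key: 'control', conversions: 100, failures: 900 },
    { key: 'test', conversions: 125, failures: 875 },
]

const simulations = 100000
const wins = variants.map(() => 0)
const lossSum = variants.map(() => 0)

for (let i = 0; i < simulations; i++) {
    const rates = variants.map((v) => sampleBeta(v.conversions, v.failures))
    const best = Math.max(...rates)
    rates.forEach((rate, idx) => {
        if (rate === best) wins[idx] += 1
        lossSum[idx] += best - rate // conversion rate given up by choosing this variant
    })
}

variants.forEach((v, i) => {
    const pBest = wins[i] / simulations
    const expectedLoss = lossSum[i] / simulations
    console.log(`${v.key}: P(best) ≈ ${pBest.toFixed(3)}, expected loss ≈ ${expectedLoss.toFixed(4)}`)
})
```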

## How do we handle statistical significance?

For your results and conclusions to be valid, an experiment must have sufficient exposure. For instance, if you test a product change and only one user sees it, you can't extrapolate from that single user that the change will be beneficial or detrimental for your entire user base. This is true of any simple randomized controlled experiment (the same principle applies when testing new drugs or vaccines).

Furthermore, even a large sample size (e.g. approx. 10,000 participants) can produce ambiguous results. If, for example, the difference in conversion rate between the variants is less than 1%, it's hard to say whether one variant is truly better than the other. For results to be significant, there must be enough difference between the conversion rates, given the exposure size.

PostHog computes this significance for you automatically - we let you know whether your experiment has reached significant results or not. Once it does, it's safe to use those results to reach a conclusion and terminate the experiment. The calculations above explain exactly how we do this.

In the early days of an experiment, data can vary wildly, and sometimes one variant can seem overwhelmingly better. In this case, our significance calculations might report significance even though they shouldn't, simply because we don't have enough data yet.

Thus, until each variant in an experiment has at least 100 participants, we default to the results being not significant. Likewise, if the probability of the winning variant is less than 90%, we default to the results being not significant.

So, you only see the green significance banner when all three conditions are met:

1. Each variant has more than 100 unique users.
2. The significance calculations above declare the results significant.
3. The probability of the winning variant being the best is greater than 90%.
174 changes: 174 additions & 0 deletions contents/docs/feature-flags/local-evaluation.mdx
@@ -0,0 +1,174 @@
---
title: Local evaluation
availability:
    free: full
    selfServe: full
    enterprise: full
---

There is a delay between loading the library and feature flags becoming available to use. This can be detrimental if you want to do something like redirecting to a different page based on a feature flag.

To have your feature flags available immediately, you can bootstrap them with a distinct user ID and their values during initialization.

```js
posthog.init('<ph_project_api_key>', {
    api_host: '<ph_instance_address>',
    bootstrap: {
        distinctID: 'your-anonymous-id',
        featureFlags: {
            'flag-1': true,
            'variant-flag': 'control',
            'other-flag': false,
        },
    },
})
```

To get the flag values for bootstrapping, you can call `getAllFlags()` in your server-side library, then pass the values to your frontend initialization. If you don't do this, your bootstrapped values might differ from the values PostHog would otherwise provide.
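
For example, with the Node library, fetching the values to bootstrap with might look like the following sketch (how you pass the result to your frontend depends on your framework):

```js
// Server-side (Node): fetch all flag values for this user so the frontend can
// bootstrap posthog-js with them. The placeholder keys and host are stand-ins for your own.
import { PostHog } from 'posthog-node'

const client = new PostHog('<ph_project_api_key>', { host: '<ph_instance_address>' })

export async function getBootstrapData(distinctId) {
    const featureFlags = await client.getAllFlags(distinctId)
    return { distinctID: distinctId, featureFlags }
}
```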

If the distinct user ID is an identified ID (the value you called `posthog.identify()` with), you can also pass the `isIdentifiedID` option. This ensures the ID is treated as an identified ID in the library, which is helpful because the library can then warn you about misuse, such as calling identify again with the same ID.

```js
posthog.init('<ph_project_api_key>', {
    api_host: '<ph_instance_address>',
    bootstrap: {
        distinctID: 'your-identified-id',
        isIdentifiedID: true,
        featureFlags: {
            'flag-1': true,
            'variant-flag': 'control',
            'other-flag': false,
        },
    },
})
```

## Forcing feature flags to update

In our client-side JavaScript library, we store flags as a cookie to reduce the load on the server and improve the performance of your app. This avoids making an HTTP request on every evaluation; flag evaluation can simply refer to data stored locally in the browser. This is known as 'local evaluation.'

While this makes your app faster, it means that if your user does something mid-session which causes a flag to turn on for them, the flag's value does not update immediately. If you expect your app to have scenarios like this _and_ you want flags to update mid-session, you can reload them yourself using the `reloadFeatureFlags` function.

```js
posthog.reloadFeatureFlags()
```

Calling this function forces PostHog to hit the endpoint for the updated information, and ensures changes are reflected mid-session.

## Server-side local evaluation

If you're using our server-side libraries, you can use local evaluation to improve performance instead of making additional API requests. This requires:

1. knowing and passing in all the person or group properties the flag relies on
2. initializing the library with your personal API key (created in your account settings), as sketched below
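
For example, initializing the Node library for local evaluation might look like this sketch (the placeholder keys are stand-ins for your own, and the polling interval is optional):

```js
// A sketch of initializing the Node library for local evaluation.
// The personal API key lets the library fetch flag definitions and evaluate them locally.
import { PostHog } from 'posthog-node'

const client = new PostHog('<ph_project_api_key>', {
    host: '<ph_instance_address>',
    personalApiKey: '<ph_personal_api_key>',
    featureFlagsPollingInterval: 30000, // refresh flag definitions every 30 seconds
})
```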

Local evaluation, in practice, looks like this:

<MultiLanguage>

```js
await client.getFeatureFlag(
    'beta-feature',
    'distinct id',
    {
        personProperties: { is_authorized: true },
    }
)
// returns a string or undefined
```

```python
posthog.get_feature_flag(
    'beta-feature',
    'distinct id',
    person_properties={'is_authorized': True}
)
# returns string or None
```

```php
PostHog::getFeatureFlag(
    'beta-feature',
    'some distinct id',
    [],
    ["is_authorized" => true]
)
// the third argument is for groups
```

```ruby
posthog.get_feature_flag(
    'beta-feature',
    'distinct id',
    person_properties: {'is_authorized': true}
)
# returns string or nil
```

```go
enabledVariant, err := client.GetFeatureFlag(
    posthog.FeatureFlagPayload{
        Key:        "multivariate-flag",
        DistinctId: "distinct-id",
        PersonProperties: posthog.NewProperties().
            Set("is_authorized", true),
    },
)
```

</MultiLanguage>

This works for `getAllFlags` as well. It evaluates all flags locally if possible, and if not, falls back to making a `decide` HTTP request.

<MultiLanguage>

```node
await client.getAllFlags('distinct id', {
    groups: {},
    personProperties: { is_authorized: true },
    groupProperties: {},
})
// returns an object of flag keys and values
```

```php
PostHog::getAllFlags('distinct id', ["organisation" => "some-company"], [], ["organisation" => ["is_authorized" => true]])
```

```go
featureVariants, _ := client.GetAllFlags(posthog.FeatureFlagPayloadNoKey{
    DistinctId: "distinct-id",
})
```

```python
posthog.get_all_flags('distinct id', groups={}, person_properties={'is_authorized': True}, group_properties={})
# returns dict of flag key and value pairs.
```

```ruby
posthog.get_all_flags('distinct id', groups: {}, person_properties: {'is_authorized': true}, group_properties: {})
# returns hash of flag key and value pairs.
```

</MultiLanguage>

## Using locally

To test feature flags locally, you can open your developer tools and override the feature flags. You will get a warning that you're manually overriding feature flags.

```js
posthog.feature_flags.override(['feature-flag-1', 'feature-flag-2'])
```

This will persist until you call override again with the argument `false`:

```js
posthog.feature_flags.override(false)
```

To see the feature flags that are currently active for you, you can call:

```js
posthog.feature_flags.getFlags()
```
