Enable optimizations in the 'mixed' mode of Jackknife/Bootstrap.compute_on_sql. #199

Open. Wants to merge 1 commit into base: master
10 changes: 6 additions & 4 deletions README.md
@@ -59,6 +59,8 @@ Currently built-in metrics include:
Sum(denominator)`.
+ `Quantile(variable, quantile(s))`: calculates the `quantile(s)` quantile for
`variable`.
+ `Nth(variable, sort_by, n, ascending=True, dropna=False)` computes the `n`th
value after sorting by `sort_by`.
+ `Variance(variable, unbiased=True)`: calculates the variance of `variable`;
`unbiased` determines whether the unbiased (sample) or population estimate
is used.
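
To make the `Nth` description above concrete, here is a minimal pure-Python sketch of "the `n`th value after sorting by `sort_by`". It is not Meterstick's implementation: `nth_value` and the sample `rows` are hypothetical, `dropna` is left out, and the data has no ties in `sort_by`. The 0-based `n` and the negative-`n` behavior follow the notebook cells in this PR, where `Nth('clicks', 'impressions', 0)` is labeled `1st(clicks)` and `Nth('x', 'y', -1) == Nth('x', 'y', 0, False)`.

```python
# Illustrative model only; `nth_value` is a hypothetical helper,
# not part of Meterstick.

def nth_value(rows, variable, sort_by, n, ascending=True):
    """Return `variable` at 0-based position `n` after sorting rows by `sort_by`."""
    ordered = sorted(rows, key=lambda row: row[sort_by], reverse=not ascending)
    return ordered[n][variable]


# Made-up sample data with distinct `impressions` values (no ties).
rows = [
    {"clicks": 5, "impressions": 30},
    {"clicks": 2, "impressions": 10},
    {"clicks": 9, "impressions": 20},
]

# n=0 picks the row with the smallest `impressions`.
first = nth_value(rows, "clicks", "impressions", 0)
# A negative n counts from the end of the ascending order...
last = nth_value(rows, "clicks", "impressions", -1)
# ...which matches n=0 on the descending order when there are no ties.
first_desc = nth_value(rows, "clicks", "impressions", 0, ascending=False)
```

With ties in `sort_by`, the two orderings may break ties differently, so the equivalence in this sketch is only guaranteed for distinct sort keys.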
@@ -300,10 +302,10 @@ It can help you to sanity check complex Metrics.

## SQL

You can get the SQL query for all built-in Metrics and Operations (except
weighted Quantile) by calling `to_sql(sql_data_source,
split_by)` on the Metric. `sql_data_source` could be a table or a subquery. The
dialect it uses is the [standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql)
You can get the SQL query for all built-in Metrics and Operations by calling
`to_sql(sql_data_source, split_by)` on the Metric. `sql_data_source` could be a
table or a subquery. The dialect it uses is the
[standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql)
in Google Cloud's BigQuery. For example,

```python
237 changes: 232 additions & 5 deletions meterstick_demo.ipynb
@@ -2299,6 +2299,236 @@
"Mean('clicks', 'impressions').compute_on(df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OKK1H6_3qszU"
},
"source": [
"## Nth\n",
"\n",
"`Nth(var, sort_by, n, ascending=True, dropna=False)` computes the `n`th value of `var` after sorting by `sort_by`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"height": 81
},
"executionInfo": {
"elapsed": 55,
"status": "ok",
"timestamp": 1697859374809,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": 420
},
"id": "qeZdhz-Zfpd1",
"outputId": "5f55b436-a454-49e2-928c-b2901173a613"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \u003cdiv id=\"df-78fb1da5-44ea-49d5-9127-76c891a02da1\" class=\"colab-df-container\"\u003e\n",
" \u003cdiv\u003e\n",
"\u003cstyle scoped\u003e\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"\u003c/style\u003e\n",
"\u003ctable border=\"1\" class=\"dataframe\"\u003e\n",
" \u003cthead\u003e\n",
" \u003ctr style=\"text-align: right;\"\u003e\n",
" \u003cth\u003e\u003c/th\u003e\n",
" \u003cth\u003e1st(clicks) sort by impressions asc\u003c/th\u003e\n",
" \u003c/tr\u003e\n",
" \u003c/thead\u003e\n",
" \u003ctbody\u003e\n",
" \u003ctr\u003e\n",
" \u003cth\u003e0\u003c/th\u003e\n",
" \u003ctd\u003e1.163701\u003c/td\u003e\n",
" \u003c/tr\u003e\n",
" \u003c/tbody\u003e\n",
"\u003c/table\u003e\n",
"\u003c/div\u003e\n",
" \u003cdiv class=\"colab-df-buttons\"\u003e\n",
"\n",
" \u003cdiv class=\"colab-df-container\"\u003e\n",
" \u003cbutton class=\"colab-df-convert\" onclick=\"convertToInteractive('df-78fb1da5-44ea-49d5-9127-76c891a02da1')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\"\u003e\n",
"\n",
" \u003csvg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\"\u003e\n",
" \u003cpath d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/\u003e\n",
" \u003c/svg\u003e\n",
" \u003c/button\u003e\n",
"\n",
" \u003cstyle\u003e\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" \u003c/style\u003e\n",
"\n",
" \u003cscript\u003e\n",
" const buttonEl =\n",
" document.querySelector('#df-78fb1da5-44ea-49d5-9127-76c891a02da1 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-78fb1da5-44ea-49d5-9127-76c891a02da1');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '\u003ca target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb\u003edata table notebook\u003c/a\u003e'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" \u003c/script\u003e\n",
" \u003c/div\u003e\n",
"\n",
" \u003c/div\u003e\n",
" \u003c/div\u003e\n"
],
"text/plain": [
" 1st(clicks) sort by impressions asc\n",
"0 1.163701"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Nth('clicks', 'impressions', 0).compute_on(df)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"executionInfo": {
"elapsed": 54,
"status": "ok",
"timestamp": 1697859834379,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": 420
},
"id": "2Nk-ymNiVP4M",
"outputId": "57fd30f0-666d-4310-ac44-a257a20b2fc4"
},
"outputs": [
{
"data": {
"text/plain": [
"1 1.163701\n",
"Name: clicks, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values('impressions').clicks.head(1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"executionInfo": {
"elapsed": 54,
"status": "ok",
"timestamp": 1697093923046,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": 420
},
"id": "V6uR6l6nrQ-u",
"outputId": "bbfc39de-bc38-4ea3-86f9-e3a6359197c2"
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# n can be negative and it's equivalent to reversing n and ascending together.\n",
"Nth('x', 'y', -1) == Nth('x', 'y', 0, False)"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -18261,12 +18491,9 @@
"source": [
"#SQL\n",
"\n",
- "You can easily get SQL query for all built-in Metrics and Operations, except for weighted Quantile/CV/Correlation/Cov, by calling\n",
- "\n",
- "\u003e to_sql(sql_table, split_by).\n",
+ "You can easily get SQL query for all built-in Metrics and Operations by calling `to_sql(sql_table, split_by)`.\n",
"\n",
- "You can also directly execute the query by calling\n",
- "\u003e compute_on_sql(sql_table, split_by, execute, melted),\n",
+ "You can also directly execute the query by calling `compute_on_sql(sql_table, split_by, execute, melted)`,\n",
"\n",
"where `execute` is a function that can execute SQL queries. The return is very similar to compute_on().\n",
"\n",