
Correct typos, make Markdown cells more markdowny (#231)
habi authored Nov 17, 2022
1 parent c2aa3de commit 702f19d
Showing 1 changed file with 15 additions and 5 deletions.
20 changes: 15 additions & 5 deletions machine-learning/parallel-prediction.ipynb
@@ -12,7 +12,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes you'll train on a smaller dataset that fits in memory, but need to predict or score for a much larger (possibly larger than memory) dataset. Perhaps your [learning curve](http://scikit-learn.org/stable/modules/learning_curve.html) has leveled off, or you only have labels for a subset of the data.\n",
"Sometimes you'll train on a smaller dataset that fits in memory, but need to predict or score for a much larger (possibly larger than memory) dataset.\n",
"Perhaps your [learning curve](http://scikit-learn.org/stable/modules/learning_curve.html) has leveled off, or you only have labels for a subset of the data.\n",
"\n",
"In this situation, you can use [ParallelPostFit](http://ml.dask.org/modules/generated/dask_ml.wrappers.ParallelPostFit.html) to parallelize and distribute the scoring or prediction steps."
]
@@ -25,7 +26,7 @@
"source": [
"from dask.distributed import Client, progress\n",
"\n",
"# Scale up: connect to your own cluster with bmore resources\n",
"# Scale up: connect to your own cluster with more resources\n",
"# see http://dask.pydata.org/en/latest/setup.html\n",
"client = Client(processes=False, threads_per_worker=4,\n",
" n_workers=1, memory_limit='2GB')\n",
@@ -155,9 +156,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"y_pred is Dask arary. Workers can write the predicted values to a shared file system, without ever having to collect the data on a single machine.\n",
"`y_pred` is a Dask array.\n",
"Workers can write the predicted values to a shared file system, without ever having to collect the data on a single machine.\n",
"\n",
"Or we can check the models score on the entire large dataset. The computation will be done in parallel, and no single machine will have to hold all the data."
"Or we can check the models score on the entire large dataset.\n",
"The computation will be done in parallel, and no single machine will have to hold all the data."
]
},
{
@@ -168,6 +171,13 @@
"source": [
"clf.score(X_large, y_large)"
]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
}
],
"metadata": {
@@ -186,7 +196,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.10.6"
}
},
"nbformat": 4,
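The notebook this commit edits trains on a small in-memory dataset and uses `dask_ml.wrappers.ParallelPostFit` to parallelize prediction over a large Dask array. Below is a minimal sketch of the block-wise pattern that wrapper provides, written with plain dask and scikit-learn; the `LogisticRegression` model, array sizes, and chunk sizes are illustrative assumptions, not the notebook's actual values.

```python
import numpy as np
import dask.array as da
from sklearn.linear_model import LogisticRegression

# Train on a small dataset that fits in memory on one machine.
rng = np.random.RandomState(0)
X_small = rng.normal(size=(1_000, 4))
y_small = (X_small[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X_small, y_small)

# A (potentially larger-than-memory) Dask array, split into chunks.
X_large = da.random.normal(size=(100_000, 4), chunks=(10_000, 4))

# Predict block-wise: each chunk is passed to clf.predict independently,
# so no single worker ever has to hold all of X_large at once.
# drop_axis=1 because predict maps a (rows, 4) block to a (rows,) vector.
y_pred = X_large.map_blocks(clf.predict, drop_axis=1, dtype=np.int64)

print(y_pred.shape)  # still lazy: one prediction per row of X_large
print(y_pred[:5].compute())
```

Nothing is computed until `.compute()` (or a write, e.g. `y_pred.to_zarr(...)`) is called, which is why workers can stream predictions straight to a shared file system instead of collecting them on one machine.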
