upgraded pkg version and more refactoring
ricfog committed Feb 19, 2025
1 parent 9fc3ad4 commit 81ee20c
Showing 5 changed files with 356 additions and 226 deletions.
README.md (2 changes: 1 addition & 1 deletion)
````diff
@@ -75,7 +75,7 @@ For the difference estimator under simple random sampling, run
 
 ```python
 evaluator = ModelPerformanceEvaluator(Yh=Yh, budget=n) # initialize sampler
-sampled_idx = evaluator.sample(sampling_method="srs") # 2. sample
+sampled_idx = evaluator.sample(sampling_method="srs") # 3. sample
 Yl = Y[sampled_idx] # 4. annotate
 estimate, variance_estimate = evaluator.compute_estimate(Yl, estimator="df") # 5. estimate
 print(estimate, variance_estimate)
````
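The README's `estimator="df"` call returns a point estimate and a variance estimate. Below is a minimal sketch of the textbook difference estimator under simple random sampling without replacement, assuming `yh` holds a proxy value (e.g. a model prediction) for every one of the N items while the true value is known only at the sampled indices. The estimate is the population mean of the proxy plus the sample mean of the residuals `y - yh`; its variance is the usual SRS variance of that residual mean with the finite population correction. The function `difference_estimate` and the synthetic data are hypothetical illustrations, not the package's code; `ModelPerformanceEvaluator` may compute its outputs differently.

```python
import random
from statistics import fmean, variance


def difference_estimate(yh, sampled_idx, y_sample):
    """Textbook difference estimator of a population mean under SRS without replacement.

    Hypothetical helper for illustration only.
    yh          -- proxy values (e.g. model predictions) for all N items
    sampled_idx -- indices of the n annotated items
    y_sample    -- true values for those items, in the same order
    Returns (mean estimate, estimated variance of that estimate).
    """
    N, n = len(yh), len(sampled_idx)
    # residuals between the truth and the proxy on the annotated sample
    residuals = [y - yh[i] for i, y in zip(sampled_idx, y_sample)]
    # population mean of the proxy, corrected by the mean residual
    estimate = fmean(yh) + fmean(residuals)
    # SRS variance of a sample mean, with finite population correction (1 - n/N)
    var_estimate = (1 - n / N) * variance(residuals) / n
    return estimate, var_estimate


# usage on synthetic data: the proxy agrees with the truth about 90% of the time
random.seed(0)
N, n = 10_000, 100
y = [random.random() < 0.3 for _ in range(N)]
yh = [yi if random.random() < 0.9 else not yi for yi in y]
idx = random.sample(range(N), n)  # SRS without replacement
est, var = difference_estimate([float(v) for v in yh], idx, [float(y[i]) for i in idx])
print(est, var)
```

The closer the proxy tracks the truth, the smaller the residual variance, which is why a good model can make the difference estimator tighter than a plain sample mean at the same budget.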
examples/sampling-and-estimation.ipynb (19 changes: 13 additions & 6 deletions)
```diff
@@ -9,7 +9,7 @@
 "How would you choose $n$ observations from a total of $N$ to effectively estimate (say) the accuracy of a classifier? For example, imagine that our budget is limited and we can only annotate $n=100$ examples from data of size $N=10^{7}$! \n",
 "\n",
 "In this notebook, we show how to \n",
-"* Sample via simple random sampling (SRS) and stratified simple random sampling (SSRS) with proportional and Neyman allocation, all without replacement\n",
+"* Sample with stratified simple random sampling (SSRS) with proportional allocation\n",
 "* Estimate the metric of interest $\\mathbb{E}[Z]$ with the Horvitz-Thompson (HT) and difference (DF) estimators\n",
 "\n",
 "Besides estimating the value of the metric, we also computs its variance, which would allow us to create confidence intervals for the estimates. \n",
```
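The sampling schemes named in the notebook's intro can be sketched with the standard library alone. This is a minimal sketch, assuming every item carries a stratum label (for instance a bucket of the model's confidence score, which is an assumption, not something the notebook states). Proportional allocation sets n_h ∝ N_h; Neyman allocation sets n_h ∝ N_h·S_h, where S_h is the within-stratum standard deviation. Fractional shares are rounded with the largest-remainder method so the sizes sum exactly to n; that rounding rule is a choice made here, and the package may round differently. The helpers `allocate` and `stratified_sample` are hypothetical.

```python
import math
import random
from collections import defaultdict


def allocate(sizes, n, stds=None):
    """Split a budget n across strata (hypothetical helper).

    sizes -- dict stratum -> N_h (population size of the stratum)
    stds  -- optional dict stratum -> S_h; if given, use Neyman allocation
             (n_h proportional to N_h * S_h), otherwise proportional (n_h ∝ N_h)
    Uses largest-remainder rounding so allocations sum to exactly n.
    Does not cap n_h at N_h, so Neyman allocation can over-allocate a small stratum.
    """
    weights = {h: sizes[h] * (stds[h] if stds else 1.0) for h in sizes}
    total = sum(weights.values())
    shares = {h: n * w / total for h, w in weights.items()}
    alloc = {h: math.floor(s) for h, s in shares.items()}
    leftover = n - sum(alloc.values())
    # give the remaining units to the strata with the largest fractional parts
    for h in sorted(shares, key=lambda h: shares[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_sample(strata, n, stds=None, seed=None):
    """Stratified simple random sampling without replacement (hypothetical helper).

    strata -- list giving the stratum label of each item (list index = item id)
    Returns a sorted list of sampled item indices.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for i, h in enumerate(strata):
        members[h].append(i)
    alloc = allocate({h: len(idx) for h, idx in members.items()}, n, stds)
    sampled = []
    for h, idx in members.items():
        # SRS without replacement inside each stratum
        sampled.extend(rng.sample(idx, alloc[h]))
    return sorted(sampled)


strata = ["low"] * 600 + ["mid"] * 300 + ["high"] * 100
print(allocate({"low": 600, "mid": 300, "high": 100}, 50))
print(allocate({"low": 600, "mid": 300, "high": 100}, 50, stds={"low": 1.0, "mid": 4.0, "high": 5.0}))
print(stratified_sample(strata, 50, seed=1)[:10])
```

Plain SRS without replacement is the single-stratum special case, i.e. `random.sample(range(N), n)`. Proportional allocation needs only the stratum sizes; Neyman allocation also needs S_h, which in practice has to be guessed or estimated, e.g. from a pilot sample.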
```diff
@@ -163,15 +163,22 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Mean is [0.09030375]\n",
-"Variance is [0.00086577]\n"
+"Mean is [0.12947919]\n",
+"Variance is [0.00100167]\n",
+"Mean is [0.13163876]\n",
+"Variance is [0.00100172]\n"
 ]
 }
 ],
 "source": [
-"estimates = evaluator.compute_estimate(performance[sampled_idx])\n",
-"print('Mean is ', estimates[0])\n",
-"print('Variance is ', estimates[1])"
+"estimates_ht = evaluator.compute_estimate(performance[sampled_idx])\n",
+"print('Mean is ', estimates_ht[0])\n",
+"print('Variance is ', estimates_ht[1])\n",
+"\n",
+"# for the difference estimator\n",
+"estimates_df = evaluator.compute_estimate(performance[sampled_idx], estimator = \"df\")\n",
+"print('Mean is ', estimates_df[0])\n",
+"print('Variance is ', estimates_df[1])"
 ]
 }
 ],
```
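The notebook cell reports each estimate as a (mean, variance) pair, and its intro notes that the variance allows confidence intervals to be built. As a hedged illustration of what such a pair means under stratified SRS without replacement, the sketch below computes the textbook Horvitz-Thompson estimate of the population mean, weighting each sampled item in stratum h by the inverse of its inclusion probability π_h = n_h/N_h, together with the standard variance estimate Σ_h W_h²(1 − n_h/N_h)s_h²/n_h (W_h = N_h/N) and a normal-approximation 95% interval. The functions `ht_stratified` and `normal_ci` are hypothetical and not taken from the package; `evaluator.compute_estimate` may differ in details.

```python
import math
from statistics import variance


def ht_stratified(values, strata_of_sample, stratum_sizes):
    """Horvitz-Thompson mean estimate under stratified SRS without replacement.

    Hypothetical helper for illustration only.
    values           -- observed values z_i for the sampled items
    strata_of_sample -- stratum label of each sampled item (same order as values)
    stratum_sizes    -- dict stratum -> N_h (population size of the stratum)
    Assumes every stratum is sampled, with at least two items each, so s_h^2 is defined.
    Returns (mean estimate, estimated variance of that estimate).
    """
    N = sum(stratum_sizes.values())
    by_stratum = {}
    for z, h in zip(values, strata_of_sample):
        by_stratum.setdefault(h, []).append(z)
    estimate, var_estimate = 0.0, 0.0
    for h, zs in by_stratum.items():
        N_h, n_h = stratum_sizes[h], len(zs)
        pi = n_h / N_h  # inclusion probability of every item in stratum h
        # HT: sum of z_i / pi_i over the sample, divided by N
        estimate += sum(z / pi for z in zs) / N
        # W_h^2 * (1 - f_h) * s_h^2 / n_h, with W_h = N_h / N and f_h = pi
        var_estimate += (N_h / N) ** 2 * (1 - pi) * variance(zs) / n_h
    return estimate, var_estimate


def normal_ci(estimate, var_estimate, z=1.96):
    """Two-sided interval from a normal approximation (z=1.96 gives roughly 95%)."""
    half = z * math.sqrt(var_estimate)
    return estimate - half, estimate + half


est, var = ht_stratified([1.0, 0.0, 1.0, 1.0, 0.0], ["a", "a", "b", "b", "b"], {"a": 4, "b": 6})
print(est, var, normal_ci(est, var))
```

With a single stratum every item has π = n/N, so the HT estimate reduces to the plain sample mean, which is the SRS case.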