upgraded pkg version and more refactoring

amazon-science · Feb 19, 2025 · 81ee20c
1 parent 9fc3ad4
commit 81ee20c

Showing 5 changed files with 356 additions and 226 deletions.
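
Reviewer note: the README hunk in this commit concerns the difference estimator under simple random sampling. For readers who want the idea without opening the package, here is a minimal sketch of a textbook difference estimator under SRS without replacement. It is an illustration under stated assumptions, not the package's implementation: the function name `difference_estimate_srs` is hypothetical, and `Yh` is assumed to hold model-based proxy values for all `N` items while `Yl` holds the annotated values for the sampled items only.

```python
import numpy as np


def difference_estimate_srs(Yh, Yl, sampled_idx):
    """Textbook difference estimator under SRS without replacement.

    Hypothetical helper for illustration only; it is not the package's API.
    Requires at least two sampled items so the sample variance is defined.
    """
    Yh = np.asarray(Yh, dtype=float)
    Yl = np.asarray(Yl, dtype=float)
    N = Yh.shape[0]           # population size
    n = len(sampled_idx)      # annotation budget
    d = Yl - Yh[sampled_idx]  # label-minus-proxy differences on the sample
    # Population mean of the proxy, corrected by the mean sampled difference.
    estimate = Yh.mean() + d.mean()
    # Sample variance of the differences with a finite-population correction.
    variance = (1.0 - n / N) * d.var(ddof=1) / n
    return estimate, variance


# Synthetic data, assumed purely for demonstration.
rng = np.random.default_rng(0)
N, n = 1000, 100
Y = rng.binomial(1, 0.3, size=N).astype(float)        # true labels
Yh = np.clip(Y + rng.normal(0, 0.1, size=N), 0, 1)    # noisy proxy predictions
sampled_idx = rng.choice(N, size=n, replace=False)    # SRS without replacement
estimate, variance_estimate = difference_estimate_srs(Yh, Y[sampled_idx], sampled_idx)
print(estimate, variance_estimate)
```

When the proxy tracks the labels closely, the differences have small spread, so the variance estimate shrinks relative to averaging the sampled labels alone.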
diff --git a/README.md b/README.md
@@ -75,7 +75,7 @@ For the difference estimator under simple random sampling, run
 
 ```python
 evaluator = ModelPerformanceEvaluator(Yh=Yh, budget=n) # initialize sampler
-sampled_idx = evaluator.sample(sampling_method="srs") # 2. sample
+sampled_idx = evaluator.sample(sampling_method="srs") # 3. sample
 Yl = Y[sampled_idx] # 4. annotate
 estimate, variance_estimate = evaluator.compute_estimate(Yl, estimator="df") # 5. estimate
 print(estimate, variance_estimate)

diff --git a/examples/sampling-and-estimation.ipynb b/examples/sampling-and-estimation.ipynb
@@ -9,7 +9,7 @@
     "How would you choose $n$ observations from a total of $N$ to effectively estimate (say) the accuracy of a classifier? For example, imagine that our budget is limited and we can only annotate $n=100$ examples from data of size $N=10^{7}$! \n",
     "\n",
     "In this notebook, we show how to \n",
-    "* Sample via simple random sampling (SRS) and stratified simple random sampling (SSRS) with proportional and Neyman allocation, all without replacement\n",
+    "* Sample with stratified simple random sampling (SSRS) with proportional allocation\n",
     "* Estimate the metric of interest $\\mathbb{E}[Z]$ with the Horvitz-Thompson (HT) and difference (DF) estimators\n",
     "\n",
     "Besides estimating the value of the metric, we also computs its variance, which would allow us to create confidence intervals for the estimates.  \n",
@@ -163,15 +163,22 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Mean is  [0.09030375]\n",
-      "Variance is  [0.00086577]\n"
+      "Mean is  [0.12947919]\n",
+      "Variance is  [0.00100167]\n",
+      "Mean is  [0.13163876]\n",
+      "Variance is  [0.00100172]\n"
      ]
     }
    ],
    "source": [
-    "estimates = evaluator.compute_estimate(performance[sampled_idx])\n",
-    "print('Mean is ', estimates[0])\n",
-    "print('Variance is ', estimates[1])"
+    "estimates_ht = evaluator.compute_estimate(performance[sampled_idx])\n",
+    "print('Mean is ', estimates_ht[0])\n",
+    "print('Variance is ', estimates_ht[1])\n",
+    "\n",
+    "# for the difference estimator\n",
+    "estimates_df = evaluator.compute_estimate(performance[sampled_idx], estimator = \"df\")\n",
+    "print('Mean is ', estimates_df[0])\n",
+    "print('Variance is ', estimates_df[1])"
    ]
   }
  ],