Primed and Fitted models

obsidiandynamics · Nov 9, 2023 · 168ba23
1 parent 2a64315
commit 168ba23

Showing 19 changed files with 347 additions and 205 deletions.
diff --git a/README.md b/README.md
@@ -22,7 +22,7 @@ use stanza::renderer::Renderer;
 use brumby::display::DisplaySlice;
 use brumby::file::ReadJsonFile;
 use brumby::market::{Market, OverroundMethod};
-use brumby::model::{Calibrator, Config, WinPlace};
+use brumby::model::{Fitter, FitterConfig, WinPlace, Model};
 use brumby::model::cf::Coefficients;
 use brumby::model::fit::FitOptions;
 use brumby::print;
@@ -53,13 +53,13 @@ fn main() -> Result<(), Box<dyn Error>> {
         28.0,
     ];
 
-    // load coefficients from a file and create a calibrator for model fitting
+    // load coefficients from a file and create a fitter
     let coefficients = Coefficients::read_json_file(PathBuf::from("config/thoroughbred.cf.json"))?;
-    let config = Config {
+    let config = FitterConfig {
         coefficients,
         fit_options: FitOptions::fast() // use the default presets in production; fast presets are used for testing
     };
-    let calibrator = Calibrator::try_from(config)?;
+    let fitter = Fitter::try_from(config)?;
 
     // fit Win and Place probabilities from the supplied prices, undoing the overrounds
     let wp_markets = WinPlace {
@@ -72,10 +72,10 @@ fn main() -> Result<(), Box<dyn Error>> {
     let overrounds = wp_markets.extrapolate_overrounds()?;
 
     // fit a model using the Win/Place prices and extrapolated overrounds
-    let model = calibrator.fit(wp_markets, &overrounds)?.value;
+    let model = fitter.fit(&wp_markets, &overrounds)?.value;
 
     // nicely format the derived price matrix
-    let table = print::tabulate_derived_prices(&model.top_n.as_price_matrix());
+    let table = print::tabulate_derived_prices(&model.prices().as_price_matrix());
     println!("\n{}", Console::default().render(&table));
 
     // simulate a same-race multi for a chosen selection vector using the previously fitted model
@@ -111,17 +111,17 @@ Note, when all rows are identical, the biased model behaves identically to the n
 
 Take, for example, a field of 6 with win probabilities _P_ = (0.05, 0.1, 0.25, 0.1, 0.35, 0.15). For a two-place podium, _W_ might resemble the following:
 
-_W_<sub>1,_</sub> = (0.05, 0.1, 0.25, 0.1, 0.35, 0.15) = _P_
+_W_<sub>1,_</sub> = (0.05, 0.1, 0.25, 0.1, 0.35, 0.15) = _P_;
 
-_W_<sub>2,_</sub> = (0.09, 0.13, 0.22, 0.13, 0.28, 0.15)
+_W_<sub>2,_</sub> = (0.09, 0.13, 0.22, 0.13, 0.28, 0.15).
 
-In other words, the high-probability runners have had their relative ranking probabilities penalised, while low-probability runners were instead boosted. This reflects our updated assumption that low(/high)-probability runners are under(/over)estimated to place by a naive model.
+In other words, the high-probability runners have had their relative ranking probabilities suppressed, while low-probability runners were instead boosted. This reflects our updated assumption that low(/high)-probability runners are under(/over)estimated to place by a naive model.
 
-A pertinent question is how to assign the relative probabilities in rows 2–_N_, given _P_ and possibly other data. An intuitive approach is to fit the probabilities based on historical data. Brumby uses a linear regression model with a configurable set of regressors. For example, a third-degree polynomial comprising runner prices and the field size. (Which we found to be a reasonably effective predictor.) Distinct models may be used for different race types, competitor classes, track conditions, and so forth. The fitting process is performed offline; its output is a set of regression factor and coefficient pairs.
+A pertinent question is how to assign the relative probabilities in rows 2–_N_, given _P_ and possibly other data. An intuitive approach is to fit the probabilities based on historical data. Brumby uses a linear regression model with a configurable set of regressors. For example, a third-degree polynomial comprising runner prices and the field size. (Which we found to be a reasonably effective predictor.) Distinct models may be used for different race types, competitor classes, track conditions, and so forth. The fitting process is performed offline; its output is a set of regression factors and corresponding coefficients.
 
 The offline-fitted model does not cater to specific biases present in individual races and, crucially, it does not protect the operator of the model against _internal arbitrage_ opportunities. Let the Place market be paying _X_ places, where _X_ is typically 2 or 3. When deriving the Top-1.._N_ price matrix solely from Win prices, it is possible that the Top-_X_ prices differ from the Place prices when the latter are sourced from an alternate model. This creates an internal price incoherency, where a semi-rational bettor will select the higher of the two prices, all other terms being equal. In the extreme case, the price difference may expose value in the bet and even enable rational bettors to take a risk-free position across a pair of incoherent markets.
 
-This problem is ideally solved by unifying the models so that the Place prices are taken directly from the Top-1.._N_ matrix. Often this is not viable, particularly when the operator sources its headline Win and Place markets from a commodity pricing supplier and/or applies manual price overrides on select runners. As such, Brumby allows the fitting of the Top-_X_ prices to the offered Place prices. The fitting is entirely online, typically following a price update, iterating while adjusting _W_<sub>_X_, _</sub> until the Top-_X_ prices match the Place prices within some margin of error. 
+This problem is ideally solved by unifying the models so that the Place prices are taken directly from the Top-1.._N_ matrix. Often this is not viable, particularly when the operator sources its Win and Place markets from a commodity pricing supplier and/or trades them manually. As such, Brumby allows the fitting of the Top-_X_ prices to the offered Place prices. The fitting is entirely online, typically following a price update, iterating while adjusting _W_<sub>_X_, _</sub> until the Top-_X_ prices match the Place prices within some acceptable margin of error. 
 
 Fitting of the Top-_X_ market to the Place market is a _closed loop_ process, using the fitted residuals to moderate subsequent adjustments and eventually terminate the fitting process. In each iteration, for every rank _i_ and every runner _j_, a price is fitted and compared with the sample price. The ratio of the two is used to scale the probability at _W_<sub>_i_,_j_</sub>. For example, let the fitted price _f_ be 2.34 and the sample price _s_ be 2.41 for runner 5 in rank 3. The adjustment factor is _s_ / _f_ ≈ 1.03, so _W′_<sub>3,5</sub> = _W_<sub>3,5</sub> × 1.03.
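
The single adjustment step described in the paragraph above can be sketched in Rust roughly as follows. This is an illustration only: the function names `adjust_weight` and `relative_error` are hypothetical and are not taken from the Brumby sources, and the termination measure is an assumed stand-in for whatever residual the fitter actually uses.

```rust
/// Hypothetical helper: scales one weight W[i][j] by the ratio of the
/// sample price to the fitted price, as in the README's worked example.
fn adjust_weight(weight: f64, sample_price: f64, fitted_price: f64) -> f64 {
    weight * (sample_price / fitted_price)
}

/// Hypothetical helper: relative residual between the sample and fitted
/// prices, which a closed-loop fitter could compare against a tolerance
/// to decide when to stop iterating.
fn relative_error(sample_price: f64, fitted_price: f64) -> f64 {
    ((sample_price - fitted_price) / sample_price).abs()
}
```

With the README's figures (_s_ = 2.41, _f_ = 2.34), `adjust_weight` multiplies the weight by roughly 1.03, and `relative_error` is a little under 3%.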
 

diff --git a/benches/cri_mc_engine.rs b/benches/cri_mc_engine.rs
@@ -27,7 +27,7 @@ fn criterion_benchmark(c: &mut Criterion) {
     let mut bitmap = [true; 14];
     let mut totals = [1.0; 4];
     let mut engine = MonteCarloEngine::default()
-        .with_iterations(1_000)
+        .with_trials(1_000)
         .with_bitmap(CaptureMut::Borrowed(&mut bitmap))
         .with_totals(CaptureMut::Borrowed(&mut totals))
         .with_podium(CaptureMut::Borrowed(&mut podium))

diff --git a/examples/basic.rs b/examples/basic.rs
@@ -25,7 +25,7 @@ fn main() {
 
     // create an MC engine for reuse
     let mut engine = mc::MonteCarloEngine::default()
-        .with_iterations(100_000)
+        .with_trials(100_000)
         .with_probs(Capture::Owned(
             DilatedProbs::default()
                 .with_win_probs(Capture::Borrowed(&probs))

diff --git a/examples/multi.rs b/examples/multi.rs
@@ -7,7 +7,7 @@ use stanza::renderer::Renderer;
 use brumby::display::DisplaySlice;
 use brumby::file::ReadJsonFile;
 use brumby::market::{Market, OverroundMethod};
-use brumby::model::{Calibrator, Config, WinPlace};
+use brumby::model::{Fitter, FitterConfig, WinPlace, Model};
 use brumby::model::cf::Coefficients;
 use brumby::model::fit::FitOptions;
 use brumby::print;
@@ -38,13 +38,13 @@ fn main() -> Result<(), Box<dyn Error>> {
         28.0,
     ];
 
-    // load coefficients from a file and create a calibrator
+    // load coefficients from a file and create a fitter
     let coefficients = Coefficients::read_json_file(PathBuf::from("config/thoroughbred.cf.json"))?;
-    let config = Config {
+    let config = FitterConfig {
         coefficients,
         fit_options: FitOptions::fast(),
     };
-    let calibrator = Calibrator::try_from(config)?;
+    let fitter = Fitter::try_from(config)?;
 
     // fit Win and Place probabilities from the supplied prices, undoing the effect of the overrounds
     let wp_markets = WinPlace {
@@ -57,10 +57,10 @@ fn main() -> Result<(), Box<dyn Error>> {
     let overrounds = wp_markets.extrapolate_overrounds()?;
 
     // fit a model using the Win/Place prices and extrapolated overrounds
-    let model = calibrator.fit(wp_markets, &overrounds)?.value;
+    let model = fitter.fit(&wp_markets, &overrounds)?.value;
 
     // nicely format the derived prices
-    let table = print::tabulate_derived_prices(&model.top_n.as_price_matrix());
+    let table = print::tabulate_derived_prices(&model.prices().as_price_matrix());
     println!("\n{}", Console::default().render(&table));
 
     // simulate a same-race multi for a chosen selection vector using the previously fitted model

diff --git a/justfile b/justfile
@@ -41,6 +41,10 @@ test:
     cargo doc --no-deps
     cargo bench --no-run --profile dev
 
+# run clippy with pedantic checks
+clippy:
+    cargo clippy -- -D clippy::pedantic -A clippy::must-use-candidate -A clippy::struct-excessive-bools -A clippy::single-match-else -A clippy::inline-always -A clippy::cast-possible-truncation -A clippy::cast-precision-loss -A clippy::items-after-statements
+
 # install Rust
 install-rust:
     curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
diff --git a/src/bin/datadump.rs b/src/bin/datadump.rs
@@ -101,28 +101,28 @@ fn main() -> Result<(), Box<dyn Error>> {
                 Market::fit(&OVERROUND_METHOD, prices, rank as f64 + 1.0)
             })
             .collect();
-        let fit_outcome = fit::fit_all(FitOptions::default(), &markets)?;
+        let fit_outcome = fit::fit_all(&FitOptions::default(), &markets)?;
         debug!(
             "individual fitting complete: stats: {:?}, probs: \n{}",
             fit_outcome.stats,
             fit_outcome.fitted_probs.verbose()
         );
 
-        let num_runners = markets[0].probs.len();
+        let runners = markets[0].probs.len();
         let active_runners = markets[0].probs.iter().filter(|&&prob| prob != 0.).count();
         let stdev = markets[0].probs.stdev();
-        for runner in 0..num_runners {
+        for runner in 0..runners {
             if markets[0].probs[runner] != 0.0 {
                 let mut record = Record::with_capacity(Factor::COUNT);
-                record.set(Factor::RaceId, race.id);
-                record.set(Factor::RunnerIndex, runner);
-                record.set(Factor::ActiveRunners, active_runners);
-                record.set(Factor::PlacesPaying, race.places_paying);
-                record.set(Factor::Stdev, stdev);
-                record.set(Factor::Weight0, fit_outcome.fitted_probs[(0, runner)]);
-                record.set(Factor::Weight1, fit_outcome.fitted_probs[(1, runner)]);
-                record.set(Factor::Weight2, fit_outcome.fitted_probs[(2, runner)]);
-                record.set(Factor::Weight3, fit_outcome.fitted_probs[(3, runner)]);
+                record.set(Factor::RaceId, &race.id);
+                record.set(Factor::RunnerIndex, &runner);
+                record.set(Factor::ActiveRunners, &active_runners);
+                record.set(Factor::PlacesPaying, &race.places_paying);
+                record.set(Factor::Stdev, &stdev);
+                record.set(Factor::Weight0, &fit_outcome.fitted_probs[(0, runner)]);
+                record.set(Factor::Weight1, &fit_outcome.fitted_probs[(1, runner)]);
+                record.set(Factor::Weight2, &fit_outcome.fitted_probs[(2, runner)]);
+                record.set(Factor::Weight3, &fit_outcome.fitted_probs[(3, runner)]);
                 debug!("{record:?}");
                 csv.append(record)?;
                 csv.flush()?;

diff --git a/src/bin/evaluate.rs b/src/bin/evaluate.rs
@@ -18,7 +18,7 @@ use brumby::data::{EventDetailExt, PlacePriceDeparture, PredicateClosures, RaceS
 use brumby::file::ReadJsonFile;
 use brumby::market::{Market, OverroundMethod};
 use brumby::model::cf::Coefficients;
-use brumby::model::{fit, Calibrator, Config, TopN, WinPlace};
+use brumby::model::{fit, Fitter, FitterConfig, TopN, WinPlace};
 
 const OVERROUND_METHOD: OverroundMethod = OverroundMethod::Multiplicative;
 const TOP_SUBSET: usize = 25;
@@ -83,7 +83,7 @@ fn main() -> Result<(), Box<dyn Error>> {
             EventType::Harness => unimplemented!(),
         };
         debug!("loading {race_type} config from {filename}");
-        let config = Config {
+        let config = FitterConfig {
             coefficients: Coefficients::read_json_file(filename)?,
             fit_options: Default::default(),
         };
@@ -108,7 +108,7 @@ fn main() -> Result<(), Box<dyn Error>> {
         );
         let departure = race_file.race.place_price_departure();
         let race = race_file.race.summarise();
-        let calibrator = Calibrator::try_from(configs[&race.race_type].clone())?;
+        let calibrator = Fitter::try_from(configs[&race.race_type].clone())?;
         let sample_top_n = TopN {
             markets: (0..race.prices.rows())
                 .map(|rank| {
@@ -123,7 +123,7 @@ fn main() -> Result<(), Box<dyn Error>> {
             places_paying: race.places_paying,
         };
         let sample_overrounds = sample_top_n.overrounds()?;
-        let model = calibrator.fit(sample_wp, &sample_overrounds)?.value;
+        let model = calibrator.fit(&sample_wp, &sample_overrounds)?.value;
         let derived_prices = model.top_n.as_price_matrix();
         let errors: Vec<_> = (0..derived_prices.rows())
             .map(|rank| {

diff --git a/src/bin/prices.rs b/src/bin/prices.rs
@@ -2,7 +2,7 @@ use std::env;
 use std::error::Error;
 use std::path::PathBuf;
 
-use anyhow::bail;
+use anyhow::{anyhow, bail};
 use clap::Parser;
 use racing_scraper::models::{EventDetail, EventType};
 use stanza::renderer::console::Console;
@@ -16,8 +16,8 @@ use brumby::display::DisplaySlice;
 use brumby::file::ReadJsonFile;
 use brumby::market::{Market, Overround, OverroundMethod};
 use brumby::model::cf::Coefficients;
-use brumby::model::fit::compute_msre;
-use brumby::model::{fit, Calibrator, Config, TopN, WinPlace, PODIUM};
+use brumby::model::fit::{compute_msre, FitOptions};
+use brumby::model::{fit, Fitter, FitterConfig, TopN, WinPlace, PODIUM, Model, Primer};
 use brumby::print::{tabulate_derived_prices, tabulate_prices, tabulate_probs, tabulate_values};
 use brumby::selection::Selections;
 
@@ -35,6 +35,10 @@ struct Args {
 
     /// selections to price
     selections: Option<Selections<'static>>,
+
+    /// model type
+    #[clap(short = 'm', long, value_parser = parse_model_type, default_value = "fitted")]
+    model: ModelType
 }
 impl Args {
     fn validate(&self) -> anyhow::Result<()> {
@@ -47,6 +51,19 @@ impl Args {
     }
 }
 
+#[derive(Debug, Clone)]
+enum ModelType {
+    Primed,
+    Fitted
+}
+fn parse_model_type(s: &str) -> anyhow::Result<ModelType> {
+    match s.to_lowercase().as_str() {
+        "primed" => Ok(ModelType::Primed),
+        "fitted" => Ok(ModelType::Fitted),
+        _ => Err(anyhow!("unsupported model type {s}")),
+    }
+}
+
 #[tokio::main]
 async fn main() -> Result<(), Box<dyn Error>> {
     if env::var("RUST_BACKTRACE").is_err() {
@@ -83,11 +100,6 @@ async fn main() -> Result<(), Box<dyn Error>> {
             })
             .collect(),
     };
-
-    let calibrator = Calibrator::try_from(Config {
-        coefficients,
-        fit_options: Default::default(),
-    })?;
     let sample_wp = WinPlace {
         win: sample_top_n.markets[0].clone(),
         place: sample_top_n.markets[race.places_paying - 1].clone(),
@@ -129,14 +141,29 @@ async fn main() -> Result<(), Box<dyn Error>> {
         );
     }
 
-    let model = calibrator.fit(sample_wp, &sample_overrounds)?;
-    debug!("fitted {model:?}");
-    let model = model.value;
+    let fit_options = FitOptions::default();
+    let model: Box<dyn Model> = match args.model {
+        ModelType::Primed => {
+            let primer = Primer::try_from(coefficients)?;
+            let model = primer.prime(&sample_wp.win, sample_wp.places_paying, fit_options.mc_trials, &sample_overrounds)?;
+            debug!("primed {model:?}");
+            Box::new(model.value)
+        }
+        ModelType::Fitted => {
+            let calibrator = Fitter::try_from(FitterConfig {
+                coefficients,
+                fit_options
+            })?;
+            let model = calibrator.fit(&sample_wp, &sample_overrounds)?;
+            debug!("fitted {model:?}");
+            Box::new(model.value)
+        }
+    };
 
-    let probs_table = tabulate_probs(&model.fit_outcome.fitted_probs);
+    let probs_table = tabulate_probs(model.weighted_probs());
     println!("{}", Console::default().render(&probs_table));
 
-    let derived_prices = model.top_n.as_price_matrix();
+    let derived_prices = model.prices().as_price_matrix();
     let table = tabulate_derived_prices(&derived_prices);
     info!("\n{}", Console::default().render(&table));
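
The hunk above selects between a primed and a fitted model at runtime and hides the choice behind `Box<dyn Model>`. The pattern can be sketched in isolation as below. Everything here is a simplified assumption: the `Model` trait, its methods, and the `PrimedModel`/`FittedModel` structs are hypothetical stand-ins, not Brumby's actual definitions, and a plain `String` error replaces `anyhow`.

```rust
/// Hypothetical stand-in for a shared model interface.
trait Model {
    fn name(&self) -> &'static str;
    fn prices(&self) -> &[f64];
}

struct PrimedModel { prices: Vec<f64> }
struct FittedModel { prices: Vec<f64> }

impl Model for PrimedModel {
    fn name(&self) -> &'static str { "primed" }
    fn prices(&self) -> &[f64] { &self.prices }
}

impl Model for FittedModel {
    fn name(&self) -> &'static str { "fitted" }
    fn prices(&self) -> &[f64] { &self.prices }
}

#[derive(Debug, Clone, PartialEq)]
enum ModelType {
    Primed,
    Fitted,
}

/// Case-insensitive parser, mirroring the shape of the one in the diff.
fn parse_model_type(s: &str) -> Result<ModelType, String> {
    match s.to_lowercase().as_str() {
        "primed" => Ok(ModelType::Primed),
        "fitted" => Ok(ModelType::Fitted),
        _ => Err(format!("unsupported model type {s}")),
    }
}

/// Builds the requested model and erases its concrete type, so that
/// downstream code only depends on the `Model` trait.
fn build_model(model_type: &ModelType, prices: Vec<f64>) -> Box<dyn Model> {
    match model_type {
        ModelType::Primed => Box::new(PrimedModel { prices }),
        ModelType::Fitted => Box::new(FittedModel { prices }),
    }
}
```

The caller pays for one heap allocation and dynamic dispatch, which is usually negligible in a CLI tool, and in exchange the tabulation code after the `match` does not need to know which variant was built.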
 

diff --git a/src/capture.rs b/src/capture.rs
@@ -1,5 +1,5 @@
-//! [Capture] is a minimalistic analogue of [Cow](std::borrow::Cow) that relaxes the [ToOwned] constraint while
-//! supporting [?Sized](Sized) types. [CaptureMut] extends [Capture] with support for mutable references.
+//! [`Capture`] is a minimalistic analogue of [`Cow`](std::borrow::Cow) that relaxes the [`ToOwned`] constraint while
+//! supporting [`?Sized`](Sized) types. [`CaptureMut`] extends [`Capture`] with support for mutable references.
 
 use std::borrow::{Borrow, BorrowMut};
 use std::ops::{Deref, DerefMut};
@@ -9,9 +9,6 @@ pub enum Capture<'a, W: Borrow<B>, B: ?Sized> {
     Owned(W),
     Borrowed(&'a B),
 }
-// impl<W: Borrow<B> + Default, B: ?Sized> Capture<'_, W, B> {
-//     pub fn
-// }
 
 impl<'a, W: Borrow<B> + Default, B: ?Sized> Default for Capture<'a, W, B> {
     fn default() -> Self {
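
For readers unfamiliar with the type, the following sketch shows how an enum with the same shape as `Capture` can expose its contents uniformly. The enum definition is copied from the hunk above; the `Deref` implementation is an assumption about how such a type would typically behave and may differ from the one in `src/capture.rs`.

```rust
use std::borrow::Borrow;
use std::ops::Deref;

pub enum Capture<'a, W: Borrow<B>, B: ?Sized> {
    Owned(W),
    Borrowed(&'a B),
}

// Assumed implementation: both variants dereference to the borrowed form `B`,
// so callers need not care whether the data is owned or borrowed.
impl<'a, W: Borrow<B>, B: ?Sized> Deref for Capture<'a, W, B> {
    type Target = B;

    fn deref(&self) -> &B {
        match self {
            Capture::Owned(owned) => owned.borrow(),
            Capture::Borrowed(borrowed) => *borrowed,
        }
    }
}
```

Unlike `Cow`, the owned type `W` only has to satisfy `Borrow<B>`; there is no requirement that `B: ToOwned`, which is the relaxation the module comment refers to. For instance, `Capture<Vec<f64>, [f64]>` can hold either a `Vec<f64>` or a `&[f64]` and be read as a slice in both cases.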

diff --git a/src/csv.rs b/src/csv.rs
@@ -24,7 +24,7 @@ impl CsvWriter {
         R::Item: AsRef<str>,
     {
         let mut first = true;
-        for datum in record.into_iter() {
+        for datum in record {
             if first {
                 first = false;
             } else {
@@ -94,8 +94,8 @@ impl Record {
         Self { items }
     }
 
-    pub fn set(&mut self, ordinal: impl Into<usize>, value: impl ToString) {
-        self.items[ordinal.into()] = Cow::Owned(value.to_string())
+    pub fn set(&mut self, ordinal: impl Into<usize>, value: &impl ToString) {
+        self.items[ordinal.into()] = Cow::Owned(value.to_string());
     }
 
     pub fn len(&self) -> usize {
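
The change to `Record::set` takes the value by reference, which is why the call sites in `src/bin/datadump.rs` gained a leading `&`. A minimal, self-contained sketch of the effect is given below. The `Record` here is a reduced, hypothetical version: its `with_capacity` pre-fills empty cells and `get` is added purely for illustration, neither of which is confirmed by the diff.

```rust
use std::borrow::Cow;

pub struct Record {
    items: Vec<Cow<'static, str>>,
}

impl Record {
    /// Assumed behaviour: pre-fills `len` empty cells so that `set` can index them.
    pub fn with_capacity(len: usize) -> Self {
        Self { items: vec![Cow::Borrowed(""); len] }
    }

    /// Same signature as the post-change method: the value is only borrowed,
    /// since `to_string` needs no ownership.
    pub fn set(&mut self, ordinal: impl Into<usize>, value: &impl ToString) {
        self.items[ordinal.into()] = Cow::Owned(value.to_string());
    }

    /// Illustrative accessor, not part of the diff.
    pub fn get(&self, ordinal: impl Into<usize>) -> &str {
        &self.items[ordinal.into()]
    }
}
```

Because `set` borrows, a non-`Copy` value such as a `String` race identifier remains usable by the caller after the call; with the previous by-value signature it would have been moved or would have needed a clone.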

diff --git a/src/linear/regression.rs b/src/linear/regression.rs
@@ -227,7 +227,7 @@ impl<O: AsIndex> RegressionModel<O> {
             table.push_row(Row::new(
                 Styles::default(),
                 vec![
-                    format!("{:?}", regressor).into(),
+                    format!("{regressor:?}").into(),
                     format!("{:.8}", self.predictor.coefficients[regressor_index]).into(),
                     format!("{:.6}", self.std_errors[regressor_index]).into(),
                     format!("{:.6}", self.p_values[regressor_index]).into(),