Using vw regr
The vw-regr script is included in the main Vowpal Wabbit distribution, under the /utl directory. The script is particularly useful for processing and manipulating datasets with examples having continuously-valued labels. This is the traditional regression framework, exemplified in the batch setting by the closed-form formulas for ordinary least squares (OLS) and ridge regression.
The script is written in Ruby but has no external dependencies beyond the standard installation. In particular, neither Ruby gems nor the Rails web framework is required.
vw-regr expects to read a stream of Vowpal Wabbit examples from STDIN and to write its results (usually modified examples) to STDOUT. Feedback and logging are written to STDERR. A typical call to vw-regr therefore looks like the following:
$ cat house_dataset |./vw-regr --min_ex_val=-0.01 >selected_house_dataset
This example filters out (removes) any example from house_dataset that has a numeric label of less than -0.01, that is, -1%. Note that we do not redirect STDERR with the bash 2> syntax, so feedback appears on the console.
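The filtering step described above can be sketched in Python. This is only an illustration of the idea, not the code of vw-regr itself: the function names parse_label and filter_min_label and the sample data are made up for this sketch. It assumes the label is the first whitespace-separated token before the first | on each line, and it drops examples without a numeric label, which may or may not match what the real script does.

```python
import io
import sys


def parse_label(line):
    """Return the numeric label of a VW example line, or None if there is none.

    Assumption: the label is the first whitespace-separated token
    before the first '|' character.
    """
    header = line.split("|", 1)[0]
    tokens = header.split()
    if not tokens:
        return None
    try:
        return float(tokens[0])
    except ValueError:
        return None  # the first token is not a number (e.g. a tag)


def filter_min_label(instream, outstream, errstream, min_val):
    """Copy to outstream only the examples whose label is >= min_val.

    Feedback goes to errstream, mirroring the STDIN/STDOUT/STDERR split.
    """
    kept = dropped = 0
    for line in instream:
        label = parse_label(line)
        if label is not None and label >= min_val:
            outstream.write(line)
            kept += 1
        else:
            dropped += 1
    errstream.write(f"kept {kept} examples, dropped {dropped}\n")
    return kept, dropped


# Made-up sample data standing in for house_dataset.
sample = io.StringIO(
    "0.05 |house sqft:1200 rooms:3\n"
    "-0.20 |house sqft:800 rooms:2\n"
    "-0.01 |house sqft:950 rooms:2\n"
)
out = io.StringIO()
filter_min_label(sample, out, sys.stderr, -0.01)
sys.stdout.write(out.getvalue())
```

Because the comparison is "greater than or equal to", the example labelled exactly -0.01 is kept and only the one labelled -0.20 is removed.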
Here are the other basic vw-regr functions, which can be combined on the same command line:
--min_ex_val=X
Output just those examples with a label greater than or equal to X. This helps to filter outlier examples.
--max_ex_val=X
Output just those examples with a label less than or equal to X. This also helps to filter outlier examples.
--min_num_feats=X
Output just those examples with at least X features across all namespaces. This helps to filter examples that are too sparse.
--max_num_feats=X
Output just those examples with no more than X features across all namespaces.
--pos_ex_val_imp=X
Output the input examples, but with the importance of positively-signed examples set to X. This helps to emphasize positive examples during training, which is especially useful for imbalanced datasets (over-sampling).
--neg_ex_val_imp=X
Output the input examples, but with the importance of negatively-signed examples set to X. This helps to emphasize negative examples during training, which is especially useful for imbalanced datasets (over-sampling).
--to_class
Output the input examples, but "binning" all positively-signed examples to a +1 label and all negatively-signed examples to a -1 label. This converts the dataset from a regression framework to classification, and is especially useful for testing logistic loss on a regression-style dataset.
--ex_val_desc
Output a description of the input examples, including the number of examples with positive sign, the number with negative sign, and the mean and standard deviation of the example labels.
--vals
Output just the example labels one per line, without importance, tags and features. This is good for importing a column of “correct” example labels into a column in a spreadsheet.
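To make two of the options above more concrete, here is a minimal Python sketch of the kind of transformation --to_class performs and the kind of summary --ex_val_desc reports. It is not taken from vw-regr: the names to_class and describe and the sample lines are hypothetical. It assumes every line starts with a numeric label followed by whitespace. Two details are not stated on this page and are guesses here: examples with a label of exactly zero are left unchanged, and the standard deviation is the population one.

```python
import statistics


def to_class(line):
    """Replace a regression label with 1 or -1 according to its sign.

    Assumption: the line starts with a numeric label followed by
    whitespace; the rest of the line is passed through untouched.
    """
    label_text, rest = line.split(None, 1)
    label = float(label_text)
    if label > 0:
        return "1 " + rest
    if label < 0:
        return "-1 " + rest
    return line  # zero label: left as-is (an assumption of this sketch)


def describe(labels):
    """Summarize labels: sign counts, mean and population standard deviation."""
    return {
        "positive": sum(1 for y in labels if y > 0),
        "negative": sum(1 for y in labels if y < 0),
        "mean": statistics.fmean(labels),
        "stdev": statistics.pstdev(labels),  # population form, by assumption
    }


# Made-up examples in VW input format.
examples = [
    "2.0 |house sqft:1200 rooms:3\n",
    "-1.0 0.5 'id7|house sqft:800 rooms:2\n",
    "-4.0 |house sqft:950 rooms:2\n",
]

for line in examples:
    print(to_class(line), end="")

print(describe([float(line.split(None, 1)[0]) for line in examples]))
```

Only the label token is rewritten, so any importance, tag and features that follow it are preserved exactly as they were in the input.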