Appropriate training data is a requirement for building good machine-learned models. In this project, we study the notion of coverage for ordinal and continuous-valued attributes, by formalizing the intuition that the learned model can accurately predict only at data points for which there are "enough" similar data points in the training data set. We develop an efficient algorithm to identify uncovered regions in low-dimensional attribute feature space, by making a connection to Voronoi diagrams. We also develop a randomized approximation algorithm for use in high-dimensional attribute space.
[1] Abolfazl Asudeh, Nima Shahbazi, Zhongjun Jin, H. V. Jagadish. Identifying Insufficient Data Coverage for Ordinal Continuous-Valued Attributes. SIGMOD, 2021, ACM.
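To make the notion of coverage described above concrete, here is a minimal brute-force sketch. It assumes that "enough similar data points" means at least k training points within Euclidean distance rho of the query point, and that attributes are normalized to [0, 1]. These are assumptions chosen to match the k and rho test options, not definitions taken from the project code. The class `CoverageSketch` and its methods are hypothetical names. The sampling estimate is plain uniform Monte Carlo, not the Voronoi-based or randomized algorithms from [1].

```java
import java.util.Random;

// Hypothetical illustration only; not part of the project's code base.
class CoverageSketch {

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assumed coverage test: the query is covered if at least k
     * training points lie within distance rho of it.
     */
    public static boolean isCovered(double[][] data, double[] query, int k, double rho) {
        int neighbors = 0;
        for (double[] p : data) {
            if (distance(p, query) <= rho) {
                neighbors++;
            }
        }
        return neighbors >= k;
    }

    /**
     * Estimates the uncovered fraction of the unit cube [0, 1]^dims by
     * testing n uniformly sampled query points (simple Monte Carlo).
     */
    public static double estimateUncoveredFraction(
            double[][] data, int dims, int k, double rho, int n, long seed) {
        Random rng = new Random(seed);
        int uncovered = 0;
        for (int i = 0; i < n; i++) {
            double[] q = new double[dims];
            for (int j = 0; j < dims; j++) {
                q[j] = rng.nextDouble();
            }
            if (!isCovered(data, q, k, rho)) {
                uncovered++;
            }
        }
        return (double) uncovered / n;
    }

    public static void main(String[] args) {
        // Three training points clustered near (0.1, 0.1).
        double[][] data = { {0.10, 0.10}, {0.12, 0.10}, {0.10, 0.13} };

        // Close to the cluster: all three points are within rho = 0.05.
        System.out.println(isCovered(data, new double[] {0.11, 0.11}, 3, 0.05)); // true
        // Far from the cluster: no training point is within rho.
        System.out.println(isCovered(data, new double[] {0.90, 0.90}, 3, 0.05)); // false

        System.out.println(estimateUncoveredFraction(data, 2, 3, 0.05, 1000, 42L));
    }
}
```

The brute-force check costs one pass over the training data per query, which is fine for illustration but is exactly the cost that the algorithms in [1] are designed to avoid when identifying whole uncovered regions.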
These instructions will get a copy of the project up and running on your local machine for development and testing.
Building the project requires a Java development kit and Apache Maven (the mvn command). Install the project in one of the following ways.
In console
mvn clean install
In Eclipse or another IDE, all packages should be installed automatically once the project is imported.
The automated tests can be run from the command line with Maven or from an IDE.
Command-line arguments accepted by the test scripts:
Option | Description | Takes arguments | Allows multiple values |
---|---|---|---|
-a | selected attribute values | Yes | Yes |
-e | epsilon values | Yes | Yes |
-h | show help | No | |
-i | input dataset file name | Yes | No |
-k | k values | Yes | Yes |
-n | number of query points | Yes | Yes |
-o | store the test result in a file | No | |
-p | number of repeats | Yes | No |
-phi | phi values | Yes | Yes |
-r | rho values | Yes | Yes |
-s | input dataset schema file name | Yes | No |
Format (accuracy test)
mvn -e exec:java@accuracy -Dexec.args="{command-line-arguments}"
Example
mvn -e exec:java@accuracy -Dexec.args="-i data/iris.data -s data/iris.schema -a sepalLength sepalWidth petalLength -k 3 -r 0.05 0.1 0.15 -n 2000 -p 100 -e 0.1 0.2 -phi 0.1 0.2"
Format (efficiency test)
mvn -e exec:java@efficiency -Dexec.args="{command-line-arguments}"
Example
mvn -e exec:java@efficiency -Dexec.args="-i data/iris.data -s data/iris.schema -a sepalLength sepalWidth -k 2 -r 0.05 0.1 0.15 -n 1000 2000 -p 100"
In Eclipse or another IDE, run src/test/java/umichdb/coverage2/TestCoverageChecker.java.
This project is licensed under the MIT License; see the LICENSE.md file for details.