data-compare

This is a tool for comparing data in Hyena and Presto. It is intended to be part of the Hyena development/testing pipeline.

Building

$ lein uberjar

Usage

$ java -jar target/uberjar/data-compare-0.1.0-SNAPSHOT-standalone.jar [args]
where args are:
-h, --help           Print this help
-m, --min TIMESTAMP  Lower bound for timestamps
-x, --max TIMESTAMP  Upper bound for timestamps
-f, --file FILENAME  Name of the file describing data columns

More details

The input for the tool is:

a timestamp range
a file with three columns.

The columns are:

Hyena column name (one or more)
Kudu's table_name.column_name (one or more)
[Optional] meta-information telling the tool how to convert between the columns

For example, IP addresses are stored in Kudu as two i64 columns, and in Hyena as either two u64 columns or one u128 column. The tool needs to be told to convert one format to the other, or both to a common format. The specification of the meta-information: TBD.
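As a rough illustration of converting both representations to a common format, the Python sketch below folds a pair of 64-bit halves into one unsigned 128-bit integer. It is not the tool's implementation. The function names are hypothetical, and the sketch assumes that the first column holds the high 64 bits and that the i64 values are two's-complement reinterpretations of the unsigned halves; the README does not specify either.

```python
MASK64 = (1 << 64) - 1  # lower 64 bits set


def i64_pair_to_u128(hi: int, lo: int) -> int:
    """Combine two signed 64-bit halves into one unsigned 128-bit value.

    Assumption: `hi` is the high half, and negative values are
    two's-complement encodings of the unsigned halves.
    """
    # Masking reinterprets a negative i64 as its unsigned 64-bit bit pattern.
    return ((hi & MASK64) << 64) | (lo & MASK64)


def u64_pair_to_u128(hi: int, lo: int) -> int:
    """Combine two unsigned 64-bit halves into one unsigned 128-bit value."""
    if not (0 <= hi <= MASK64 and 0 <= lo <= MASK64):
        raise ValueError("both halves must fit in an unsigned 64-bit integer")
    return (hi << 64) | lo


def same_ip(signed_pair, unsigned_pair) -> bool:
    """Compare a signed-pair value with an unsigned-pair value via u128."""
    return i64_pair_to_u128(*signed_pair) == u64_pair_to_u128(*unsigned_pair)
```

With this common format, a signed pair such as `(0, -1)` and an unsigned pair `(0, 2**64 - 1)` compare as equal, because both describe the same 128-bit pattern.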

Based on the column names, the tool will generate SQL queries for Drill and Presto, then run them within the provided timestamp range (probably adding LIMIT and OFFSET so that the comparison runs in batches, limiting the amount of data downloaded from the database in one go).
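The batched query generation described above could look roughly like the following Python sketch. It is only an illustration under several assumptions: the function name, the timestamp column parameter, the inclusive bounds, and the integer timestamps are all hypothetical, and the exact paging syntax (clause order, OFFSET support) may differ between Drill and Presto versions. The ORDER BY is included because LIMIT/OFFSET paging is only repeatable over a stable row order.

```python
def build_batch_query(table, columns, ts_column, ts_min, ts_max, limit, offset):
    """Build one batch query over a timestamp range.

    All parameter names are illustrative. Identifiers are inserted as-is,
    so they must come from a trusted schema file, not from user input.
    """
    cols = ", ".join(columns)
    # int() rejects non-numeric bounds before they reach the SQL text.
    ts_min, ts_max = int(ts_min), int(ts_max)
    limit, offset = int(limit), int(offset)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE {ts_column} >= {ts_min} AND {ts_column} <= {ts_max} "
        f"ORDER BY {ts_column} "
        f"LIMIT {limit} OFFSET {offset}"
    )


def batch_queries(table, columns, ts_column, ts_min, ts_max, batch_size, batches):
    """Yield queries for consecutive batches of `batch_size` rows."""
    for i in range(batches):
        yield build_batch_query(
            table, columns, ts_column, ts_min, ts_max, batch_size, i * batch_size
        )
```

Running the same sequence of batches against both engines and comparing the rows batch by batch keeps memory use bounded by the batch size.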
