GitHub - sameeragarwal/blinkdb: BlinkDB: Sub-Second Approximate Queries on Very Large Data.

Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB is a large-scale data warehouse system built on Shark and Spark and designed to be compatible with Apache Hive. It can answer HiveQL queries up to 200-300 times faster than Hive by executing them on user-specified samples of the data and returning approximate answers augmented with meaningful error bars. BlinkDB 0.1.0 is an alpha developer release. It supports creating and deleting samples on any input table or materialized view, and executing approximate HiveQL queries that use aggregates with closed-form statistical error estimates (i.e., AVG, SUM, COUNT, VAR and STDEV).
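To make the idea of closed-form error bars concrete, here is a minimal Python sketch of how AVG, SUM and COUNT can be estimated from a uniform random sample, each with a 95% confidence interval. It is an illustration of the general statistical technique, not BlinkDB's implementation: the names `Estimate`, `approx_avg`, `approx_sum` and `approx_count` are hypothetical, the sample is assumed to be drawn uniformly at random, and the finite-population correction is ignored.

```python
import math
import random
from dataclasses import dataclass

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


@dataclass
class Estimate:
    value: float  # point estimate of the aggregate
    error: float  # half-width of the 95% confidence interval


def approx_avg(sample):
    """Estimate the population mean from a uniform random sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled rows")
    mean = sum(sample) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Standard error of the mean: sqrt(s^2 / n).
    return Estimate(mean, Z_95 * math.sqrt(variance / n))


def approx_sum(sample, population_size):
    """Estimate the population total by scaling the estimated mean."""
    avg = approx_avg(sample)
    return Estimate(population_size * avg.value, population_size * avg.error)


def approx_count(sample, population_size, predicate):
    """Estimate how many population rows satisfy the predicate."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample is empty")
    p = sum(1 for x in sample if predicate(x)) / n
    # Standard error of a sample proportion: sqrt(p * (1 - p) / n).
    std_err = math.sqrt(p * (1 - p) / n)
    return Estimate(population_size * p, population_size * Z_95 * std_err)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic "table" of one million values; only 1% of it is scanned.
    population = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]
    sample = random.sample(population, 10_000)
    for name, est in [
        ("AVG", approx_avg(sample)),
        ("SUM", approx_sum(sample, len(population))),
        ("COUNT(x > 120)", approx_count(sample, len(population), lambda x: x > 120)),
    ]:
        print(f"{name}: {est.value:.2f} +/- {est.error:.2f}")
```

The error shrinks in proportion to the square root of the sample size, which is why a small sample can give a tight estimate at a fraction of the cost of a full scan.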

BlinkDB requires:

Scala 2.10.x
Spark 0.9.x

Repository contents (1,271 commits):

bin
conf
data/files
hive_blinkdb @ 8a5d550
lib
project
sbt
src
.gitignore
.gitmodules
LICENSE
README.md
run
scalastyle-config.xml

For current documentation, see the BlinkDB Wiki.

For more information about the BlinkDB Project, see the BlinkDB Website.
