From 5104d67510288159b2c9ea535e837ff014119731 Mon Sep 17 00:00:00 2001
From: Feng George Yu
Date: Thu, 19 Sep 2024 21:29:03 -0400
Subject: [PATCH] Create AQP_with_error_assessment_on_large_datasets.yaml

---
 ...th_error_assessment_on_large_datasets.yaml | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 projects/AQP_with_error_assessment_on_large_datasets.yaml

diff --git a/projects/AQP_with_error_assessment_on_large_datasets.yaml b/projects/AQP_with_error_assessment_on_large_datasets.yaml
new file mode 100644
index 000000000..9dee6a1c3
--- /dev/null
+++ b/projects/AQP_with_error_assessment_on_large_datasets.yaml
@@ -0,0 +1,27 @@
+Department: Computer Science and Information Systems
+Description: "Approximate query processing (or AQP) is an emerging research topic\
+ \ in big data analytics. AQP focuses on deriving fast and accurate estimations for\
+ \ complex queries that are usually time-consuming and expensive to run on large\
+ \ datasets. Traditional methods, such as histogram and sketch, are insufficient\
+ \ when applied to big data because of the processing limits. An essential question\
+ \ lacking research is how to assess the errors of AQP estimations.\nThis project\
+ \ focuses on assessing the errors of AQP query estimations, especially for common\
+ \ selection queries. Traditional methods can generate confidence intervals for query\
+ \ estimations based on strict assumptions such as the normal distribution assumption.\
+ \ Therefore, they are not applicable to massive datasets. In this project, the PI\
+ \ will employ a novel non-parametric statistical method called bootstrap sampling\
+ \ which requires less strict assumptions and brings many statistical advantages.\n\
+ A prototype system will be developed employing bootstrap sampling to efficiently\
+ \ compute standard errors and confidence intervals for AQP systems, especially those\
+ \ answering selection queries, namely \u03C3-AQP. Selection queries comprise a large\
+ \ portion of daily data queries. For broader applications, this framework will allow\
+ \ selection queries to include common aggregation operators such as average, sum,\
+ \ and count. The PI will investigate the computing and storage costs when bootstrap\
+ \ replicas are computed. A framework will be developed to automate both the AQP\
+ \ estimation and error estimation operations. Extensive benchmarks will be performed\
+ \ on large datasets such as the TPC-H benchmark."
+FieldOfScience: Computer Science
+FieldOfScienceID: '11.0701'
+InstitutionID: Unknown
+Organization: Youngstown State University
+PIName: Feng Yu