
Data sketches of data streams on Flink with persistent storage capabilities, based on A. Kontaxakis's previous work.


petroud/SDE-E


And Synopses for All: a Synopses Data Engine for Extreme Scale Analytics-as-a-Service

SDEaaS: Synopses Data Engine-as-a-Service

Cite as:

@article{sdeaas,
  author  = {Antonios Kontaxakis and Nikos Giatrakos and Dimitris Sacharidis and Antonios Deligiannakis},
  title   = {{And Synopses for All: a Synopses Data Engine for Extreme Scale Analytics-as-a-Service}},
  journal = {Inf. Syst.},
  volume  = {116},
  year    = {2023},
}

Input MAIN CLASS (Run)

args[0]={@link #kafkaDataInputTopic} (DEFAULT: "Forex")
args[1]={@link #kafkaRequestInputTopic} (DEFAULT: "Requests")
args[2]={@link #kafkaOutputTopic} (DEFAULT: "OUT")
args[3]={@link #kafkaBrokersList} (DEFAULT: "localhost:9092")
args[4]={@link #parallelism} Job parallelism (DEFAULT: "4")
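
The fallback from positional arguments to their defaults can be sketched as follows. This is a minimal Python illustration of the logic only, not the project's main class: the function name `resolve_args` and the dictionary keys are hypothetical (chosen to mirror the field names listed above), and it assumes that a missing trailing argument falls back to its documented default.

```python
# Hypothetical sketch: mirrors the documented positional arguments and defaults.
DEFAULTS = [
    ("kafkaDataInputTopic", "Forex"),
    ("kafkaRequestInputTopic", "Requests"),
    ("kafkaOutputTopic", "OUT"),
    ("kafkaBrokersList", "localhost:9092"),
    ("parallelism", "4"),
]


def resolve_args(args):
    """Map positional args to named settings, using the default when absent."""
    settings = {}
    for index, (name, default) in enumerate(DEFAULTS):
        settings[name] = args[index] if index < len(args) else default
    # Parallelism is documented as a string default but is used as a number.
    settings["parallelism"] = int(settings["parallelism"])
    return settings


if __name__ == "__main__":
    # Only the two input topics are overridden; the rest keep their defaults.
    print(resolve_args(["Stocks", "StockRequests"]))
```

Under these assumptions, passing only the first two arguments overrides the input topics and leaves the output topic, broker list, and parallelism at their defaults.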

SDE reads the data and the requests from two separate Kafka topics, whose names are given as arguments. Depending on the key-value pair of each request, SDE performs a different action.

List of Available Synopses

| SynopsisID | Synopsis | Estimation Parameters | Estimates | Mostly Used | Parameters |
|---|---|---|---|---|---|
| 1 | CountMin | KEY | Count | Frequent Itemsets | KeyField, ValueField, OperationMode, epsilon, confidence, seed |
| 2 | BloomFilter | KEY | Member of a Set | Membership | KeyField, ValueField, OperationMode, numberOfElements, FalsePositive |
| 3 | AMS | KEY | L2 norm, innerProduct, Count | Frequent Itemsets | KeyField, ValueField, OperationMode, Depth, Buckets |
| 4 | DFT | similarity score | Fourier Coefficients | Correlation | KeyField, ValueField, timeField, OperationMode, Interval in Seconds, Basic Window Size in Seconds, Sliding Window Size in Seconds, #coefficients |
| 5 | LSH | none | BucketID - Projected features | Correlation | KeyField, ValueField, OperationMode, windowSize, Dimensions, numberOfBuckets |
| 6 | Coresets | theNumberOfClustersK | Coresets used for kmeans | Clustering | KeyField, ValueField, OperationMode, maxBucketSize, dimensions |
| 7 | FM Sketch | none | Cardinality | Cardinality | keyField, ValueField, OperationMode, Bitmap size, epsilon relative error, probabilityofFailure |
| 8 | HyperLogLog | none | Cardinality | Cardinality | keyField, ValueField, OperationMode, rsd (relative standard deviation) |
| 9 | StickySampling | KEY | FrequentItems, isFrequent, Count | Frequent Itemsets | keyField, ValueField, OperationMode, support, epsilon, probabilityofFailure |
| 10 | LossyCounting | KEY | Count, FrequentItems | Frequent Itemsets | keyField, ValueField, OperationMode, epsilon (the maximum error bound) |
| 11 | ChainSampler | none | Sample of the data | Sampling | keyField, ValueField, OperationMode, size of sample, size of the window |
| 12 | GKQuantiles | KEY | Quantile | Quantiles | keyField, ValueField, OperationMode, epsilon (the maximum error bound) |
| 13 | MarinetimeSKetch | none | Ship positions (Sample) | Sampling | keyField, ValueField, OperationMode, minsamplingperiod, minimumDistance, speed (knots), course (degrees) |
| 14 | TopK | none | TopK | TopK | keyField, ValueField, OperationMode, numberOfK, countDown |
| 15 | OptimalDistributedWindowSampling | none | Sample of the data | Sampling | keyField, ValueField, OperationMode, windowSize |
| 16 | OptimalDistributedSampling | none | Sample of the data | Sampling | keyField, ValueField, OperationMode |
| 17 | WindowedQuantiles | KEY | Quantile | Quantiles | keyField, ValueField, OperationMode, epsilon (the maximum error bound), windowSize |
| 18 | Radius Sketch Family | similarity score | List of streams/windows | similarity/distance | KeyField, ValueField, OperationMode, Group Size, Sketch Size, Window Size, Number of Groups, Threshold |
| 30 | Spatial Sketch | KEY, minX, maxX, minY, maxY | Count in spatial range | Count | KeyField, ValueField, OperationMode, BasicSketchParameters, BasicSketchSynopsisID, minX, maxX, minY, maxY, maxResolution |
| 31 | OmniSketch | Attr1, Attr1Value, Attr2, Attr2Value, ... | Count with multi-dimensional predicates | Count | KeyField, ValueField, OperationMode, #Attributes, delta, epsilon, B, b, seed |
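
As an illustration of how the parameter lists above might be turned into the `param` array of a request, here is a minimal Python sketch. It is not part of SDE: the `SYNOPSIS_PARAMS` registry and the `build_param` helper are hypothetical, only three synopses are included, and it assumes that parameters are passed as strings in the order the list gives them. The numeric values in the usage line are made up for the example.

```python
# Hypothetical registry: field order is an assumption taken from the list above.
SYNOPSIS_PARAMS = {
    1: ("CountMin",
        ["KeyField", "ValueField", "OperationMode", "epsilon", "confidence", "seed"]),
    2: ("BloomFilter",
        ["KeyField", "ValueField", "OperationMode", "numberOfElements", "FalsePositive"]),
    8: ("HyperLogLog",
        ["keyField", "ValueField", "OperationMode", "rsd"]),
}


def build_param(synopsis_id, **values):
    """Return the parameters of a synopsis as an ordered list of strings."""
    name, fields = SYNOPSIS_PARAMS[synopsis_id]
    missing = [field for field in fields if field not in values]
    if missing:
        raise ValueError(f"{name} is missing parameters: {missing}")
    return [str(values[field]) for field in fields]


if __name__ == "__main__":
    # Illustrative values only; "Queryable" is the mode used in the request example.
    print(build_param(1, KeyField="StockID", ValueField="price",
                      OperationMode="Queryable", epsilon=0.002,
                      confidence=0.99, seed=4))
```

Keeping the field order in one place makes it harder to send parameters in the wrong position, which matters because the array carries no field names.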

Full list of available Requests

| RequestID | OperationType | Description |
|---|---|---|
| 1 | ADD | add a Synopsis |
| 2 | DELETE | delete a Synopsis |
| 3 | ESTIMATE | request an estimation of a queryable Synopsis |
| 4 | ADD | add a Synopsis with random partitioning |
| 5 | ADD | add a continuous Synopsis |
| 6 | ESTIMATE | request a more advanced estimation |
| 7 | UPDATE | update a Synopsis state |
| 777 | SYNOPSISLIST | get the list of synopses |
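
The mapping from `requestID` to operation type can be written down as a small lookup. The following Python sketch only restates the list above; the `REQUEST_TYPES` dictionary and the `operation_type` function are hypothetical names and do not reflect how SDE dispatches requests internally.

```python
# Hypothetical lookup that restates the request list above.
REQUEST_TYPES = {
    1: "ADD",
    2: "DELETE",
    3: "ESTIMATE",
    4: "ADD",
    5: "ADD",
    6: "ESTIMATE",
    7: "UPDATE",
    777: "SYNOPSISLIST",
}


def operation_type(request):
    """Return the operation type for a decoded request message."""
    request_id = request["requestID"]
    if request_id not in REQUEST_TYPES:
        raise ValueError(f"unknown requestID: {request_id}")
    return REQUEST_TYPES[request_id]


if __name__ == "__main__":
    print(operation_type({"requestID": 3}))
```

Note that three different request IDs map to ADD, so a client has to keep the numeric ID and cannot reconstruct it from the operation type alone.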

Messages Example

Input Data Example

{
  "values" : {
    "time" : "02/19/2019 06:07:14",
    "StockID" : "ForexALLNoExpiry",
    "price" : "110.11"
  },
  "streamID" : "ForexALLNoExpiry",
  "dataSetkey" : "Forex"
}
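
A data message of this shape could be produced with a few lines of Python. This is a sketch under assumptions: the helper `make_datapoint` is hypothetical, `values` is serialized as a nested JSON object (the README does not state whether SDE expects an object or an embedded string), and the message is only printed rather than sent to Kafka.

```python
import json


def make_datapoint(dataset_key, stream_id, values):
    """Serialize one data tuple using the field names of the example above."""
    message = {
        "values": values,          # assumption: nested object, not a string
        "streamID": stream_id,
        "dataSetkey": dataset_key,
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(make_datapoint(
        "Forex",
        "ForexALLNoExpiry",
        {"time": "02/19/2019 06:07:14",
         "StockID": "ForexALLNoExpiry",
         "price": "110.11"},
    ))
```

The resulting string would be written to the data input topic (`args[0]`) by whichever Kafka client is in use.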

Request Example for adding new Synopses

{
  "streamID" : "INTEL",
  "synopsisID" : 4,
  "requestID" : 1,
  "dataSetkey" : "Forex",
  "param" : [ "StockID", "price", "Queryable", "1", "720", "3600", "8" ],
  "noOfP" : 4,
  "uid" : 1110
}
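
The request above can be built and sanity-checked programmatically. The Python sketch below is illustrative only: `make_request` and `missing_fields` are hypothetical helpers, and treating all seven fields as mandatory for every request type is an assumption based on this single example.

```python
import json

# Assumption: every request carries the same fields as the example above.
REQUIRED_FIELDS = ("streamID", "synopsisID", "requestID",
                   "dataSetkey", "param", "noOfP", "uid")


def make_request(stream_id, synopsis_id, request_id, dataset_key,
                 param, no_of_p, uid):
    """Serialize a request using the field names of the example above."""
    return json.dumps({
        "streamID": stream_id,
        "synopsisID": synopsis_id,
        "requestID": request_id,
        "dataSetkey": dataset_key,
        "param": [str(p) for p in param],  # parameters are sent as strings
        "noOfP": no_of_p,
        "uid": uid,
    })


def missing_fields(message):
    """List the expected fields that a serialized request lacks."""
    decoded = json.loads(message)
    return [field for field in REQUIRED_FIELDS if field not in decoded]


if __name__ == "__main__":
    request = make_request("INTEL", 4, 1, "Forex",
                           ["StockID", "price", "Queryable", 1, 720, 3600, 8],
                           4, 1110)
    print(request)
    print(missing_fields(request))
```

Converting every parameter with `str` avoids mixing numbers and strings inside `param`, which is how the example request represents them.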

Getting Started

Contact

For any questions don't hesitate to ask:

Antonios Kontaxakis, antonios.kontaxakis-ATNOSPAM-ulb.be

Nikos Giatrakos, ngiatrakos-ATNOSPAM-tuc.gr

Dimitris Sacharidis, dimitris.sacharidis-ATNOSPAM-ulb.be

Antonios Deligiannakis, adeligiannakis-ATNOSPAM-tuc.gr
