
Frequently Asked Questions


SNAP FAQ

What is Sparkline SNAP?

SNAP is a scale-out Enterprise B.I. platform, available natively on Apache Spark, to power operational, exploratory and advanced analytics. SNAP connects easily to Tableau and other B.I. tools using Spark SQL/JDBC/ODBC and consumes data from any Spark data source, such as S3/HDFS or relational databases.

SNAP can be used with Python Notebooks and R for combining traditional business intelligence with machine learning in one workflow.
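As an illustration, here is a minimal PySpark sketch of that kind of combined workflow, assuming a SNAP-backed table is already registered in the Spark catalog; the table name `sales_cube` and its columns are hypothetical placeholders, not part of the product.

```python
# Minimal sketch: a B.I.-style aggregation followed by an ML step in one notebook.
# The table "sales_cube" and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("snap-bi-ml").getOrCreate()

# B.I. step: aggregate with Spark SQL.
agg = spark.sql("""
    SELECT region, SUM(revenue) AS total_revenue, SUM(clicks) AS total_clicks
    FROM sales_cube
    GROUP BY region
""")

# ML step: cluster regions on the aggregated metrics.
features = VectorAssembler(
    inputCols=["total_revenue", "total_clicks"], outputCol="features"
).transform(agg)
clusters = KMeans(k=3, featuresCol="features").fit(features).transform(features)
clusters.select("region", "prediction").show()
```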

What are the core components of Sparkline SNAP?

SNAP consists of the following four components:

SNAP Metadata

Define B.I. Models comprising Star Schemas, Cubes, Hierarchies and Dimensions mapped to data in any kind of Spark table.

SNAP Cube File format

A Cube file format that provides very fast access and partial aggregation over arbitrary regions of a multi-dimensional space. It uses columnar storage and inverted indexes, and provides special features for optimizing B.I. query patterns such as Metric Binning and timestamp-extraction functions. A toy sketch of the inverted-index idea follows the list of components below.

SNAP Optimizer

An enhancement of the Spark Logical and Physical Optimizer specialized for B.I. Query Patterns; uses the Metadata to perform optimizations like StarJoin elimination, Eager/Partial Aggregation etc.

SNAP Run time

An enhancement of the Spark Runtime that has specialized operations for Cube processing and is optimized to serve Cube Segments to the Spark Operator pipelines.
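To make the Cube file format description above concrete, here is a toy Python sketch of the inverted-index idea: per-column storage plus a value-to-row-id index, so a slice query touches only matching rows and partially aggregates them. This is purely illustrative and is not the actual SNAP on-disk layout.

```python
# Toy sketch of columnar storage plus an inverted index; not SNAP's real format.
from collections import defaultdict

# Columnar data: one list per dimension/metric column.
columns = {
    "region":  ["US", "EU", "US", "APAC", "EU"],
    "product": ["A",  "A",  "B",  "B",    "A"],
    "revenue": [100,  250,  75,   300,    125],
}

# Inverted index: dimension value -> row ids that contain it.
index = {col: defaultdict(list) for col in ("region", "product")}
for col in index:
    for row_id, value in enumerate(columns[col]):
        index[col][value].append(row_id)

# Answering a slice ("region = US") touches only the matching rows,
# then partially aggregates the metric column.
rows = index["region"]["US"]
partial_sum = sum(columns["revenue"][r] for r in rows)
print(partial_sum)  # 175
```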

What business problems does SNAP address?

SNAP is designed to do the following three things:

- Provides instant response times for drag-and-drop, slice-and-dice multidimensional analysis on large-volume datasets.
- Enables fast responses to complex queries involving hierarchy navigation, dimension joins, allocations, attribution, etc.
- Enables analysts to operate on all of the data, and on live data, eliminating performance lag.

Our analysts are used to SQL - Does SNAP support SQL?

Yes, SNAP is accessed using Spark SQL.
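For instance, an analyst in a Spark session or notebook can query a SNAP cube with ordinary Spark SQL; the table `ad_cube` and its columns below are hypothetical placeholders.

```python
# Minimal sketch: querying a SNAP-backed table with plain Spark SQL.
# The table "ad_cube" and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snap-sql").getOrCreate()
spark.sql("""
    SELECT campaign, SUM(clicks) AS clicks
    FROM ad_cube
    WHERE event_date >= date_sub(current_date(), 7)
    GROUP BY campaign
    ORDER BY clicks DESC
""").show()
```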

We already have B.I. tools - Tableau, OBIEE, Spotfire - How does SNAP work with them?

SNAP plugs into these tools using the Spark SQL connectors. SNAP is also optimized to work with SQL generated by these tools.
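As a rough sketch, any SQL client (including the connectors these B.I. tools ship) reaches Spark SQL through the Spark Thrift Server's JDBC/ODBC endpoint; the host, port, and table below are placeholders, not a documented SNAP endpoint.

```python
# Minimal sketch: connecting to the Spark Thrift Server (the same JDBC/ODBC
# endpoint B.I. tools use) from Python. Host, port, and table are placeholders.
from pyhive import hive  # pip install 'pyhive[hive]'

conn = hive.connect(host="snap-cluster.example.com", port=10000)
cursor = conn.cursor()
cursor.execute("SELECT region, SUM(revenue) FROM sales_cube GROUP BY region")
for row in cursor.fetchall():
    print(row)
```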

Tableau, Qlik, Spotfire, Domo - All these BI tools say they have in-memory fast engines. How are you different?

Tableau has extracts and the others have in-memory engines, but when operating on Big Data these engines fall short. SNAP is a distributed scale-out compute platform built on Apache Spark. These tools were designed to work with summarized datasets extracted and organized in SQL Server, MySQL or Postgres. They do not work well when enterprise data resides in data lakes on S3 or HDFS, and their in-memory layers are restricted to single nodes and do not scale out.

How is this different from other indexing technologies like Druid?

Please see the SNAP vs. Druid comparison chart.

Why can’t I do the same with Elastic Search or Splunk or Thoughtspot?

Search is not B.I. - otherwise, Google would have no need for BigQuery or SQL on BigQuery. Search-based architectures are limited to faceted navigation of a few dimensions and summary metrics. They are not a substitute for Enterprise B.I.

Can these not be solved by Redshift/SQL on Hadoop or Impala?

B.I. is more than reporting. Most leading companies have a deployment of cubes or advanced analytics platforms for deriving real insights. Reporting by querying data can be accomplished on any database, whether it is Redshift, MySQL, Oracle or SQL Server. But rows and columns without metadata models describing the business entity relationships cannot help with getting deeper insights. Just as with machine learning and data science, where the value is in modeling features, with B.I. one has to model the data to derive insights. Enterprise B.I. is data science in this sense, and is different from just querying data in tables.

So, Is this the same as OLAP cubes?

OLAP cubes are popular for serving analytics needs beyond reporting. Products like SQL Server Analysis Services, Cognos, Oracle Essbase and others are used to derive high value in applications ranging from planning and forecasting to allocations. SNAP can be used for many of the same use cases. Where it differs from OLAP cubes is in its ability to scale out with Big Data and to avoid pre-aggregating data along pre-defined dimensions. SNAP can keep data at the detail level without moving it into a separate cluster, and physically SNAP uses a highly optimized file format with indexing rather than pre-aggregated data structures. Hence SNAP is well suited to modern data lakes and high-volume, high-variety data.

What happens to cubes that were built using Hyperion/Cognos etc?

These tools play a huge part in business analysis. SNAP fits in well to plug a serious gap these tools have: drill-through. For example, drilling through from a summary cube in Hyperion to various levels of detail in Hadoop is almost impossible. With SNAP plugged into OBIEE (Oracle Business Intelligence Enterprise Edition), such drill-throughs can return data in seconds.

What does it cost to operationally manage SNAP?

SNAP is built on Apache Spark. So all it needs to run is a Spark cluster.

Do I need data in Hadoop to use SNAP?

While SNAP works well with Hadoop, your data does not need to be in Hadoop for SNAP to work. For example, SNAP works on S3 without HDFS.
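For example, Spark (and therefore SNAP) can read source data straight from S3 using the standard s3a connector; the bucket and path below are placeholders, and the hadoop-aws/s3a libraries must be on the classpath.

```python
# Minimal sketch: reading source data directly from S3, no HDFS involved.
# Bucket/path are placeholders; requires the hadoop-aws/s3a connector on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snap-s3").getOrCreate()
events = spark.read.parquet("s3a://my-bucket/events/")
events.createOrReplaceTempView("events_raw")
```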

Does this run only on AWS or on-premise as well?

SNAP is available on Oracle Big Data Cloud following our recent acquisition by Oracle.

Serverless options like Athena and Big Query provide high performance - How does SNAP compare?

Athena and BigQuery are efficient if all you need is SQL reporting on flat data. So instead of using Hive or Impala, you can use them if you are willing to move your data into their cloud. A recent customer benchmark showed SNAP to be 3x faster than Athena even for plain SQL reporting. However, SNAP is more than just SQL reporting and can handle complex query workloads that are not possible with Athena or BigQuery, including star schema joins, modeling for type 1 changes, etc.

What does it take to build a SNAP Index or LOAD data into SNAP?

SNAP’s operations are SQL based. Loading data into SNAP is a simple “insert” statement using Spark SQL. For users of Big Data it is similar to converting text files to Parquet or ORC: inserting into SNAP essentially transforms the source data into our file format for optimized reads.
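As a sketch of that analogy, the first statement below is the familiar Spark SQL pattern for rewriting raw data as Parquet, and the second shows the same INSERT shape for a SNAP table. The table names are placeholders, and the SNAP-specific DDL is an assumption rather than the product's documented syntax.

```python
# Minimal sketch of SQL-based loading. Table names are placeholders and the
# SNAP-specific DDL is an assumption; only the INSERT pattern is the point.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snap-load").getOrCreate()

# Familiar pattern: rewrite raw data into an optimized columnar format.
spark.sql("CREATE TABLE events_parquet USING parquet AS SELECT * FROM events_raw")

# Loading into SNAP follows the same shape: an INSERT that rewrites the source
# rows into the indexed Cube file format (target table assumed to exist).
spark.sql("INSERT INTO events_snap SELECT * FROM events_raw")
```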

Why is SNAP built on Spark?

SNAP is designed for Enterprise B.I. and data science workloads where query response times matter a lot. Apache Spark is the best compute platform for B.I./A.I. workloads. Further, Spark is independent of Hadoop and is much simpler to deploy and manage. Spark also provides a robust set of tools and platform components for building scale-out applications.

Can I use SNAP for “Query as a Service”?

We have customers using SNAP to serve visualizations in web applications, with response times of 100 ms. In these cases a highly selective query is issued to SNAP over REST. For example, if your website is populating a visualization showing the top 10 regions by ad revenue over the past 10 days, with a drill into the top 5 campaigns from there, and then into the top creatives from there, SNAP can return those queries in sub-second times even when hundreds of concurrent requests are being issued.

Expanding the above example, SNAP can also be used for more complex queries: comparing campaigns month to month by creative or channel for a given advertiser, or computing the ratio of clicks for a specific creative to the sum of clicks for all creatives in a campaign to measure effectiveness. The power is in being able to do such exploratory analysis on ANY dimension for any metric, since these are not pre-canned, pre-aggregated metrics and KPIs.
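Here is a rough Spark SQL sketch of the drill-down described above; the table `ad_events`, its columns, and the drilled-into region are hypothetical placeholders.

```python
# Minimal sketch of the drill-down described above; table, columns, and the
# drilled-into region are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snap-drilldown").getOrCreate()

# Top 10 regions by ad revenue over the past 10 days.
top_regions = spark.sql("""
    SELECT region, SUM(ad_revenue) AS revenue
    FROM ad_events
    WHERE event_date >= date_sub(current_date(), 10)
    GROUP BY region
    ORDER BY revenue DESC
    LIMIT 10
""")

# Drill into the top 5 campaigns for one of those regions.
top_campaigns = spark.sql("""
    SELECT campaign, SUM(ad_revenue) AS revenue
    FROM ad_events
    WHERE event_date >= date_sub(current_date(), 10)
      AND region = 'US'
    GROUP BY campaign
    ORDER BY revenue DESC
    LIMIT 5
""")
top_regions.show()
top_campaigns.show()
```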
