Frequently Asked Questions
SNAP is a scale-out Enterprise B.I platform, available natively on Apache Spark, to power operational, exploratory and advanced analytics. SNAP connects easily to Tableau and other B.I tools using Spark SQL/JDBC/ODBC and consumes data from any Spark data source such as S3/HDFS or relational databases.
SNAP can be used with Python Notebooks and R for combining traditional business intelligence with machine learning in one workflow.
SNAP consists of the following four components:
Define B.I. Models comprising Star Schemas, Cubes, Hierarchies and Dimensions, mapped to data in any kind of Spark Table.
A Cube file format that provides very fast access to, and partial aggregation over, arbitrary regions of a multi-dimensional space. It uses columnar storage and inverted indexes, and provides special features for optimizing B.I. query patterns, such as Metric Binning and Timestamp extraction functions.
An enhancement of the Spark Logical and Physical Optimizer specialized for B.I. query patterns; it uses the Metadata to perform optimizations such as StarJoin elimination and Eager/Partial Aggregation.
An enhancement of the Spark Runtime that has specialized operations for Cube processing and is optimized to serve Cube Segments to the Spark Operator pipelines.
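To make the idea of columnar storage with inverted indexes and Metric Binning more concrete, here is a minimal Python sketch. It is not SNAP's actual file format or API; the sample columns and the helper names `build_inverted_index`, `bin_metric` and `sum_where` are assumptions invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical columnar layout: one list per column, aligned by row position.
columns = {
    "region":  ["east", "west", "east", "north", "west", "east"],
    "channel": ["web",  "web",  "app",  "web",   "app",  "web"],
    "revenue": [120.0,  80.0,   45.0,   300.0,   15.0,   60.0],
}

def build_inverted_index(values):
    """Map each distinct dimension value to the set of row ids that hold it."""
    index = defaultdict(set)
    for row_id, value in enumerate(values):
        index[value].add(row_id)
    return dict(index)

def bin_metric(value, width):
    """Metric binning: map a numeric value to the lower bound of its bin."""
    return int(value // width) * width

def sum_where(metric, indexes, **filters):
    """Sum a metric over the rows that match every dimension filter."""
    row_sets = [indexes[dim].get(val, set()) for dim, val in filters.items()]
    # With no filters, every row qualifies; otherwise intersect the posting sets.
    rows = set.intersection(*row_sets) if row_sets else set(range(len(metric)))
    return sum(metric[r] for r in rows)

indexes = {dim: build_inverted_index(columns[dim]) for dim in ("region", "channel")}
east_web = sum_where(columns["revenue"], indexes, region="east", channel="web")
revenue_bins = [bin_metric(v, 100) for v in columns["revenue"]]
```

The point of the sketch is that a filter on any combination of dimensions becomes a set intersection over row ids, so only the matching metric values are read and aggregated, rather than scanning every row.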
SNAP is designed to do the following three things:
- Provide instant response times for drag-and-drop, slice-and-dice multidimensional analysis on large-volume datasets.
- Enable fast responses to complex queries involving hierarchy navigation, dimension joins, allocations, attribution and so on.
- Enable analysts to operate on all data and on live data, eliminating performance lag.
Yes, SNAP is accessed using Spark SQL.
SNAP plugs into these tools using the Spark SQL connectors. SNAP is also optimized to work with SQL generated by these tools.
Tableau, Qlik, Spotfire, Domo - All these BI tools say they have in-memory fast engines. How are you different?
Tableau has extracts and others have in-memory engines. But when operating on Big Data, these engines fall short. SNAP is a distributed scale-out compute platform built on Apache Spark. These tools were designed to work with summarized datasets extracted and organized in SQL Server, MySQL or Postgres. They do not work well when enterprise data resides in data lakes on S3 or HDFS. Their in-memory layers are restricted to single nodes and do not scale out.
SNAP vs. Druid: please see the comparison chart.
Search is not B.I. Otherwise, Google would have no need for BigQuery or SQL on BigQuery. Search-based architectures are limited to faceted navigation of a few dimensions and summary metrics. They are not a substitute for Enterprise B.I.
B.I is more than reporting. Most leading companies have a deployment of Cubes or advanced analytics platforms for deriving real insights. Reporting by querying data can be accomplished on any database, whether it is Redshift, MySQL, Oracle or SQL Server. But rows and columns without metadata models describing the business entity relationships cannot help with getting deeper insights. Just as with machine learning and data science, where the value is in modeling features, with B.I one has to model the data to derive insights. Enterprise B.I is data science, and it is different from just querying data in tables.
OLAP cubes are popular for serving analytics needs beyond reporting. Products like SQL Server Analysis Services, Cognos, Oracle Essbase and others are used to derive high value in applications such as planning, forecasting and allocations. SNAP can be used for many of the same use cases. Where it differs from OLAP cubes is in its ability to scale out with Big Data and to avoid pre-aggregating data along pre-defined dimensions. SNAP can keep data at the detail level without moving it into a separate cluster. Physically, SNAP uses a highly optimized file format with indexing rather than pre-aggregated data structures. Hence SNAP is well suited to modern data lakes and high-volume, high-variety data.
These tools play a huge part in business analysis. SNAP fits in well to plug a serious gap these tools have: drill-through. For example, from within Hyperion, drilling through various levels of detail from a summary cube to the data in Hadoop is almost impossible. With SNAP plugged into OBIEE (Oracle Business Intelligence Enterprise Edition), such drill-throughs can return data in seconds.
SNAP is built on Apache Spark. So all it needs to run is a Spark cluster.
While SNAP works well with Hadoop, your data does not need to be in Hadoop for SNAP to work. For example, SNAP works on S3 without HDFS.
SNAP is available on Oracle Big Data Cloud following our recent acquisition by Oracle.
Athena and BigQuery are efficient if all you need is SQL reporting on flat data. You can use them instead of Hive or Impala if you are willing to move data to their cloud. A recent customer benchmark showed SNAP to be 3x faster than Athena even for SQL reporting. However, SNAP is more than just SQL reporting and can handle complex query workloads that are not possible with Athena or BigQuery, including star schema joins, modeling for type 1 changes and so on.
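As a rough illustration of a star schema join with aggregation, and of the Eager/Partial Aggregation idea the optimizer is described as using, the following Python sketch compares two plans for "revenue by category". It is only a toy model, not SNAP's optimizer; the tables and the function names `join_then_aggregate` and `aggregate_then_join` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product_id, revenue).
sales = [(1, 10.0), (2, 5.0), (1, 7.5), (3, 2.5), (2, 5.0), (1, 20.0)]
# Hypothetical dimension table: product_id -> category.
product_category = {1: "books", 2: "games", 3: "books"}

def join_then_aggregate(facts, dim):
    """Baseline plan: look up the dimension for every fact row, then group."""
    totals = defaultdict(float)
    lookups = 0
    for product_id, revenue in facts:
        totals[dim[product_id]] += revenue
        lookups += 1
    return dict(totals), lookups

def aggregate_then_join(facts, dim):
    """Eager plan: pre-aggregate on the join key, then join the smaller result."""
    partial = defaultdict(float)
    for product_id, revenue in facts:
        partial[product_id] += revenue
    totals = defaultdict(float)
    lookups = 0
    for product_id, subtotal in partial.items():
        totals[dim[product_id]] += subtotal
        lookups += 1
    return dict(totals), lookups

baseline, baseline_lookups = join_then_aggregate(sales, product_category)
eager, eager_lookups = aggregate_then_join(sales, product_category)
```

Both plans produce the same totals, but the eager plan joins one row per distinct product rather than one row per sale, which is why pushing aggregation below the join can pay off when the fact table is large.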
SNAP’s operations are SQL based. Loading data into SNAP is a simple “insert” statement using Spark SQL. For users of Big Data, it is similar to converting text files to Parquet or ORC. Inserting into SNAP essentially transforms source data into our file format for optimized reads.
SNAP is designed for Enterprise B.I and data science workloads where query response times matter a lot. Apache Spark is the best compute platform for B.I/A.I workloads. Further, Spark is independent of Hadoop and is much simpler to deploy and manage. Spark also provides a robust set of tools and platform components for building scale-out applications.
We have customers using SNAP to serve visualizations in Web applications, with response times of 100 ms. In these cases a highly selective query is issued to SNAP using REST. For example, suppose your website populates a visualization showing the top 10 regions by ad revenue over the past 10 days, with a drill into the top 5 campaigns from there and then into the top creative. SNAP can return those queries in under a second, even when hundreds of concurrent requests are being issued.
Expanding on the above example, SNAP can also be used for more complex queries, such as a month-to-month comparison of campaigns by creative or channel for a given advertiser, or the ratio of clicks on a specific creative to the sum of clicks for all the creatives in a campaign, as a measure of effectiveness. The power is in being able to do such exploratory analysis on ANY dimension for any metric, since these are not pre-canned, pre-aggregated metrics and KPIs.
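The drill-down and ratio queries described above can be sketched over detail rows in plain Python. This only illustrates the query logic, not how SNAP executes it; the sample rows and the helper names `top_n` and `click_share` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, campaign, creative, ad_revenue, clicks).
rows = [
    ("east", "spring", "banner", 100.0, 40),
    ("east", "spring", "video",  250.0, 60),
    ("west", "spring", "banner",  80.0, 100),
    ("west", "summer", "video",  300.0, 30),
    ("east", "summer", "banner",  50.0, 20),
]
REGION, CAMPAIGN, CREATIVE, REVENUE, CLICKS = range(5)

def top_n(data, group_col, metric_col, n, filters=None):
    """Sum a metric per group, optionally filtered, and return the n largest."""
    filters = filters or {}
    totals = defaultdict(float)
    for row in data:
        if all(row[col] == val for col, val in filters.items()):
            totals[row[group_col]] += row[metric_col]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

def click_share(data, campaign, creative):
    """Clicks on one creative divided by clicks on all creatives in a campaign."""
    campaign_clicks = sum(r[CLICKS] for r in data if r[CAMPAIGN] == campaign)
    creative_clicks = sum(
        r[CLICKS] for r in data
        if r[CAMPAIGN] == campaign and r[CREATIVE] == creative
    )
    return creative_clicks / campaign_clicks if campaign_clicks else 0.0

top_regions = top_n(rows, REGION, REVENUE, n=10)
east_campaigns = top_n(rows, CAMPAIGN, REVENUE, n=5, filters={REGION: "east"})
banner_share = click_share(rows, "spring", "banner")
```

Because the grouping column, metric and filters are all parameters, the same two helpers cover a drill from region to campaign to creative without any metric being pre-aggregated for a fixed set of dimensions.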