Update README.md (#441)
wmoustafa authored Aug 10, 2023
1 parent 4956719 commit 3684ed5
Showing 1 changed file with 30 additions and 23 deletions.
53 changes: 30 additions & 23 deletions README.md
@@ -4,17 +4,24 @@
<img src="docs/coral-logo.jpg" width="400" title="Coral Logo">
</p>

**Coral** is a library for analyzing, processing, and rewriting views defined in the Hive Metastore, and sharing them
across multiple execution engines. It performs SQL translations to enable views expressed in HiveQL (and potentially
other languages) to be accessible in engines such as [Trino (formerly PrestoSQL)](https://trino.io/),
[Apache Spark](https://spark.apache.org/), and [Apache Pig](https://pig.apache.org/).
Coral not only translates view definitions between different SQL/non-SQL dialects, but also rewrites expressions to
produce semantically equivalent ones, taking into account the semantics of the target language or engine.
For example, it automatically composes new built-in expressions that are equivalent to each built-in expression in the
source view definition. Additionally, it integrates with [Transport UDFs](https://github.com/linkedin/transport)
to enable translating and executing user-defined functions (UDFs) across Hive, Trino, Spark, and Pig. Coral is under
active development. Currently, we are looking into expanding the set of input view language APIs beyond HiveQL,
and implementing query rewrite algorithms for data governance and query optimization.
**Coral** is a SQL translation, analysis, and rewrite engine. It establishes a standard intermediate representation,
Coral IR, which captures the semantics of relational algebraic expressions independently of any SQL dialect. Coral IR
is defined in two forms: one is at the abstract syntax tree (AST) layer, and the other is at the logical plan layer.
Both forms are isomorphic and convertible to each other.

Coral exposes APIs for implementing conversions between SQL dialects and Coral IR in both directions.
Currently, Coral supports converting HiveQL and Spark SQL to Coral IR, and converting Coral IR to HiveQL, Spark SQL,
and Trino SQL. With multiple SQL dialects supported, Coral can be used to translate SQL statements and views defined in
one dialect to equivalent ones in another dialect. It can also be used to interoperate between engines and SQL-powered
data sources. For dialect conversion examples, see the modules [coral-hive](coral-hive), [coral-spark](coral-spark),
and [coral-trino](coral-trino).

Coral also exposes APIs for Coral IR rewrite and manipulation. This includes rewriting Coral IR expressions to produce
semantically equivalent, but more performant expressions. For example, Coral automates
incremental view maintenance by rewriting a view definition to an incremental one. See the module [coral-incremental](coral-incremental)
for more details. Other Coral rewrite applications include data governance and policy enforcement.

Coral can be used as a library in other projects, or as a service. See instructions below for more details.

## <img src="https://user-images.githubusercontent.com/10084105/141652009-eeacfab4-0e7b-4320-9379-6c3f8641fcf1.png" width="30" title="Slack Logo"> Slack

@@ -24,28 +31,27 @@ and implementing query rewrite algorithms for data governance and query optimiza

**Coral** consists of the following modules:

- Coral-Hive: Converts definitions of Hive views with UDFs to equivalent view logical plan.
- Coral-Trino: Converts view logical plan to Trino (formerly PrestoSQL) SQL, and vice versa.
- Coral-Spark: Converts view logical plan to Spark SQL.
- Coral-Pig: Converts view logical plan to Pig-latin.
- Coral-Dbt [WIP]: DBT package that houses materialization modes that exercise Coral logic.
- Coral-Incremental [WIP]: Derives an incremental query from input SQL for incremental view maintenance.
- Coral-Hive: Converts HiveQL to Coral IR (can be typically used with Spark SQL as well).
- Coral-Trino: Converts Coral IR to Trino SQL. Converting Trino SQL to Coral IR is WIP.
- Coral-Spark: Converts Coral IR to Spark SQL (can be typically used with HiveQL as well).
- Coral-Dbt: Integrates Coral with DBT. It enables applying Coral transformations on DBT models.
- Coral-Incremental: Derives an incremental query from input SQL for incremental view maintenance.
- Coral-Schema: Derives Avro schema of view using view logical plan and input Avro schemas of base tables.
- Coral-Spark-Plan [WIP]: Converts Spark plan strings to equivalent logical plan.
- Coral-Visualization [WIP]: Visualizes Coral SqlNode and RelNode trees and renders them to an output file.
- Coral-Visualization: Visualizes Coral SqlNode and RelNode trees and renders them to an output file.
- Coral-Service: Service that exposes REST APIs that allow users to interact with Coral (see [Coral-as-a-Service](#Coral-as-a-Service) for more details).

## Version Upgrades

This project adheres to semantic versioning, where the format x.y.z represents major, minor, and patch version upgrades. Consideration should be given to potential changes required when integrating different versions of this project.

**'y' Upgrade**
**Major version Upgrade**

An 'y' upgrade represents a version change that introduces backward incompatibility by removal or modification of methods.
A major version upgrade represents a version change that introduces backward incompatibility by removal or renaming of classes.

**'x' Upgrade**
**Minor version Upgrade**

An 'x' upgrade signifies a version change that introduces backward incompatibility by affecting the availability of classes.
A minor version upgrade represents a version change that introduces backward incompatibility by removal or renaming of methods.

Please carefully review the release notes and documentation accompanying each version upgrade to understand the specific changes and the recommended steps for migration.
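
As a rough illustration of the versioning rules above, the following Python sketch classifies an upgrade between two `x.y.z` version strings and reports which kind of incompatibility to look for. The function names and the version numbers in the usage lines are hypothetical and are not part of Coral; the messages only restate the rules in this section.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an 'x.y.z' string into (major, minor, patch) integers."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected a version of the form x.y.z, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def classify_upgrade(current: str, target: str) -> str:
    """Return 'major', 'minor', 'patch', 'none', or 'downgrade'."""
    cur = parse_version(current)
    new = parse_version(target)
    if new == cur:
        return "none"
    if new < cur:  # tuples compare component by component
        return "downgrade"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


# What to check for each kind of upgrade, restating the rules in this section.
MIGRATION_HINTS = {
    "major": "classes may have been removed or renamed",
    "minor": "methods may have been removed or renamed",
    "patch": "no incompatibility is described for patch upgrades",
}

if __name__ == "__main__":
    # Hypothetical version numbers, chosen only for illustration.
    kind = classify_upgrade("2.0.5", "2.1.0")
    print(kind, "-", MIGRATION_HINTS.get(kind, "nothing to migrate"))
```

A check like this could run in a dependency-update script to flag upgrades that need a closer read of the release notes.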

@@ -78,6 +84,7 @@ Please see the [Contribution Agreement](CONTRIBUTING.md).
## Resources

- [Coral: A SQL translation, analysis, and rewrite engine for modern data lakehouses](https://engineering.linkedin.com/blog/2020/coral), LinkedIn Engineering Blog, 12/10/2020.
- [Incremental View Maintenance with Coral, DBT, and Iceberg](https://www.slideshare.net/walaa_eldin_moustafa/incremental-view-maintenance-with-coral-dbt-and-iceberg), Tech Talk, Iceberg Meetup, 5/11/2023.
- [Coral & Transport UDFs: Building Blocks of a Postmodern Data Warehouse](https://www.slideshare.net/walaa_eldin_moustafa/coral-transport-udfs-building-blocks-of-a-postmodern-data-warehouse-229545076), Tech-talk, Facebook HQ, 2/28/2020.
- [Transport: Towards Logical Independence Using Translatable Portable UDFs](https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs), LinkedIn Engineering Blog, 11/14/2018.
- [Dali Views: Functions as a Service for Big Data](https://engineering.linkedin.com/blog/2017/11/dali-views--functions-as-a-service-for-big-data), LinkedIn Engineering Blog, 11/9/2017.
@@ -196,7 +203,7 @@ curl --header "Content-Type: application/json" \
```
The translation result is:
```
Original query in Hive QL:
Original query in HiveQL:
SELECT * FROM db1.airport
Translated to Trino SQL:
SELECT "name", "country", "area_code", "code", "datepartition"