Presto Community Roadmap Discussion April 6, 2017
Attendance: Facebook, Teradata, Twitter, Jet, Walmart, Netflix, Bloomberg, Turbine/WB, Innominds & others (please add your company name here!)
*(all community contributions since the last roadmap meeting, over a year ago)
Data types: VARCHAR(n), DECIMAL, REAL, TINYINT, SMALLINT, INTEGER, CHAR(n)
Improved ROW type
Language (examples below):
- Support for correlated subqueries
- EXISTS
- INTERSECT/EXCEPT
- Quantified comparisons
- Selective aggregates
- Non-equality outer joins
- Lambda expressions
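To make a few of these concrete, here is a small sketch exercising EXISTS with a correlated subquery, a quantified comparison, and a lambda expression; the customers and orders tables and their columns are hypothetical:

```sql
-- EXISTS with a correlated subquery: customers with at least one order
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.custkey = c.custkey);

-- Quantified comparison: orders larger than every order placed before 2017
SELECT orderkey
FROM orders
WHERE totalprice > ALL (
    SELECT totalprice FROM orders WHERE orderdate < DATE '2017-01-01');

-- Lambda expression: apply a markup to each element of an ARRAY column
SELECT transform(item_prices, p -> p * 1.08) AS prices_with_tax
FROM orders;
```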
CREATE TABLE IF NOT EXISTS
Prepared statements (example below)
EXPLAIN ANALYZE
GRANT/REVOKE
SHOW CREATE [TABLE | VIEW]
CREATE/DROP/RENAME SCHEMA
Implicit coercions for INSERT
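A quick illustration of the prepared statement syntax mentioned above (statement and table names hypothetical):

```sql
-- Prepare a parameterized query once...
PREPARE recent_orders FROM
SELECT orderkey, totalprice FROM orders WHERE orderdate > ?;

-- ...execute it with concrete parameter values...
EXECUTE recent_orders USING DATE '2017-01-01';

-- ...and drop it when done.
DEALLOCATE PREPARE recent_orders;
```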
Performance improvements:
- JOIN/GROUP BY
- window functions
- approx_percentile, map_agg, array_agg
- local parallelism
- communication between coordinator-worker
- adaptive concurrency
- intra-node parallelism for intermediate stages
Pluggable event listeners
Resource groups
Hive connector
- Kerberos support
- Optimized RCFile reader
- Transactional DELETE+INSERT
- Writing to bucketed tables
- Support for mismatched table/partition schema
Cassandra connector improvements
New connectors
- MongoDB
- Accumulo
Language (examples below):
- GROUPING
- Ordered aggregations
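A brief sketch of both (table and column names hypothetical): grouping() reports which columns are aggregated in each output row of a GROUPING SETS query, and ordered aggregations accept an ORDER BY inside the aggregate call:

```sql
-- grouping() returns a bitmask telling which grouping set produced a row
SELECT region, nation, sum(sales) AS total, grouping(region, nation) AS g
FROM orders
GROUP BY GROUPING SETS ((region, nation), (region), ());

-- Ordered aggregation: build the array in a well-defined order
SELECT custkey, array_agg(orderkey ORDER BY orderdate DESC) AS order_history
FROM orders
GROUP BY custkey;
```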
Optimizer
Spill to disk
ZSTD support for ORC reader/writer
Optimized RCFile writer
Optimized ORC writer
Bucket-by-bucket execution
Performance and scalability improvements
- Coordinator CPU utilization
- Structural types
- Intra-task scheduling/prioritization
- Memory-aware scheduling
- HTTP/2
Connectors
- TPC-DS
- Thrift
- SQL Server
SSL-based authentication
Also, Vaughn indicated that FB is working on concrete plans for increasing PR throughput.
*some may not yet be merged upstream, but they are already shipped in the Teradata distro (www.teradata.com/presto)
Connectors
- Cassandra (major improvements)
- MS SQL Server
- TPC-DS
- Memory
Security
- Kerberized Hadoop support
- LDAP Authentication
- GRANT / REVOKE / ROLES
Misc
- JDBC & ODBC by Simba (free to use)
- BI Tools support & certifications
SQL syntax
- DECIMAL, REAL, CHAR
- Subqueries
- EXISTS / EXCEPT / INTERSECT
- Non-equi joins
- grouping()
- Prepared statements
Performance
- Windowing functions
- Joins
Spill to disk (sketch below)
- Aggregation, Join, and other operators
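For context, in releases where spilling has landed it is enabled in server config and can be toggled per session; a minimal sketch, assuming the experimental property names used by later Presto releases (they may differ by version):

```sql
-- Server-side, in etc/config.properties (assumed names):
--   experimental.spill-enabled=true
--   experimental.spiller-spill-path=/tmp/presto-spill

-- Per-session toggle before a memory-heavy aggregation
SET SESSION spill_enabled = true;

SELECT custkey, count(*) AS num_orders
FROM orders
GROUP BY custkey;
```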
Statistics for Hive connector
- Collected in Hive Metastore
Execution Engine performance
- Distributed sorting
- Runtime filtering
Cost-Based Optimizer (example below)
- Selectivity / cost calculator
- Join ordering
- Join distribution type selection
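To illustrate what the optimizer would decide (table names hypothetical): given statistics, it can estimate that regions is tiny and sales is huge, put the small table on the build side, and broadcast it instead of repartitioning both inputs on the join key:

```sql
-- With row-count and selectivity estimates, a cost-based optimizer can
-- choose the join order and the distribution type (broadcast vs.
-- repartitioned) rather than relying on how the user wrote the query.
SELECT r.name, sum(s.amount) AS revenue
FROM sales s
JOIN regions r ON s.region_id = r.region_id
GROUP BY r.name;
```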
LZO Thrift (Persisted)
- The vast majority of data on Twitter’s internal HDFS is stored in LZO Thrift format.
- Working on adding support for it as a new input format in the presto-hive module.
Security - start the UI on a different port
- Presto currently uses a single port for all HTTP communications (e.g. UI, JMX, client API, coordination, query execution). It would be helpful to be able to configure the use of different ports to have more granular control of firewall settings.
- Proposal: launch a new module on a separate port and bind the UI and related instances from the main app into that module
- https://github.com/prestodb/presto/pull/7106
Nested Schema evolution/push down for Parquet
- For non-primitive fields, Presto currently compares the type in a particular partition to what's in the HMS and doesn't support evolution of those fields. If we could pass the table type down to the specific reader, giving the reader a chance to figure out the conversion from the partition schema to the table schema, it would greatly reduce the cost of maintaining the metadata (for example, data with an old schema in one table and data with a new schema in another) or of rewriting old data with a new schema.
- Depends on https://github.com/prestodb/presto/pull/4714
- Yaliang’s patch: https://github.com/prestodb/presto/pull/7305
- Pruning nested fields: https://github.com/prestodb/presto/pull/6675
Dain (FB) commented: "Nezih is working on getting everyone focused on the new optimized parquet reader to get rid of the old one and then getting all the patches together"
Tableau connection
- The issue with Tableau support is the secure connection. The tricky part is that Presto embeds the URL for fetching the next chunk of results in the response, and that URL uses plain HTTP, so it sneaks through the proxy.
- Connection via JDBC is closed source in Teradata
Improved resource pooling
- Scheduled Zeppelin queries are starting to hog the cluster after midnight. We're exploring how best to mitigate that effect and will be looking into Presto's resource groups functionality.
Interested in leveraging statistics for the Accumulo connector
- Using the Swift API connector to connect to cloud storage.
- Also using Teradata JDBC drivers
- Are there plans to implement WholeStageCodeGeneration?
TD has a prototype from a hackathon; we'll probably be working on that in the second half of the year
- Are we planning to work on supporting OR predicates in tuple domain in addition to AND?
There's an open PR; we may do it that way, or we may expose the whole expression tree to connectors
- Is concern about extending the SPI still holding back support for nested data types?
We may be able to do it without changing the SPI by creating virtual columns, and then from the API's perspective it's just a regular column
- Planning time increased since 0.160 for a table with 600 columns
Please try the latest version, because we had some regressions that have since been fixed; if it persists, file an issue on GitHub and we'll look at it.
- Can we have hooks to determine the number of splits based on the number of worker threads?
Splits are sized so that they're not too small (so you don't spend tons of CPU time setting up the computation rather than doing work) and not too large (because we don't want high latency for the query). For the Hive connector we have a heuristic of starting with small splits and then increasing the size, which helps short-running queries.
- Where's the Raptor documentation? (Lawrence from Jet)
We're rewriting it to be transactional, and the rewrite won't be backwards compatible. Once the new version is out, being used in production, and we have a story for upgrading, etc., we'll document it. Raptor can still be used now as a cache, because in that case you don't care that it's not backwards compatible.