
Agenda Presto Community Roadmap Discussion 3.9.2016

Martin Traverso edited this page Apr 15, 2016 · 12 revisions

Agenda

Roadmap Discussion

  • Facebook to summarize their roadmap for 2016
  • Teradata to summarize their roadmap for 2016
  • What would the community like to see that hasn't been mentioned in roadmap plans?
  • Discuss how to provide a unified roadmap on which the community can collaborate

Pull Requests

  • Can we get the Pull Requests to a manageable number?
  • Need the community to help go through and do initial reviews, suggest stale PRs to close, etc.

Community

  • Should we have a recurring meeting?
  • Should we try communication methods other than IRC and the mailing list?
  • Meetups

Notes from the call

Facebook

Features:

  • Grouping sets, aggregation
  • Additional data type support: int, smallint, etc.
  • How to run UDFs that people have written for Hive in Presto
  • Improve performance: we don’t see the gains we expected versus Hive
  • Custom version of the ORC writer (fast implementation)
  • Scalability issues, stability issues
  • Improve resource management (clients, users)
  • Implement optimizations that allow running large workloads
  • Internal projects: migrate pipelines to Presto, based on Presto running on internal data stores
  • Materialized query tables (can be viewed as materialized views) to speed up queries
  • Use cases to run on Raptor
  • Make Presto more robust, stable and scalable
  • Make the query engine understand the physical layout of the data so that queries can run efficiently

Teradata

In Progress:

  • Decimal
  • Kerberos
  • Non-equi joins
  • More subquery support
  • Grant privileges
  • JDBC version 1 (4.0, 4.1, 4.2)
  • ODBC version 2

2016:

  • Spill to disk
  • Community Continuous Integration
  • BI Tool Integration / Certification
  • Broader SQL Support, e.g., correlated subqueries, TPC-DS and TPC-H
  • Performance
  • Security

Uber

  • Geospatial Functions
  • Nested Schema Evolution for Parquet
  • New Parquet Reader
  • Projection Pushdown for Structs in Parquet
  • Upgrade to Parquet 1.8.1

Netflix

  • Better resource management & scheduling: Workload isolation between interactive & non-interactive queries
  • BI support: Tableau, MicroStrategy, etc. Happy to help improve the support.
  • Improve S3 integration: Assume role support.
  • Improve Parquet support

Amazon

  • Put up a sandbox because of customers’ support
  • Authorization

Twitter

  • Parquet reader improvements (e.g., filter pushdown for nested datasets)
  • Support for LZO/Thrift

Action Items
