Researches and Practices in Autonomous Databases

English | Chinese

This list is continuously updated with work on autonomous databases, building on our past tutorials.

Kindly let us know if we have missed any great papers. Thank you!

Conference deadlines: https://github.com/ccfddl/ccf-deadlines

(Note: conference postponements may not be synchronized promptly, so treat the deadlines as a reference only.)


Great talks you should not miss

Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation. Andy Pavlo, Matthew Butrovich, Lin Ma, et al. [link]

Towards instance-optimized data systems. Tim Kraska. [link]

AI-Native Database. Guoliang Li. [link]

From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management. Immanuel Trummer. [link]

Retrieval-based Language Models and Applications. Akari Asai, Sewon Min, Zexuan Zhong, Danqi Chen. [link]


0. Survey and Tutorial

Survey

Database meets deep learning: Challenges and opportunities.

Wei Wang, Meihui Zhang, Gang Chen, et al. SIGMOD Record, 2016. [paper]

Database Meets Artificial Intelligence: A Survey.

Xuanhe Zhou, Chengliang Chai, Guoliang Li, et al. TKDE, 2020. [paper]

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration.

Hai Lan, Zhifeng Bao, Yuwei Peng. Data Science and Engineering, 2021. [paper]

A Survey on Deep Reinforcement Learning for Data Processing and Analytics.

Qingpeng Cai, Can Cui, Yiyuan Xiong, et al. TKDE, 2022. [paper]

Self-Driving Database Papers (CMU Spring Course). 2022.

https://15799.courses.cs.cmu.edu/spring2022/schedule.html

Automatic Database Knob Tuning: A Survey.

Xinyang Zhao, Xuanhe Zhou, Guoliang Li. TKDE, 2023. [paper] [code]

Tutorial

From auto-tuning one size fits all to self-designed and learned data-intensive systems.

Stratos Idreos, Tim Kraska. SIGMOD, 2019. [paper]

Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems.

Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, Shivnath Babu. VLDB, 2019. [paper] [slides]

Tutorial: Adaptive Replication and Partitioning in Data Systems.

Brad Glasbergen, Michael Abebe, Khuzaima Daudjee. Middleware, 2018. [paper]

A Tutorial on Learned Multi-dimensional Indexes.

Abdullah Al-Mamun, Hao Wu, Walid G. Aref. SIGSPATIAL, 2020. [paper]

AI Meets Database: AI4DB and DB4AI.

Guoliang Li, Xuanhe Zhou, Lei Cao. SIGMOD, 2021. [paper] [slides]

Machine Learning for Databases.

Guoliang Li, Xuanhe Zhou, Lei Cao. VLDB, 2021. [paper][slides]

Machine Learning for Cloud Data Systems: the Promise, the Progress, and the Path Forward.

Alekh Jindal, Matteo Interlandi. VLDB, 2021. [paper]

Workload-Aware Performance Tuning for Autonomous DBMSs.

Zhengtong Yan, Jiaheng Lu, Naresh Chainani, et al. ICDE, 2021. [paper]

Learned Query Optimizer: At the Forefront of AI-Driven Databases.

Zhu, Rong, Ziniu Wu, Chengliang Chai, et al. EDBT, 2022. [paper]

From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management.

Immanuel Trummer. VLDB, 2022. [paper]

1. Database Configuration

Knob Tuner

Heuristic

PGTune: https://pgtune.leopard.in.ua.

OpenTuner: An Extensible Framework for Program Autotuning

Ansel J, Kamil S, Veeramachaneni K, et al. PACT, 2014. [paper]

BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning

Zhu Y, Liu J, Guo M, et al. SoCC, 2017. [paper]

An Efficient Transfer Learning Based Configuration Adviser for Database Tuning

Xinyi Zhang, Hong Wu, Yang Li, et al. VLDB, 2023. [paper]


BO-based

Tuning Database Configuration Parameters with iTuned

Duan, S., Thummala, V., & Babu, S. VLDB, 2009. [paper]

Automatic database management system tuning through large-scale machine learning

Van Aken D, Pavlo A, Gordon G J, et al. SIGMOD, 2017. [paper]

Black or White? How to Develop an AutoTuner for Memory-based Analytics

Kunjir M, Babu S. SIGMOD, 2020. [paper]

ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases

Zhang X, Wu H, Chang Z, et al. SIGMOD, 2021. [paper]

CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions

Cereda S, Valladares S, Cremonesi P, et al. VLDB, 2021. [paper]

Towards Dynamic and Safe Configuration Tuning for Cloud Databases

Zhang X, Wu H, Li Y, et al. SIGMOD, 2022. [paper]

LlamaTune: Sample-Efficient DBMS Configuration Tuning

Kanellis K, Ding C, Kroth B, et al. VLDB, 2022. [paper]


DL-based

iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases

Jian Tan, Tieying Zhang, Feifei Li, et al. VLDB, 2019. [paper]


RL-based

An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning

Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, et al. SIGMOD, 2019. [paper]

QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning

Li G, Zhou X, Li S, et al. VLDB, 2019. [paper]

Watuning: A workload-aware tuning system with attention-based deep reinforcement learning

Ge J K, Chai Y F, Chai Y P. JCST, 2021. [paper]

The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that "Read the Manual"

Trummer I. VLDB, 2021. [paper]

DB-BERT: a Database Tuning Tool that “Reads the Manual”

Trummer I. SIGMOD, 2022. [paper]

HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements

Cai B, Liu Y, Zhang C, et al. SIGMOD, 2022. [paper]


Knob Selection

SARD: A statistical approach for ranking database tuning parameters

Debnath B K, Lilja D J, Mokbel M F. ICDE Workshops 2008. [paper]

Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs

Kanellis K, Alagappan R, Venkataraman S. HotStorage 2020. [paper]


Benefit Estimation

IWEK: An Interpretable What-If Estimator for Database Knobs

Yu Yan, Hongzhi Wang, Jian Geng, et al. arXiv 2023. [paper]


Experiments

An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems

Van Aken D, Yang D, Brillard S, et al. VLDB, 2021. [paper]

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation

Zhang X, Chang Z, Li Y, et al. VLDB, 2022. [paper]


View Advisor

Selecting subexpressions to materialize at datacenter scale

A. Jindal, K. Karanasos, S. Rao, and H. Patel. PVLDB, 11(7):800–812, 2018. [paper]

Automated generation of materialized views in Oracle

Ahmed, R., Bello, R., Witkowski, A., & Kumar, P. (2020). VLDB, 2020. [paper]

Computation reuse in analytics job service at Microsoft

Jindal, A., Qiao, S., Patel, H., Yin, Z., Di, J., Bag, M., Friedman, M., Lin, Y., Karanasos, K. and Rao, S., SIGMOD, 2018 (pp. 191-203). [paper]

Automatic View Generation for Equivalent Subqueries with Deep Learning and Reinforcement Learning

Yuan, H., Sun, J., & Li, G. (2020). ICDE, 2020. [paper]

An Autonomous Materialized View Management System with Deep Reinforcement Learning

Han, Y., Li, G., Yuan, H., & Sun, J. ICDE, 2021. [paper]

AutoView: An Autonomous Materialized View Management System with Encoder-Reducer

Han, Y., Li, G., Yuan, H. and Sun, J., TKDE, 2022. [paper]

Dynamic Materialized View Management using Graph Neural Network

Yue Han, Chengliang Chai, Jiabin Liu, Guoliang Li, Chuangxian Wei, Chaoqun Zhan. ICDE 2023. [paper]

A novel coral reefs optimization algorithm for materialized view selection in data warehouse environments

Azgomi, H. and Sohrabi, M.K., Applied Intelligence, 2019, 49, pp.3965-3989. [paper]

DBSP: Automatic Incremental View Maintenance for Rich Query Languages

Mihai Budiu, Tej Chajed, Frank McSherry, et al. VLDB, 2023. [paper]


Index Advisor

Workload Compression

ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning

Siddiqui, Tarique and Jo, Saehan and Wu, Wentao and Wang, Chi and Narasayya, Vivek and Chaudhuri, Surajit. SIGMOD, 2022 [paper]

[GSUM] Primitives for workload summarization and implications for SQL

Chaudhuri, Surajit and Narasayya, Vivek and Ganesan, Prasanna. VLDB, 2003 [paper]

Compressing SQL workloads

Chaudhuri, Surajit and Gupta, Ashish Kumar and Narasayya, Vivek. SIGMOD, 2002 [paper]

Comprehensive and efficient workload compression

Deep, Shaleen and Gruenheid, Anja and Koutris, Paraschos and Naughton, Jeffrey and Viglas, Stratis. VLDB, 2020 [paper]


Offline Index Tuning

Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms

Kossmann, Jan and Halfpap, Stefan and Jankrift, Marcel and Schlosser, Rainer. VLDB, 2020. [paper]

Exact and approximate algorithms for the index selection problem in physical database design

Caprara, Alberto and Fischetti, Matteo and Maio, Dario. TKDE, 1995 [paper]

A branch-and-cut algorithm for a generalization of the uncapacitated facility location problem

Caprara, Alberto and González, JJ. Top, 1996 [paper]

Separating lifted odd-hole inequalities to solve the index selection problem

Caprara, Alberto and González, Juan José Salazar. Discrete Applied Mathematics, 1999 [paper]

[ILP] An integer linear programming approach to database design

Papadomanolakis, Stratos and Ailamaki, Anastassia. ICDEW, 2007 [paper]

Cophy: A scalable, portable, and interactive index advisor for large workloads

Dash, Debabrata and Polyzotis, Neoklis and Ailamaki, Anastasia. VLDB, 2011 [paper]

Efficient use of the query optimizer for automated physical design

Papadomanolakis, Stratos and Dash, Debabrata and Ailamaki, Anastasia. VLDB, 2007 [paper]

An integer programming approach for the view and index selection problem

Talebi, Zohreh Asgharzadeh and Chirkova, Rada and Fathi, Yahya. Data & Knowledge Engineering, 2013 [paper]

Automated Management of Indexes for Dataflow Processing Engines in IaaS Clouds

Kllapi, Herald and Pietri, Ilia and Kantere, Verena and Ioannidis, Yannis E. EDBT, 2020 [paper]

The optimal selection of secondary indices for files

Mario Schkolnick. Information Systems, 1975 [paper]

Intelligent Index Tuning Approach for Relational Databases

Qiu, Tao and Wang, Bin and Shu, Zhaowei and Zhao, Zhibo and Song, Ziwen and Zhong, Yanhui. Journal of Software, 2020 [paper]

CedarAdvisor: A load-adaptive automatic indexing recommendation tool

Yang, Wencan and Hu, Huiqi and Duan, Huichao and Hu, Yaoyi and Qian, Weining. Journal of East China Normal University (Natural Science), 2020 [paper]

[AutoAdmin] An efficient, cost-driven index selection tool for Microsoft SQL server

Chaudhuri, Surajit and Narasayya, Vivek R. VLDB 1997 [paper]

[DB2Advis] DB2 advisor: An optimizer smart enough to recommend its own indexes

Valentin, Gary and Zuliani, Michael and Zilio, Daniel C and Lohman, Guy and Skelley, Alan. ICDE, 2000 [paper]

[Extend] Efficient scalable multi-attribute index selection using recursive strategies

Schlosser, Rainer and Kossmann, Jan and Boissier, Martin. ICDE, 2019 [paper]

Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server

S. Chaudhuri and V. Narasayya. 2020 [paper]

On the selection of an optimal set of indexes

Ip, Maggie Y. L. and Saxton, Lawrence V. and Raghavan, Vijay V. IEEE Transactions on Software Engineering, 1983 [paper]

Index selection for databases: A hardness study and a principled heuristic solution

Chaudhuri, Surajit and Datar, Mayur and Narasayya, Vivek. TKDE, 2004 [paper]

[Drop] Index selection in relational databases

Whang, Kyu-Young. Foundations of Data Organization, 1987 [paper]

[Relaxation] Automatic physical database tuning: A relaxation-based approach

Bruno, Nicolas and Chaudhuri, Surajit. SIGMOD, 2005 [paper]

Index merging

Chaudhuri, S. and Narasayya, V. ICDE, 1999 [paper]

On a new approach to the index selection problem using mining algorithms

Ameri, Parinaz and Meyer, Jörg and Streit, Achim. Big Data, 2015 [paper]

Semi-automatic index tuning: Keeping dbas in the loop

Schnaitter, Karl and Polyzotis, Neoklis. VLDB, 2012 [paper]

Automatically indexing millions of databases in microsoft azure sql database

Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek R. Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, Surajit Chaudhuri. SIGMOD, 2019 [paper]

Automatic index selection for large-scale datalog computation

Subotić, Pavle and Jordan, Herbert and Chang, Lijun and Fekete, Alan and Scholz, Bernhard. VLDB, 2018 [paper]

AIM: A practical approach to automated index management for SQL databases

Yadav, Ritwik and Valluri, Satyanarayana R. and Zaït, Mohamed. ICDE, 2023 [paper]

Genetic algorithms and the search for optimal database index selection

Fotouhi, Farshad and Galarce, Carlos E. Great Lakes CS Conference on New Research Results in Computer Science, 1989 [paper]

A genetic algorithm for the index selection problem

Kratica, Jozef and Ljubić, Ivana and Tošić, Dušan. Workshops on Applications of Evolutionary Computation, 2003 [paper]

Genetic algorithm for database indexing

Korytkowski, Marcin and Gabryel, Marcin and Nowicki, Robert and Scherer, Rafał. International Conference on Artificial Intelligence and Soft Computing, 2004 [paper]

An adaptive approach for index tuning with learning classifier systems on hybrid storage environments

Pedrozo, Wendel Góes and Nievola, Júlio Cesar and Ribeiro, Deborah Carvalho. International Conference on Hybrid Artificial Intelligence Systems, 2018 [paper]

GADIS: A genetic algorithm for database index selection

Neuhaus, Priscilla and Couto, Julia and Wehrmann, Jonatas and Ruiz, Duncan Dubugras Alcoba and Meneguzzi, Felipe Rech. The 31st International Conference on Software Engineering & Knowledge Engineering, 2019 [paper]

The index selection problem with configurations and memory limitation: A scatter search approach

Kain, Raslan and Manerba, Daniele and Tadei, Roberto. Computers & Operations Research, 2021 [paper]

Automatic index selection in RDBMS by exploring query execution plan space

Kołaczkowski, Piotr and Rybiński, Henryk. Advances in Data Management, 2009 [paper]

Cost-model oblivious database tuning with reinforcement learning

Basu, Debabrota and Lin, Qian and Chen, Weidong and Vo, Hoang Tam and Yuan, Zihong and Senellart, Pierre and Bressan, Stéphane. Database and Expert Systems Applications, 2015 [paper] [code]

The case for automatic database administration using deep reinforcement learning

Sharma, Ankur and Schuhknecht, Felix Martin and Dittrich, Jens. arXiv, 2018 [paper] [code]

Learning index selection with structured action spaces

Welborn, Jeremy and Schaarschmidt, Michael and Yoneki, Eiko. arXiv, 2019 [paper]

An index advisor using deep reinforcement learning

Lan, Hai and Bao, Zhifeng and Peng, Yuwei. CIKM, 2020 [paper] [code]

SMARTIX: A database indexing agent based on reinforcement learning

Paludo Licks, Gabriel and Colleoni Couto, Julia and de Fátima Miehe, Priscilla and De Paris, Renata and Dubugras Ruiz, Duncan and Meneguzzi, Felipe. Applied Intelligence, 2020 [paper] [code]

Online index selection using deep reinforcement learning for a cluster database

Sadri, Zahra and Gruenwald, Le and Leal, Eleazar. ICDEW, 2020 [paper] [code]

Learning an Index Advisor with Deep Reinforcement Learning

Lai, Sichao and Wu, Xiaoying and Wang, Senyang and Peng, Yuwei and Peng, Zhiyong. APWeb and WAIM, 2021 [paper]

MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning

Sharma, Vishal and Dyreson, Curtis and Flann, Nicholas. 25th International Database Engineering & Applications Symposium, 2021 [paper]

Index selection for NoSQL database with deep reinforcement learning

Yan, Yu and Yao, Shun and Wang, Hongzhi and Gao, Meng. Information Sciences, 2021 [paper]

SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning

Kossmann, Jan and Kastius, Alexander and Schlosser, Rainer. EDBT, 2022 [paper] [code]

Budget-aware Index Tuning with Reinforcement Learning

Wu, Wentao and Wang, Chi and Siddiqui, Tarique and Wang, Junxiong and Narasayya, Vivek and Chaudhuri, Surajit and Bernstein, Philip A. SIGMOD, 2022 [paper]


Online Index Tuning

A benchmark for online index selection

Schnaitter, Karl and Polyzotis, Neoklis. ICDE, 2009 [paper]

QUIET: continuous query-driven index tuning

K. Sattler, I. Geist, and E. Schallehn. VLDB, 2003 [paper]

Online autoadmin: (physical design tuning)

Bruno, Nicolas and Chaudhuri, Surajit. SIGMOD, 2007 [paper]

[COLT] On-line index selection for shifting workloads

Schnaitter, Karl and Abiteboul, Serge and Milo, Tova and Polyzotis, Neoklis. ICDEW, 2007 [paper]

DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees

Perera, R Malinga and Oetomo, Bastian and Rubinstein, Benjamin IP and Borovica-Gajic, Renata. ICDE, 2021 [paper] [code]

HMAB: self-driving hierarchy of bandits for integrated physical database design tuning

Perera, R Malinga and Oetomo, Bastian and Rubinstein, Benjamin IP and Borovica-Gajic, Renata. VLDB, 2022 [paper] [code]

Indexer++: workload-aware online index tuning with transformers and reinforcement learning

Sharma, Vishal and Dyreson, Curtis. SIGAPP, 2022 [paper]

Autoindex: An incremental index management system for dynamic workloads

Zhou, Xuanhe and Liu, Luyang and Li, Wenbo and Jin, Lianyuan and Li, Shifu and Wang, Tianqing and Feng, Jianhua. ICDE, 2022 [paper] [code]


Index Benefit Estimation

A quantitative approach to the selection of secondary indexes

F. P. Palermo. IBM Research RJ 730, 1970

An optimization problem on the selection of secondary keys

Lum, Vincent Y and Ling, Huei. Proceedings of the 1971 26th annual conference [paper]

Secondary index optimization

Mario Schkolnick. SIGMOD, 1975 [paper]

Minimum Cost Selection of Secondary Indexes for Formatted Files

Anderson, Henry D. and Berra, P. Bruce. Association for Computing Machinery, 1977 [paper]

The optimal selection of secondary indices for files

Mario Schkolnick. Information Systems, 1975 [paper]

Index selection in relational databases

Whang, Kyu-Young. Foundations of Data Organization, 1987 [paper]

CedarAdvisor: A load-adaptive automatic indexing recommendation tool

Yang, Wencan and Hu, Huiqi and Duan, Huichao and Hu, Yaoyi and Qian, Weining. Journal of East China Normal University (Natural Science), 2020 [paper]

AutoAdmin "what-if" index analysis utility

Chaudhuri, Surajit and Narasayya, Vivek. SIGMOD, 1998 [paper]

AI meets AI: Leveraging query executions to improve index recommendations

Ding, Bailu and Das, Sudipto and Marcus, Ryan and Wu, Wentao and Chaudhuri, Surajit and Narasayya, Vivek R. SIGMOD, 2019 [paper]

[SmartIndex] SmartIndex: An Index Advisor with Learned Cost Estimator

Gao, Jianling and Zhao, Nan and Wang, Ning and Hao, Shuang. CIKM, 2022 [paper] [code]

[DISTILL] DISTILL: low-overhead data-driven techniques for filtering and costing indexes for scalable index tuning

Siddiqui, Tarique and Wu, Wentao and Narasayya, Vivek and Chaudhuri, Surajit. VLDB, 2022 [paper]

[LIB] Learned Index Benefits: Machine Learning Based Index Performance Estimation

Shi, Jiachen and Cong, Gao and Li, Xiao-Li. VLDB, 2022 [paper] [code]

[QueryFormer] QueryFormer: a tree transformer model for query plan representation

Zhao, Yue and Cong, Gao and Shi, Jiachen and Miao, Chunyan. VLDB, 2022 [paper] [code]

Zero-shot cost models for out-of-the-box learned cost prediction

Hilprecht, Benjamin and Binnig, Carsten. arXiv, 2022 [paper] [code]


Adaptive Indexing

Self-Selecting, Self-Tuning, Incrementally Optimized Indexes

Graefe, Goetz and Kuno, Harumi. EDBT, 2010 [paper]

Concurrency control for adaptive indexing

Graefe, Goetz and Halim, Felix and Idreos, Stratos and Kuno, Harumi and Manegold, Stefan. VLDB, 2012 [paper]

Database Cracking

Idreos, Stratos and Kersten, Martin L and Manegold, Stefan and others. CIDR, 2007 [paper]

Merging what's cracked, cracking what's merged: adaptive indexing in main-memory column-stores

Idreos, Stratos and Manegold, Stefan and Kuno, Harumi and Graefe, Goetz. VLDB, 2011 [paper]

Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

Halim, Felix and Idreos, Stratos and Karras, Panagiotis and Yap, Roland H. C. VLDB, 2012 [paper]

Holistic Indexing in Main-Memory Column-Stores

Petraki, Eleni and Idreos, Stratos and Manegold, Stefan. SIGMOD, 2015 [paper]

Predictive indexing

Arulraj, Joy and Xian, Ran and Ma, Lin and Pavlo, Andrew. arXiv, 2019 [paper]


Partition Advisor

Automating physical database design in a parallel database.

Jun Rao, Chun Zhang, Nimrod Megiddo, Guy M. Lohman. SIGMOD, 2002. [paper]

Schism: a Workload-Driven Approach to Database Replication and Partitioning.

Carlo Curino, Yang Zhang, Evan P. C. Jones, Samuel Madden. PVLDB, 2010. [paper]

Locality-aware partitioning in parallel database systems.

Erfan Zamanian, Carsten Binnig, Abdallah Salama. SIGMOD, 2015. [paper]

Query centric partitioning and allocation for partially replicated database systems.

Tilmann Rabl, Hans-Arno Jacobsen. SIGMOD, 2017. [paper]

Workload-driven horizontal partitioning and pruning for large HTAP systems.

Martin Boissier, Daniel Kurzynski. ICDE Workshop, 2018. [paper]

Towards learning a partitioning advisor with deep reinforcement learning.

Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. aiDM@SIGMOD, 2019. [paper]

Automated vertical partitioning with deep reinforcement learning.

Campero Durand G, Piriyev R, Pinnecke M, et al. ADBIS, 2019. [paper]

Fast and effective distribution-key recommendation for amazon redshift.

Panos Parchas, Yonatan Naamad, Peter Van Bouwel, et al. PVLDB, 2020. [paper]

Adaptive partitioning and indexing for in situ query processing.

Olma, M., Karpathiotakis, M., Alagiannis, I., Athanassoulis, M., et al. VLDB Journal. [paper]

Learning a Partitioning Advisor for Cloud Databases.

Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. SIGMOD, 2020. [paper]

Grep: A Graph Learning Based Database Partitioning System.

Xuanhe Zhou, Guoliang Li, Jianhua Feng, et al. SIGMOD, 2023. [paper] [demo]


Hybrid Advisor

Universal Database Optimization using Reinforcement Learning

Wang J, Trummer I, Basu D. VLDB, 2021. [paper]

A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning

Xinyi Zhang, Zhuo Chang, Hong Wu, et al. SIGMOD, 2023. [paper]

2. Query Optimization

Query Rewriter

(Note: other interesting problems such as text2SQL are outside the scope of this list.)

Traditional

[Rewrite Rules] Béatrice Finance, Georges Gardarin. A Rule-Based Query Rewriter in an Extensible DBMS. ICDE 1991. [paper]

[Rewrite Rules] Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan. Extensible/Rule Based Query Rewrite Optimization in Starburst. SIGMOD Conference 1992. [paper]

[Cost/Heuristic Rewrite] Rafi Ahmed, Allison W. Lee, Andrew Witkowski, et al. Cost-Based Query Transformation in Oracle. VLDB 2006: 1026-1036. [paper]

[Heuristic Rewrite] De Araújo, A. H. M., Monteiro, J. M., Antônio, J., De Macêdo, F., Tavares, J. A., Brayner, A., & Lifschitz, S. (2014). ARe-SQL: An Online, Automatic and Non-Intrusive Approach for Rewriting SQL Queries. JIDM, 2014. [paper]

[Semantic Equivalence] Shumo Chu, Konstantin Weitz, Alvin Cheung, Dan Suciu. HoTTSQL: proving query rewrites with univalent SQL semantics. PLDI 2017: 510-524. [paper]

[Optimization Engine] Begoli, E., Camacho-Rodríguez, J., Hyde, J., Mior, M. J., & Lemire, D. (2018). Apache calcite: A foundational framework for optimized query processing over heterogeneous data sources. SIGMOD, 2018. [paper]

[Map-Reduce Rewrite] Partho Sarthi, Kaushik Rajan, Akash Lal, Abhishek Modi, et al. Generalized Sub-Query Fusion for Eliminating Redundant I/O from Big-Data Queries. OSDI 2020: 209-224. [paper]

[Streaming] Wentao Wu, Philip A. Bernstein, Alex Raizman, Christina Pavlopoulou. Cost-based Query Rewriting Techniques for Optimizing Aggregates Over Correlated Windows. CoRR abs/2008.12379 (2020) [paper]

[Program Synthesis] Rui Dong, Jie Liu, Yuxuan Zhu, Cong Yan, Barzan Mozafari, Xinyu Wang. SlabCity: Whole-Query Optimization Using Program Synthesis. VLDB, 2023: 3151-3164. [paper]

[Rewrite Rules] Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, Jinyang Li. WeTune: Automatic Discovery and Verification of Query Rewrite Rules. SIGMOD Conference 2022: 94-107. [paper]

[Rewrite Rules] Qiushi Bai, Sadeem Alsudais, Chen Li. QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting. VLDB, 2023. [paper]

Learning-based

[Predicate Rewrite] Qi Zhou, Joy Arulraj, Shamkant B. Navathe, William Harris, Jinpeng Wu. Sia: Optimizing Queries using Learned Predicates. SIGMOD, 2021. [paper]

[Rewrite Strategy] Xuanhe Zhou, Guoliang Li, Chengliang Chai, Jianhua Feng. A Learned Query Rewrite System using Monte Carlo Tree Search. VLDB, 2022. [paper]

Cardinality Estimation

[Card, Query-based] Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., & Chaudhuri, S. (2018). Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment, 12(9), 1044–1057, 2018. [paper]

[Card, Query-based] Kipf A, Kipf T, Radke B, et al. Learned cardinalities: Estimating correlated joins with deep learning. CIDR, 2019. [paper]

[Card, Query-based] Woltmann L, Hartmann C, Thiele M, et al. Cardinality estimation with local deep learning models. aiDM, 2019. [paper]

[Card, Query-based] Hayek, R., & Shmueli, O. (2020). NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT. arXiv, 2020. [paper]

[Card, Query-based] Tzoumas K, Deshpande A, Jensen C S. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment, 4(11): 852-863, 2011. [paper]

[Card, Query-based] Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj K. Agarwal, Debmalya Panigrahi, Sudeepa Roy, Jun Yang. Selectivity Functions of Range Queries are Learnable. SIGMOD, 2022. [paper]

[Card, Query-based, Adaptability] Beibin Li, Yao Lu, Srikanth Kandula: Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts. SIGMOD Conference 2022: 1920-1933 [paper]

[Card, Query-based, Robust Encoding & Training] Negi, Parimarjan, Ziniu Wu, Andreas Kipf, Nesime Tatbul, Ryan Marcus, Sam Madden, Tim Kraska, and Mohammad Alizadeh: Robust Query Driven Cardinality Estimation under Changing Workloads. VLDB, 2023. [paper]

[Card, Data-based] Leis, V., Radke, B., Gubichev, A., Kemper, A., & Neumann, T. (2017). Cardinality estimation done right: Index-based join sampling. CIDR, 2017. [paper]

[Card, Data-based] Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., … Stoica, I. (2019). Deep Unsupervised Cardinality Estimation. VLDB, 2019. [paper]

[Card, Data-based] Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., & Stoica, I. (2020). Neurocard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 14(1), 61–73, 2020. [paper]

[Card, Data-based] Zhu R, Wu Z, Han Y, et al. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. arXiv preprint arXiv:2011.09022, 2020. [paper]

[Card, Data-based] Wu Z, Shaikhha A, Zhu R, et al. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. arXiv preprint arXiv: 2012.14743, 2020. [paper]

[Card, Data-based] Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., & Binnig, C. (2020). DeepDB: Learn from data, not from queries! VLDB, 13(7), 992–1005, 2020. [paper]

[Card, Data-based] Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. Quicksel: Quick selectivity learning with mixture models. SIGMOD 2020. [paper]

[Card, Data-based] Lu Y, Kandula S, König A C, et al. Pre-training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment, 2021. [paper]

[Card, Data-based] Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., … Cui, B. (2020). FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. VLDB, 2021. [paper]

[Card, Data-based] Jiayi Wang, Chengliang Chai, Jiabin Liu, Guoliang Li. FACE: A Normalizing Flow based Cardinality Estimator. VLDB 2022. [paper]

[Card, Data-based] Yao Lu, Srikanth Kandula, Arnd Christian König, Surajit Chaudhuri. Pre-training summarization models of structured datasets for cardinality estimation. VLDB 2022. [paper]

[Card, Query&Data-based] Wu P, Cong G. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. SIGMOD. 2021: 2009-2022. [paper]

[Card] Parimarjan Negi, Ryan C. Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh. Flow-Loss: Learning Cardinality Estimates That Matter. VLDB Endow, 14(11): 2019-2032, 2021. [paper]

[Card, Model Selection] Jintao Zhang, Chao Zhang, Guoliang Li, Chengliang Chai. AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation. ICDE, 2023. [paper]

[Card] Xiaoye Miao, Yangyang Wu, Jiazhen Peng, et al. Efficient and Effective Cardinality Estimation for Skyline Family. SIGMOD, 2023. [paper]

[Card] Ziniu Wu, Parimarjan Negi, Mohammad Alizadeh, Tim Kraska, Samuel Madden. FactorJoin: A New Cardinality Estimation Framework for Join Queries. SIGMOD, 2023. [paper]

[Card, Query-based] Fang Wang, Xiao Yan, Man Lung Yiu, Shuai Li, Zunyao Mao, and Bo Tang. Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation. SIGMOD, 2023. [paper]

[ EA&B ] Wang, X., Qu, C., Wu, W., Wang, J., & Zhou, Q. (2021). Are We Ready For Learned Cardinality Estimation? Proc. VLDB Endow. 14(9): 1640-1654 (2021). [paper]

[ EA&B ] Sun, J., Zhang, J., Sun, Z., Li, G., & Tang, N. Learned Cardinality Estimation: A Design Space Exploration and a Comparative Evaluation. VLDB, 2022. [paper]

[ EA&B ] Yuxing Han, Ziniu Wu, Peizhi Wu, et al. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB, 2022. [paper]

[ EA&B ] Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, Jaehyok Chong: Learned Cardinality Estimation: An In-depth Study. SIGMOD Conference 2022: 1214-1227 [paper]

[ EA&B ] Harmouch, H., & Naumann, F. Cardinality Estimation: An Experimental Survey. PVLDB, 11(4), 499–512, 2017. [paper]

[ EA&B, Data Update ] Meghdad Kurmanji, Eleni Triantafillou, Peter Triantafillou. Machine Unlearning in Learned Databases: An Experimental Analysis. SIGMOD, 2024. [paper] [code]

Cost Estimation

[Cost] Marcus, R., & Papaemmanouil, O. Plan-Structured Deep Neural Network Models for Query Performance Prediction. PVLDB, 12(11), 1733–1746, 2019. [paper]

[Cost] Sun, J., & Li, G. An End-to-End Learning-based Cost Estimator. VLDB, 2020. [paper]

[Cost] Benjamin Hilprecht, Carsten Binnig. Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction. VLDB, 2022. [paper]

Plan Optimization

Eddies: Continuously Adaptive Query Processing

Ron Avnur, Joseph M. Hellerstein. SIGMOD, 2000. [paper]

How Good Are Query Optimizers, Really?

Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. Proceedings of the VLDB Endowment (2016), 9(3), 204–215. [paper]

Neo: A Learned query optimizer

Marcus, R., Negi, P., Mao, H., Zhang, C., Alizadeh, M., Kraska, T., … Tatbul, N. Proceedings of the VLDB Endowment, 12(11), 1705–1718, 2019. [paper]

Deep reinforcement learning for join order enumeration

Marcus, R., & Papaemmanouil, O. (2018). Proceedings of the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, AiDM 2018, 0–3. [paper]

SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning

Trummer, I., Wang, J., Maram, D., Moseley, S., Jo, S., & Antonakakis, J. SIGMOD, 2019. [paper]

Progressive Join Algorithms Considering User Preference

Ding, M., Chen, S., & Manegold, S. CIDR, 2021. [paper]

Reinforcement Learning with Tree-LSTM for Join Order Selection

Yu, X., Li, G., Tang, N. ICDE, 2020. [paper]

Towards a Learning Optimizer for Shared Clouds

Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. Proc. VLDB Endow. 12(3): 210-222, 2018. [paper]

SQL Plan Observability through Hints in Oracle Autonomous Database

Pasupuleti, K., Park, M., & Valluri, S.

Bao: Making Learned Query Optimization Practical

Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. SIGMOD, 2021. [paper]

Steering Query Optimizers: A Practical Take on Big Data Workloads

Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal. SIGMOD, 2021. [paper]

SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms

Ziyun Wei, Immanuel Trummer. PVLDB, 2022. [paper]

Balsa: Learning a Query Optimizer Without Expert Demonstrations

Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica. SIGMOD, 2022. [paper]

Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization

Jan Kossmann. CIDR, 2022 [paper]

LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans

Tianyi Chen, Jun Gao, Hedui Chen, and Yaofeng Tu. PVLDB, 2023. [paper]

BASE: Bridging the Gap between Cost and Latency for Query Optimization

Chen, Xu, Zhen Wang, Shuncheng Liu, et al. [paper]

COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations

Xu, Xianghong, Zhibing Zhao, Tieying Zhang, et al. [paper]

Lero: A Learning-to-Rank Query Optimizer

Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, Jingren Zhou. VLDB 2023. [paper]

Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

Xiang Yu, Chengliang Chai, Guoliang Li, Jiabin Liu. VLDB 2023. [paper]

Leveraging Query Logs and Machine Learning for Parametric Query Optimization

Kapil Vaidya, Anshuman Dutt, Vivek Narasayya, Surajit Chaudhuri. VLDB 2022. [paper]

Kepler: Robust Learning for Parametric Query Optimization

Lyric Doshi, Vincent Zhuang, Gaurav Jain, Ryan C Marcus, Haoyu Huang, Deniz Altınbüken, Eugene Brevdo, Campbell Fraser. SIGMOD 2023. [paper]

LEON: A New Framework for ML-Aided Query Optimization

Xu Chen, Haitian Chen, Zibo Liang, Shuncheng Liu, Jinghong Wang, Kai Zeng, Han Su, and Kai Zheng. [paper]

3. Workload Scheduling

Ibrahim Sabek, Tenzin Samten Ukyab, Tim Kraska. LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems. SIGMOD, 2022. [paper]

Chi Zhang, Ryan Marcus, et al. Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. VLDB, 2020. [paper]

4. Database Design

Index

One-dimensional Index

[1-D, Immutable] Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018). The case for learned index structures. SIGMOD, 2018. [paper] [code]

[1-D, Mutable] Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., & Kraska, T. (2019). FITing-Tree: A Data-aware Index Structure. SIGMOD, 2019. [paper]

[1-D, Mutable, Secondary] Wu, Y., Yu, J., Tian, Y., Sidle, R., Barber, R. (2019). Designing succinct secondary indexing mechanism by exploiting column correlations. SIGMOD 2019. [paper]

[1-D, Mutable] Ferragina, P., & Vinciguerra, G. (2020). The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. VLDB, 2020. [paper]

[1-D, Mutable] Ding, J., Minhas, U. F., Yu, J., Wang, C., Do, J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D., & Kraska, T. (2020). ALEX: An Updatable Adaptive Learned Index. SIGMOD, 2020. [paper] [code]

[1-D, Mutable, Persistent] Lu, B., Ding, J., Lo, E., Minhas, U. F., & Wang, T. (2021). APEX: A High-Performance Learned Index on Persistent Memory. VLDB, 2021. [paper]

[1-D, Immutable, Auto-generated] Dittrich, J., Nix, J., & Schön, C. (2021). The next 50 Years in Database Indexing or: The Case for Automatically Generated Index Structures. VLDB, 2021. [paper] [code]

[1-D, Mutable, Concurrency] Li, P., Hua, Y., Jia, J., Zuo, P. (2021). FINEdex: A Fine-grained Learned Index Scheme for Scalable and Concurrent Memory Systems. VLDB, 2021. [paper]

[1-D, Mutable] Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C. (2021). Updatable learned index with precise positions. VLDB, 2021. [paper]

[1-D, Mutable] Ma, C., Yu, X., Li, Y., Meng, X., & Maoliniyazi, A. (2022). FILM: A Fully Learned Index for Larger-Than-Memory Databases. VLDB, 2022. [paper]

[1-D, Mutable, Concurrency] Wang, Z., Chen, H., Wang, Y., & Tang, C. (2022). The Concurrent Learned Indexes for Multicore Data Storage. ACM Transactions on Storage, 18(1), 1-35. [paper] [code]

[1-D, Mutable] Jiaoyi Zhang, Yihan Gao. (2022). CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm. VLDB, 2022. [paper]

[1-D, Mutable] Shangyu Wu. (2022). NFL: Robust Learned Index via Distribution Transformation. VLDB, 2022. [paper]

[1-D, Mutable, Persistent] Zhang, Z., Chu, Z., Jin, P., Luo, Y., Xie, X., Wan, S., Luo, Y., Wu, X., Zou, P., Zheng, C., Wu, G., Rudoff, A. (2022). PLIN: A Persistent Learned Index for Non-Volatile Memory with High Performance and Instant Recovery. VLDB, 2022. [paper]

[1-D, Mutable] Li, Pengfei, Hua Lu, Rong Zhu, Bolin Ding, et al. (2023). DILI: A Distribution-Driven Learned Index. VLDB, 2023. [paper]

[1-D, Mutable, Persistent] Yulai Tong, Jiazhen Liu, Hua Wang, Ke Zhou, Rongfeng He, Qin Zhang, and Cheng Wang. (2023). Sieve: A Learned Data-Skipping Index for Data Analytics. VLDB, 2023. [paper]

Multi-dimensional Index

[Multi-D, Immutable] Nathan, V., Ding, J., Alizadeh, M., & Kraska, T. (2020). Learning multi-dimensional indexes. SIGMOD, 2020. [paper]

[Multi-D, Mutable, Persistent] Li, P., Lu, H., Zheng, Q., Yang, L., & Pan, G. (2020). LISA: A Learned Index Structure for Spatial Data. SIGMOD, 2020. [paper]

[Multi-D, Mutable, Persistent] Qi, J., Liu, G., Jensen, C.S., Kulik, L. (2020). Effectively learning spatial indices. VLDB, 2020. [paper]

[Multi-D, Immutable] Ding, J., Nathan, V., Alizadeh, M., & Kraska, T. (2020). Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. VLDB, 2020. [paper]

[Multi-D, Mutable] Dong, H., Chai, C., Luo, Y., Liu, J., Feng, J., Zhan, C. (2022). RW-Tree: A Learned Workload-aware Framework for R-tree Construction. ICDE, 2022. [paper]

[Multi-D, Immutable] Gao, J., Cao, X., Yao, X., Zhang, G., & Wang, W. (2023). LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves. VLDB, 2023. [paper]

Experiment and Analysis

[1-D, Immutable, Analysis] Ferragina, P., Lillo, F., & Vinciguerra, G. (2020). Why are learned indexes so effective?. ICML, 2020. [paper]

[1-D, Immutable, Experiment] Marcus, R., Stoian, M., Kipf, A., Misra, S., van Renen, A., Kemper, A., Neumann, T., & Kraska, T. (2020). Benchmarking learned indexes. VLDB, 2020. [paper] [code]

[1-D, Poisoning Attack] Evgenios M. Kornaropoulos, Silei Ren, Roberto Tamassia. (2022). The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures. SIGMOD, 2022. [paper]

[1-D, Mutable, Experiment] Wongkham, C., Lu, B., Liu, C., Zhong, Z., Lo, E., Wang, T. (2022). Are Updatable Learned Indexes Ready?. VLDB, 2022. [paper]

[1-D, Immutable, Experiment] Maltry, M., Dittrich, J. (2022). A critical analysis of recursive model indexes. VLDB, 2022. [paper]

[1-D, Hash Index, Experiment] Sabek, I., Vaidya, K., Horn, D., Kipf, A., Mitzenmacher, M., Kraska, T. (2022). Can Learned Models Replace Hash Functions?. VLDB, 2022. [paper]

[1-D, Mutable, Experiment] Sun, Z., Zhou, X., Li, G. (2023). Learned Index: A Comprehensive Experimental Evaluation. VLDB, 2023. [paper] [code]

[1-D, Immutable, Experiment] Sabek, I., & Kraska, T. (2023). The Case for Learned In-Memory Joins. VLDB, 2023. [paper]

Layout

[Learned Layout] Liwen Sun, Michael J. Franklin, Sanjay Krishnan, et al. Fine-grained partitioning for aggressive data skipping. SIGMOD, 2014. [paper]

[Learned Layout] Yang, Z., Chandramouli, B., Wang, C., Gehrke, J., Li, Y., Minhas, U. F., … Acharya, R. Qd-tree: Learning Data Layouts for Big Data Analytics. SIGMOD, 2020. [paper]

[Learned Layout] Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, et al. Instance-Optimized Data Layouts for Cloud Analytics Workloads. SIGMOD, 2021. [paper]

[Data Container] Madden S, Ding J, Kraska T, Sudhir S, Cohen D, Mattson T, Tatbul N. Self-Organizing Data Containers. CIDR, 2022. [paper]

[Learned Layout] Teng Zhang, Jian Tan, Xin Cai, Jianying Wang, Feifei Li, Jianling Sun. SA-LSM: Optimize Data Layout for LSM-tree Based Storage using Survival Analysis. VLDB, 2022. [paper]

[Learned Layout] Michael Abebe. Tiresias: Enabling Predictive Autonomous Storage and Indexing. VLDB, 2022. [paper]

Query Execution

[Scheduling] Zhang, C., Marcus, R., Kleiman, A., & Papaemmanouil, O. (2020). Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. AIDB@VLDB, 2020. [paper]

[Operators] Bandle, M., Giceva, J., & Neumann, T. (2021). To Partition, or Not to Partition, That is the Join Question in a Real System. SIGMOD, 2021. [paper]

[CodeGen] Immanuel Trummer. CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex. VLDB, 2022. [paper]

5. Database Monitoring

[Trend Prediction] L. Ma, D. V. Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon, “Query-based Workload Forecasting for Self-driving Database Management Systems,” in SIGMOD, 2018. [paper]

[Performance Prediction] Dorn, J., Apel, S., & Siegmund, N. Mastering Uncertainty in Performance Estimations of Configurable Software Systems.

[Performance Prediction] Marcus, R., & Papaemmanouil, O. (2019). Plan-structured deep neural network models for query performance prediction. Proceedings of the VLDB Endowment, 12(11), 1733–1746. [paper]

[Performance Prediction] Wu, W., Chi, Y., Hacigümüş, H., & Naughton, J. F. (2013). Towards predicting query execution time for concurrent and dynamic database workloads. Proceedings of the VLDB Endowment, 6(10), 925–936. [paper]

[Performance Prediction] Duggan, J., Papaemmanouil, O., Cetintemel, U., & Upfal, E. (2014). Contender: A resource modeling approach for concurrent query performance prediction. Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings, 109–120. [paper]

[Performance Prediction] Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüş, H., & Naughton, J. F. (2013). Predicting query execution time: Are optimizer cost models really unusable? Proceedings - International Conference on Data Engineering, (1), 1081–1092. [paper]

[Performance Prediction] Higginson, A. S., Dediu, M., Arsene, O., Paton, N. W., & Embury, S. M. (2020). Database Workload Capacity Planning using Time Series Analysis and Machine Learning. Proceedings of the ACM SIGMOD International Conference on Management of Data, 769–783. [paper]

[Performance Prediction] Unterbrunner, P., Giannikis, G., Alonso, G., Fauser, D., & Kossmann, D. (2009). Predictable performance for unpredictable workloads. Proceedings of the VLDB Endowment, 2(1), 706–717. [paper]

[Performance Prediction] Xuanhe Zhou, Ji Sun, Guoliang Li, Jianhua Feng. Query Performance Prediction for Concurrent Queries using Graph Embedding. [paper]

6. Database Diagnosis

System and Kernel Causes

Automatic Performance Diagnosis and Tuning in Oracle

Karl Dias, Mark Ramacher, Uri Shaft, et al. CIDR, 2005. [paper]

DBSherlock: A Performance Diagnostic Tool for Transactional Databases.

Yoon, D. Y., Niu, N., & Mozafari, B. SIGMOD, 2016. [paper]

iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks.

Kalmegh, P., Babu, S., & Roy, S. SIGMOD, 2019. [paper]

FluxInfer: Automatic Diagnosis of Performance Anomaly for Online Database System

Ping Liu, Shenglin Zhang, Yongqian Sun, et al. IPCCC, 2020. [paper]

Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases.

Minghua Ma, Zheng Yin, Shenglin Zhang, et al. VLDB, 2020. [paper]

Generic and Robust Performance Diagnosis via Causal Inference for OLTP Database Systems.

Xianglin Lu, Zhe Xie, Zeyan Li, et al. CCGrid, 2022. [paper]

DBPA: A Benchmark for Transactional Database Performance Anomalies.

Shiyue Huang, Ziwei Wang, Xinyi Zhang, et al. SIGMOD, 2023. [paper]

Bottleneck Queries

PinSQL: Pinpoint Root Cause SQLs to Resolve Performance Issues in Cloud Databases.

Xiaoze Liu, Zheng Yin, Chao Zhao, et al. ICDE, 2022. [paper]

7. General Techniques

Feature Engineering for DB

[PlanEncoding] Yue Zhao, Gao Cong, Jiachen Shi, Chunyan Miao. QueryFormer: A Tree Transformer Model for Query Plan Representation. VLDB, 2022. [paper]

[Plan2Feature] Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar. Database Workload Characterization with Query Plan Encoders. VLDB, 2022. [paper]

[Pretrained Representation] Xiu Tang, Sai Wu, Mingli Song, Shanshan Ying, Feifei Li, Gang Chen: PreQR: Pre-training Representation for SQL Understanding. SIGMOD Conference 2022: 204-216 [paper]

[WorkloadAsGraph] Sanjay Agrawal, Eric Chu, Vivek R. Narasayya. Automatic physical design tuning: workload as a sequence. SIGMOD, 2006. [paper]

[DataSummary] Brit Youngmann et al. Guided Exploration of Data Summaries. VLDB, 2022. [paper]

Jiang H, Liu C, Paparrizos J, et al. Good to the Last Bit: Data-Driven Encoding with CodecDB. SIGMOD 2021. [paper]

Feature Engineering for AI

Apache Flink: Stream and batch processing in a single engine.

Carbone P, Katsifodimos A, Ewen S, et al. The Bulletin of the Technical Committee on Data Engineering, 2015, 38(4). [paper]

Feature selection in machine learning: A new perspective.

Cai J, Luo J, Wang S, et al. Neurocomputing, 2018, 300: 70-79. DOI:10.1016/j.neucom.2017.11.077. [paper]

Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory.

Cheng Chen, Jun Yang, Mian Lu, Taize Wang, Zhao Zheng, Yuqiang Chen, Wenyuan Dai, Bingsheng He, Weng-Fai Wong, Guoan Wu, Yuping Zhao, and Andy Rudoff. 2021. Proc. VLDB Endow. 14, 5 (January 2021), 799–812. [paper]

Managing ML pipelines: feature stores and the coming wave of embedding ecosystems.

Orr L, Sanyal A, Ling X, et al. arXiv preprint arXiv:2108.05053, 2021. [paper]

A System for Time Series Feature Extraction in Federated Learning.

Siqi Wang, Jiashu Li, Mian Lu, Zhao Zheng, Yuqiang Chen, and Bingsheng He. 2022. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22). Association for Computing Machinery, New York, NY, USA, 5024–5028. [paper]

FEBench: A Benchmark for Real-Time Relational Data Feature Extraction.

Xuanhe Zhou, Cheng Chen, Kunyi Li, Bingsheng He, Mian Lu, Qiaosheng Liu, Wei Huang, Guoliang Li, Zhao Zheng, Yuqiang Chen. 2023. Proc. VLDB Endow. [paper]

Model Transfer

Meghdad Kurmanji, Peter Triantafillou. Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data. SIGMOD, 2023. [paper]

Query And Data Generation

Query Generation

L. Zhang, C. Chai, X. Zhou, and G. Li. LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning. SIGMOD, 2022. [paper]

Liu X, Kong X, Liu L, et al. TreeGAN: syntax-aware sequence generation with generative adversarial networks. In ICDM, 2018. [paper]

Data Generation

[DeepAR] Jingyi Yang, Peizhi Wu, Gao Cong, Tieying Zhang, Xiao He. SAM: Database Generation from Query Workloads with Supervised Autoregressive Models. SIGMOD, 2022. [paper]

Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl. Expand your training limits! Generating training data for ML-based data management. SIGMOD, 2021 [paper]

Ju Fan, Tongyu Liu, Guoliang Li, Yuwei Shen, Xiaoyong Du. Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration. VLDB 2020. [paper]

8. Database Frameworks

Self-Driving Database Management Systems.

Andrew Pavlo, Gustavo Angulo, Joy Arulraj, et al. CIDR, 2017. [paper]

Cloud native database systems at Alibaba: Opportunities and challenges.

Feifei Li. VLDB, 2018. [paper]

SageDB: A learned database system.

Tim Kraska, Mohammad Alizadeh, Alex Beutel, et al. CIDR, 2019. [paper]

MonetDBLite: An embedded analytical database.

Mark Raasveldt. SIGMOD, 2018. [paper]

XuanYuan: An AI-Native Database.

Guoliang Li, Xuanhe Zhou, Sihao Li. Data Eng., 2019 [paper]

DBMS Fitting: Why should we learn what we already know?

Benjamin Hilprecht, Tiemo Bang, Muhammad El-Hindi, et al. CIDR, 2020. [paper]

MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems.

Lin Ma, William Zhang, Jie Jiao, et al. SIGMOD, 2021. [paper]

openGauss: An Autonomous Database System.

Guoliang Li, Xuanhe Zhou, Ji Sun, et al. VLDB, 2021. [paper]

From Natural Language Processing to Neural Databases.

James Thorne, Majid Yazdani, Marzieh Saeidi, et al. VLDB, 2021. [paper]

One Model to Rule them All: Towards Zero-Shot Learning for Databases.

Benjamin Hilprecht, Carsten Binnig. CIDR, 2022. [paper]

A Unified Transferable Model for ML-Enhanced DBMS.

Ziniu Wu, et al. CIDR, 2022. [paper]

PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!

Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, et al. VLDB, 2022. [paper]

Database Gyms.

Lim, Wan Shen, Matthew Butrovich, William Zhang, et al. CIDR, 2023. [paper]

mutable: A Modern DBMS for Research and Fast Prototyping.

Immanuel L Haffner, Jens Dittrich. CIDR, 2023. [paper]

SageDB: An Instance-Optimized Data Analytics System.

Jialin Ding, Ryan Marcus, Andreas Kipf, et al. VLDB, 2022. [paper]

Towards Building Autonomous Data Services on Azure.

Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, et al. SIGMOD, 2023. [paper]

9. Demonstrations

[DB Tuning] Bohan Zhang, Dana Van Aken, Justin Wang, Tao Dai, Shuli Jiang, Jacky Lao, Siyuan Sheng, Andrew Pavlo, Geoffrey J. Gordon. A Demonstration of the OtterTune Automatic Database Management System Tuning Service. VLDB, 2018. [paper]

[O&M Platform] Xuanhe Zhou, Lianyuan Jin, Ji Sun, Xinyang Zhao, Xiang Yu, Shifu Li, Tianqing Wang, Kun Li, Luyang Liu. DBMind: A Self-Driving Platform in openGauss. VLDB, 2021. [paper] [website]

[DB Tuning] Junxiong Wang, Immanuel Trummer, Debabrota Basu. Demonstrating UDO: A Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning. SIGMOD, 2021. [paper]

[DB Tuning] Immanuel Trummer. Demonstrating DB-BERT: A Database Tuning Tool that "Reads" the Manual. SIGMOD, 2022. [paper]

[DB Tuning] Luming Sun, Tao Ji, Cuiping Li, Hong Chen. DeepO: A Learned Query Optimizer. SIGMOD, 2022. [paper]

[DB Tuning] Xuanhe Zhou, Guoliang Li, Jianming Wu, Jiesi Liu, Zhaoyan Sun, Xinning Zhang. A Learned Query Rewrite System. VLDB, 2023. [paper] [website]

[DB Tuning] Wei Zhou, Chen Lin, Xuanhe Zhou, Guoliang Li, Tianqing Wang. Demonstration of ViTA: Visualizing, Testing and Analyzing Index Advisors. CIKM, 2023. [video]

[DB Tuning] Qiushi Bai, Sadeem Alsudais, Chen Li. Demo of QueryBooster: Supporting Middleware-based SQL Query Rewriting as a Service. VLDB, 2023. [paper]

[DB Tuning] Christoph Anneser, Mario Petruccelli, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus, and Alfons Kemper. QO-Insight: Inspecting Steered Query Optimizers. VLDB, 2023. [paper]

[DB Tuning] Zilong Wang, Qixiong Zeng, Ning Wang, Haowen Lu, Yue Zhang. CEDA: Learned Cardinality Estimation with Domain Adaptation. VLDB, 2023. [paper]

[DB Tuning] Junxiong Wang, Immanuel Trummer, Ahmet Kara, Dan Olteanu. ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning. VLDB, 2023. [paper]

[DB Tuning] Immanuel Trummer. Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4. VLDB, 2023. [paper]

[DB Diagnosis] Xiu Tang, Sai Wu, Dongxiang Zhang, Ziyue Wang, Gongsheng Yuan, and Gang Chen. A Demonstration of DLBD: Database Logic Bug Detection System. VLDB, 2023. [paper]

📧 Special Issues

S1 Large Language Models Meet Database

Peer-Reviewed

Annotating Columns with Pre-trained Language Models.

Y Suhara, J Li, Y Li, D Zhang, Ç Demiralp, C Chen, WC Tan. SIGMOD 2022 [paper]

Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs.

Jinyang Li, Binyuan Hui, Ge Qu, et al. NeurIPS, 2023. [pdf].

Can Foundation Models Wrangle Your Data?

Avanika Narayan, Ines Chami, Laurel Orr, and Christopher Ré. VLDB, 2023. [pdf].

OmniscientDB: A Large Language Model-Augmented DBMS That Knows What Other DBMSs Do Not Know.

M Urban, DD Nguyen, C Binnig. aiDM@SIGMOD 2023 [paper]

DB-GPT: Large Language Model Meets Database.

Xuanhe Zhou, Zhaoyan Sun, Guoliang Li. Data Science and Engineering 2023. [pdf].

CatSQL: Towards Real World Natural Language to SQL Applications.

H Fu, C Liu, B Wu, F Li, J Tan, J Sun. VLDB 2023 [paper]

DeepJoin: Joinable Table Discovery with Pre-trained Language Models.

Y Dong, C Xiao, T Nozawa, M Enomoto, M Oyamada. VLDB 2023 [paper]

How Large Language Models Will Disrupt Data Management.

RC Fernandez, AJ Elmore, MJ Franklin, S Krishnan, C Tan. VLDB 2023 [paper]

Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4.

I Trummer. VLDB 2023 [paper]

Can Large Language Models Predict Data Correlations from Column Names?

I Trummer. VLDB 2023

From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management.

I Trummer. VLDB 2022 [paper]

CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex.

I Trummer. VLDB 2022 [paper]

Analyzing How BERT Performs Entity Matching.

M Paganelli, F Del Buono, A Baraldi, F Guerra. VLDB 2022 [paper]

Schema Matching using Pre-Trained Language Models.

Y Zhang, A Floratou, J Cahoon, S Krishnan, AC Müller, D Banda, F Psallidas, JM Patel. ICDE 2023 [paper]

DB-BERT: a Database Tuning Tool that "Reads the Manual".

I Trummer. SIGMOD 2022 [paper]

Others

Querying Large Language Models with SQL [Vision].

Mohammed Saeed, Nicola De Cao, Paolo Papotti. arXiv 2023. [pdf].

CAESURA: Language Models as Multi-Modal Query Planners.

Matthias Urban, Carsten Binnig. arXiv 2023. [pdf].

Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables.

Matthias Urban, Carsten Binnig. arXiv 2023. [pdf].

Multimodal Neural Databases.

Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Alon Halevy, Fabrizio Silvestri. arXiv 2023. [pdf].

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory.

Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Zhao, Hang Zhao. arXiv 2023. [pdf].

Chat2DB [project]

From Large Language Models to Databases and Back: A Discussion on Research and Education.

Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang. arXiv 2023. [pdf]

Table-GPT: Table-tuned GPT for Diverse Table Tasks.

Peng Li, Yeye He, Dror Yashar, et al. arXiv 2023. [pdf]

Efficient Memory Management for Large Language Model Serving with PagedAttention.

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, et al. arXiv 2023. [pdf].

PDFTriage: Question Answering over Long, Structured Documents.

Jon Saad-Falcon, Joe Barrow, Alexa Siu, et al. arXiv, 2023. [pdf].

Automatic Root Cause Analysis via Large Language Models for Cloud Incidents.

Yinfang Chen, Huaibing Xie, Minghua Ma, et al. arXiv 2023. [pdf]

LLM As DBA [Vision]. arXiv 2023. [pdf].

D-Bot: Database Diagnosis System using Large Language Models.

Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, et al. arXiv 2023. [pdf] [code].

GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization.

Jiale Lao, Yibo Wang, Yufei Li, et al. arXiv, 2023. [pdf] [code]

DBCopilot: Scaling Natural Language Querying to Massive Databases.

Tianshu Wang, Hongyu Lin, Xianpei Han, et al. arXiv, 2023. [pdf]

LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?

Fuheng Zhao, Lawrence Lim, Ishtiyaque Ahmad, et al. arXiv, 2023. [pdf]

OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models

Yuhe Liu, Changhua Pei, Longlong Xu, et al. arXiv, 2023. [pdf]

S2 AI Paper And Code List

Partially Filtered NLP Papers

https://qinyuenlp.com/read/

Prompt Engineering for LLMs

https://www.promptingguide.ai/papers

Deployed AI Algorithms

https://github.com/labmlai/annotated_deep_learning_paper_implementations

LLM Statistics

https://aistratagems.com/large-language-model-llm-statistics/

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

Yuxi Ma, Chi Zhang, Song-Chun Zhu. arXiv 2023. [pdf].

S3 Open Datasets And SQLs

https://github.com/cmu-db/benchbase

https://github.com/cwida/public_bi_benchmark/tree/dev/master

https://github.com/TsinghuaDatabaseGroup/datasets