<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="jemdoc, see http://jemdoc.jaboc.net/" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="jemdoc.css" type="text/css" />
<title>ADA Lab @ UCSD</title>
</head>
<body>
<table summary="Table for page layout." id="tlayout">
<tr valign="top">
<td id="layout-menu">
<div class="menu-item"><a href="index.html">Home</a></div>
<div class="menu-item"><a href="index.html#members">Members</a></div>
<div class="menu-item"><a href="publications.html" class="current">Publications</a></div>
<div class="menu-item"><a href="news.html">News</a></div>
<div class="menu-item"><a href="impact.html">Impact</a></div>
<div class="menu-item"><a href="blog.html">Blog/Misc.</a></div>
<div class="menu-item"><a href="projects.html"><br /> Active&nbsp;Projects</a></div>
<div class="menu-item"><a href="cerebro.html">Cerebro</a></div>
<div class="menu-category"><br /> Past Projects</div>
<div class="menu-item"><a href="sortinghat.html">SortingHat</a></div>
<div class="menu-item"><a href="speakql.html">SpeakQL</a></div>
<div class="menu-item"><a href="krypton.html">Krypton</a></div>
<div class="menu-item"><a href="vista.html">Vista</a></div>
<div class="menu-item"><a href="panorama.html">Panorama</a></div>
<div class="menu-item"><a href="morpheus.html">Morpheus</a></div>
<div class="menu-item"><a href="hamlet.html">Hamlet</a></div>
<div class="menu-item"><a href="nimbus.html">Nimbus</a></div>
<div class="menu-item"><a href="slab.html">SLAB</a></div>
<div class="menu-item"><a href="orion.html">Orion</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/columbus/">Columbus</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/bismarck/">Bismarck</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/staccato/">Staccato</a></div>
</td>
<td id="layout-content">
<div id="toptitle">
<h1>ADA Lab @ UCSD</h1>
</div>
<h2>Peer-reviewed Publications</h2>
<ul>
<li><p>How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses<br />
Vraj Shah, Thomas Parashos, and Arun Kumar<br /> 
VLDB 2024 | <a href="papers/2024_CategDedup_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2023_CategDedup.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | Code and Data coming soon
</p>
</li>
</ul>
<ul>
<li><p>Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads<br />
Kabir Nagrecha and Arun Kumar<br />
VLDB 2024 | <a href="papers/2024_Saturn_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2023_Saturn.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://saturn.readthedocs.io/en/latest/index.html" target=&ldquo;blank&rdquo;>Code and Docs Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines<br />
Yuhao Zhang and Arun Kumar<br />
VLDB 2023 | <a href="papers/2023_Lotan_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2023_Lotan.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://github.com/makemebitter/lotan" target=&ldquo;blank&rdquo;>Code Release</a> | <a href="https://adalabucsd.github.io/research-blog/lotan.html" target=&ldquo;blank&rdquo;>Blog post</a>
</p>
</li>
</ul>
<ul>
<li><p>Low movement, deep-learned sitting patterns, and sedentary behavior in the International Study of Childhood Obesity, Lifestyle, and the Environment (ISCOLE)<br />
Paul R. Hibbing et al. (12 authors)<br />
International Journal of Obesity 2023 | <a href="papers/2023_ISCOLE_IJO.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Database-Aware ASR Error Correction for Speech-to-SQL Parsing<br />
Yutong Shao, Arun Kumar, and Ndapandula Nakashole<br />
IEEE ICASSP 2023 | <a href="papers/2023_SpeakQL_ICASSP.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>CHAP-child: An open source method for estimating sit-to-stand transitions and sedentary bout patterns from hip accelerometers among children<br />
Jordan A. Carlson et al. (15 authors)<br />
International Journal of Behavioral Nutrition and Physical Activity 2022 | <a href="papers/2022_JBNPA_CHAP.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=&ldquo;blank&rdquo;>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Structured Data Representation in Natural Language Interfaces<br />
Yutong Shao, Arun Kumar, and Ndapandula Nakashole<br />
IEEE Data Engineering Bulletin 2022 (Invited) | <a href="papers/2022_SpeakQL_DataEngBulletin.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>CHAP-Adult: A Reliable and Valid Algorithm to Classify Sitting and Measure Sitting Patterns Using Data from Hip-Worn Accelerometers in Adults Aged 35+<br />
John Bellettiere et al. (14 authors)<br />
Journal for the Measurement of Physical Behaviour 2022 | <a href="papers/2022_JMPB_CHAP.pdf" target=&ldquo;blank&rdquo;>PDF</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=&ldquo;blank&rdquo;>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>VLDB Scalable Data Science Category: The Inaugural Year<br />
Arun Kumar, Alon Halevy, and Nesime Tatbul<br />
ACM SIGMOD Record 2022 | <a href="papers/2022_SDS_SIGMODRecord.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets<br />
Supun Nakandala and Arun Kumar<br />
SIGMOD 2022 | <a href="papers/2022_Nautilus_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2021_Nautilus.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://github.com/ADALabUCSD/Nautilus" target=&ldquo;blank&rdquo;>Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>VLDB Panel Summary: &ldquo;The Future of Data(base) Education: Is the Cow Book Dead?&rdquo;<br />
Zachary Ives, Johannes Gehrke, Jana Giceva, Arun Kumar, and Rachel Pottinger<br />
ACM SIGMOD Record 2021 | <a href="papers/2021_DBEd_SIGMODRecord.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Some Damaging Delusions of Deep Learning Practice (and How to Avoid Them)<br />
Arun Kumar, Supun Nakandala, and Yuhao Zhang<br />
KDD 2021 Deep Learning Day | <a href="papers/2021_DLDelusions_KDD.pdf" target=&ldquo;blank&rdquo;>Extended Abstract PDF</a> 
| <a href="papers/2021_DLDelusions_KDD_Slides.pdf" target=&ldquo;blank&rdquo;>Talk slides</a> 
| <a href="https://www.youtube.com/watch?v=UP9__WsfSuc" target=&ldquo;blank&rdquo;>Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning<br />
Side Li and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Kingpin_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2021_Kingpin.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://www.youtube.com/watch?v=OlTknBfBmvM" target=&ldquo;blank&rdquo;>Talk video</a> | <a href="https://github.com/liside/Kingpin" target=&ldquo;blank&rdquo;>Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches<br />
Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Cerebro-DS.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2021_Cerebro-DS.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://youtu.be/SK9wTzO4K7M" target=&ldquo;blank&rdquo;>Talk video</a> | <a href="https://github.com/makemebitter/cerebro-ds/" target=&ldquo;blank&rdquo;>Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration<br />
Liangde Li, Supun Nakandala, and Arun Kumar<br />
VLDB 2021 Demo | <a href="papers/2021_Cerebro_VLDB_Demo.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2021_Intermittent_HIL_MS.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://youtu.be/K3THQy5McXc" target=&ldquo;blank&rdquo;>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards A Polyglot Framework for Factorized ML<br />
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar<br />
VLDB 2021 (Industrial Track) | <a href="papers/2021_Trinity_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2021_Trinity.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://www.youtube.com/watch?v=osvBmZs2MsM" target=&ldquo;blank&rdquo;>Talk video</a> | Code coming soon
</p>
</li>
</ul>
<ul>
<li><p>Towards Benchmarking Feature Type Inference for AutoML Platforms<br />
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar<br />
ACM SIGMOD 2021 | <a href="papers/2021_SortingHat_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2021_SortingHat.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | Talk Videos: <a href="https://youtu.be/KAs-uU59AEM" target=&ldquo;blank&rdquo;>Short Talk</a> <a href="https://youtu.be/dpx74zQyU3k" target=&ldquo;blank&rdquo;>Long Talk</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference" target=&ldquo;blank&rdquo;>Data, Code, and Pre-trained Models on GitHub</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference/Library" target=&ldquo;blank&rdquo;>Python library</a>
</p>
</li>
</ul>
<ul>
<li><p>Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?<br />
Arun Kumar<br />
ACM SIGMOD 2021 Panel | <a href="papers/2021_Panel_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study<br />
Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan<br />
Medicine and Science in Sports and Exercise Journal, 2021 | <a href="papers/2021_MSSE_CHAP.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=&ldquo;blank&rdquo;>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification<br />
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan<br />
Journal for the Measurement of Physical Behaviour, 2021 | <a href="papers/2021_JMPB_CNN.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2021_JMPB_CNN.txt" target=&ldquo;blank&rdquo;>BibTeX</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=&ldquo;blank&rdquo;>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Layered Data Platform for Scalable Deep Learning<br />
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha<br />
CIDR 2021 (Vision paper) | <a href="papers/2021_Cerebro_CIDR.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2021_Cerebro_CIDR.txt" target=&ldquo;blank&rdquo;>BibTeX</a> | <a href="https://www.youtube.com/watch?v=8QfMvdlmdic" target=&ldquo;blank&rdquo;>Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Data System for Optimized Deep Learning Model Selection<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
VLDB 2020 | <a href="papers/2020_Cerebro_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2020_Cerebro_VLDB.txt" target=&ldquo;blank&rdquo;>BibTeX</a> | <a href="papers/2020_Cerebro_VLDB_Errata.pdf" target=&ldquo;blank&rdquo;>Errata</a> | <a href="papers/TR_2020_Cerebro.pdf" target=&ldquo;blank&rdquo;>TechReport</a>
| Talk videos: <a href="https://www.youtube.com/watch?v=8PJic5FStGs" target="blank">YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=198" target="blank">Bilibili</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target=&ldquo;blank&rdquo;>Blog post</a> | <a href="https://databricks.com/session_na20/resource-efficient-deep-learning-model-selection-on-apache-spark" target=&ldquo;blank&rdquo;>SAIS Talk video</a>
| <a href="https://adalabucsd.github.io/cerebro-system/" target=&ldquo;blank&rdquo;>Source code and documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Panorama: A Data System for Unbounded Vocabulary Querying over Video<br />
Yuhao Zhang and Arun Kumar<br />
VLDB 2020 | <a href="http://www.vldb.org/pvldb/vol13/p477-zhang.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_Panorama_VLDB.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2019_Panorama.pdf" target=&ldquo;blank&rdquo;>TechReport</a> 
| <a href="https://docs.google.com/presentation/d/1a9xHmfP1Gwg03CnVP8OWWf20v1IZ9O5eIhfa0dEdkcc/edit?usp=sharing" target="blank">Talk slides</a> | Talk videos: <a href="https://www.youtube.com/watch?v=gAGOp0fbUcU" target="blank">YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=109" target="blank">Bilibili</a>
| <a href="https://adalabucsd.github.io/research-blog/panorama.html" target=&ldquo;blank&rdquo;>Blog post</a>
| <a href="https://github.com/makemebitter/Panorama-UCSD" target=&ldquo;blank&rdquo;>Source code on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Understanding and Benchmarking the Impact of GDPR on Database Systems<br />
Supreeth Shastri, Vinay Banakar, Melissa Wasserman, Arun Kumar, and Vijay Chidambaram<br />
VLDB 2020 | <a href="http://www.vldb.org/pvldb/vol13/p1064-shastri.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="https://04e19274-9945-4166-b1be-95d42dc718a3.filesusr.com/ugd/13b079_1e10e6be8e7045ee9b26afdcdae6f60b.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://www.gdprbench.org/" target=&ldquo;blank&rdquo;>Webpage</a> 
| Talk videos: <a href="https://www.youtube.com/watch?v=1O8_fVmzUUc" target="blank">YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=188" target="blank">Bilibili</a>
</p>
</li>
</ul>
<ul>
<li><p>Query Optimization for Faster Deep CNN Explanations<br />
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou<br />
ACM SIGMOD Record 2020 | <a href="papers/2020_Krypton_SIGMODRecord.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2020_Krypton_SIGMODRecord.txt" target=&ldquo;blank&rdquo;>BibTeX</a> <br />
<tt>ACM SIGMOD Research Highlights Award</tt>
</p>
</li>
</ul>
<ul>
<li><p>Incremental and Approximate Computations for Accelerating Deep CNN Inference<br />
Supun Nakandala, Kabir Nagrecha, Arun Kumar, and Yannis Papakonstantinou<br />
ACM TODS 2020 | <a href="papers/2020_Krypton_TODS.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2020_Krypton_TODS.txt" target=&ldquo;blank&rdquo;>BibTeX</a> <br />
<tt>Invited Paper</tt>
</p>
</li>
</ul>
<ul>
<li><p>Vista: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale<br />
Supun Nakandala and Arun Kumar<br />
ACM SIGMOD 2020 | <a href="papers/2020_Vista_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2020_Vista_SIGMOD.txt" target=&ldquo;blank&rdquo;>BibTeX</a> |
<a href="papers/TR_2020_Vista.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://adalabucsd.github.io/research-blog/research/2020/06/14/vista.html" target=&ldquo;blank&rdquo;>Blog post</a> | <a href="https://www.youtube.com/watch?v=nmfUFCDthAo&amp;feature=youtu.be" target=&ldquo;blank&rdquo;>Talk Video</a> | <a href="https://github.com/ADALabUCSD/Vista" target=&ldquo;blank&rdquo;>Code</a>
</p>
</li>
</ul>
<ul>
<li><p>SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data<br />
Vraj Shah, Side Li, Arun Kumar, and Lawrence Saul<br />
ACM SIGMOD 2020 | <a href="papers/2020_SpeakQL_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2020_SpeakQL_SIGMOD.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2020_SpeakQL.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | 
<a href="https://adalabucsd.github.io/research-blog/research/2020/06/14/speakql.html" target=&ldquo;blank&rdquo;>Blog post</a> | 
<a href="https://drive.google.com/drive/folders/1tSxUTu2A7qy8fPtB81RnwkyakgykZ3iw?usp=sharing" target=&ldquo;blank&rdquo;>Dataset on Drive</a>
</p>
</li>
</ul>
<ul>
<li><p>Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations<br />
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou<br />
ACM SIGMOD 2019 | <a href="papers/2019_Krypton_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2019_Krypton_SIGMOD.txt" target=&ldquo;blank&rdquo;>BibTeX</a> |  <a href="papers/TR_2019_Krypton.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://adalabucsd.github.io/research-blog/research/2019/06/07/krypton.html" target=&ldquo;blank&rdquo;>Blog post</a> | <a href="https://av.tib.eu/media/42901" target=&ldquo;blank&rdquo;>Talk Video</a> <br />
<tt>Honorable Mention for Best Paper Award</tt>
</p>
</li>
</ul>
<ul>
<li><p>Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra<br />
Side Li, Lingjiao Chen, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_MorpheusFI_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_MorpheusFI_SIGMOD.txt" target="blank">BibTeX</a> | <a href="https://github.com/liside/MorpheusFI" target="blank">Code and Data on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent<br />
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu<br />
ACM SIGMOD 2019 | <a href="papers/2019_TOC_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2019_TOC.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://github.com/fenganli/toc-release-code" target=&ldquo;blank&rdquo;>Code on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_Nimbus_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="papers/TR_2018_Nimbus.pdf" target=&ldquo;blank&rdquo;>TechReport</a> 
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_Cerebro_DEEM.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2019_Cerebro_DEEM.txt" target=&ldquo;blank&rdquo;>BibTeX</a> | <a href="papers/TR_2019_Cerebro.pdf" target=&ldquo;blank&rdquo;>TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target=&ldquo;blank&rdquo;>Blog post</a>
</p>
</li>
</ul>
<ul>
<li><p>The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML<br />
Vraj Shah and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_DataPrepZoo_DEEM.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_SortingHat_SIGMOD.txt" target="blank">BibTeX</a> | <a href="papers/TR_2019_DataPrepZoo.pdf" target="blank">TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/research/2019/06/21/mldataprepzoo.html" target=&ldquo;blank&rdquo;>Blog post</a>
| <a href="https://github.com/pvn25/ML-Data-Prep-Zoo" target=&ldquo;blank&rdquo;>Data Prep Zoo Repository on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data<br />
Vraj Shah, Side Li, Kevin Yang, Arun Kumar, and Lawrence Saul<br />
ACM SIGMOD 2019 Demo | <a href="papers/2019_SpeakQL_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2019_SpeakQL_SIGMOD.txt" target=&ldquo;blank&rdquo;>BibTeX</a> | <a href="https://vimeo.com/295693078" target=&ldquo;blank&rdquo;>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 Demo | <a href="papers/2019_NimbusDemo_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | Video coming soon
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations<br />
Allen Ordookhanians, Xin Li, Supun Nakandala, and Arun Kumar<br />
VLDB 2019 | <a href="http://www.vldb.org/pvldb/vol12/p1894-ordookhanians.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2019_Krypton_VLDB.txt" target=&ldquo;blank&rdquo;>BibTeX</a> | <a href="https://www.youtube.com/watch?v=1OWddbd4n6Y&amp;feature=youtu.be" target=&ldquo;blank&rdquo;>Video</a> 
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Krypton: Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations<br />
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou<br />
SysML 2019 Demo | <a href="papers/2019_Krypton_SysML.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> | <a href="https://www.youtube.com/watch?v=1OWddbd4n6Y&amp;feature=youtu.be" target=&ldquo;blank&rdquo;>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Data Management in Machine Learning Systems<br />
Matthias Boehm, Arun Kumar, and Jun Yang<br />
Synthesis Lectures on Data Management, Morgan &amp; Claypool Publishers (Book), 2019 |
<a href="https://www.morganclaypool.com/doi/10.2200/S00895ED1V01Y201901DTM057" target=&ldquo;blank&rdquo;>PDF</a> |
<a href="https://link.springer.com/book/10.1007/978-3-031-01869-5" target=&ldquo;blank&rdquo;>Order hard copy</a>
</p>
</li>
</ul>
<ul>
<li><p>Hierarchical and Distributed Machine Learning Inference Beyond the Edge<br />
Anthony Thomas, Yunhui Guo, Yeseong Kim, Baris Aksanli, Arun Kumar, and Tajana Rosing<br />
IEEE ICNSC 2019 | <a href="papers/2019_ICNSC.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Predicting Eating Events in Free Living Individuals<br />
Jiayi Wang, Jiue-An Yang, Supun Nakandala, Arun Kumar, and Marta M. Jankowska<br />
eScience 2019 Conference (Poster)
</p>
</li>
</ul>
<ul>
<li><p>A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics<br />
Anthony Thomas and Arun Kumar<br />
VLDB 2018/2019 | <a href="papers/2019_SLAB_VLDB.pdf" target="blank">Paper PDF</a> |
<a href="papers/TR_2018_SLAB.pdf" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="slab.html" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>In-RDBMS Hardware Acceleration of Advanced Analytics<br />
Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, and Hadi Esmaeilzadeh<br />
VLDB 2018 | <a href="papers/2018_DANA_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="http://act-lab.org/artifacts/dana/addendum.pdf" target=&ldquo;blank&rdquo;>Addendum</a>
</p>
</li>
</ul>
<ul>
<li><p>Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?<br />
Vraj Shah, Arun Kumar, and Xiaojin Zhu<br />
VLDB 2018 |
<a href="papers/2018_Hamlet_VLDB.pdf" target="blank">Paper PDF</a> and <a href="papers/2018_Hamlet_VLDB.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2017_HamletPlusPlus.pdf" target=&ldquo;blank&rdquo;>TechReport</a> |
<a href="hamlet.html" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Materialization Trade-offs for Feature Transfer from Deep CNNs for Multimodal Data Analytics<br />
Supun Nakandala and Arun Kumar<br />
SysML 2018 Short paper/poster | <a href="papers/2018_Vista_SysML.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards Linear Algebra over Normalized Data<br />
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh Patel<br />
VLDB 2017 |
<a href="papers/2017_Morpheus_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="papers/TR_2017_Morpheus.pdf" target=&ldquo;blank&rdquo;>TechReport</a> |
<a href="morpheus.html" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics<br />
Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey Naughton<br />
ACM SIGMOD 2017 |
<a href="papers/2017_BismarckBoltOnDP_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="papers/TR_2017_BismarckBoltOnDP.pdf" target=&ldquo;blank&rdquo;>TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Data Management in Machine Learning: Challenges, Techniques, and Systems<br />
Arun Kumar, Matthias Boehm, and Jun Yang<br />
ACM SIGMOD 2017 Tutorial |
<a href="papers/2017_Tutorial_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="papers/Slides_2017_Tutorial_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Slidedeck PDF</a> |
<a href="https://www.youtube.com/watch?v=U8J0Dd_Z5wo" target="blank">Video of tutorial on YouTube</a>
</p>
</li>
</ul>
<ul>
<li><p>SpeakQL: Towards Speech-driven Multi-modal Querying<br />
Dharmil Chandarana, Vraj Shah, Arun Kumar, and Lawrence Saul<br />
ACM SIGMOD 2017 HILDA Workshop |
<a href="papers/2017_SpeakQL_HILDA.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> and <a href="papers/2017_SpeakQL_SIGMOD.txt" target=&ldquo;blank&rdquo;>BibTeX</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing: Do Not Pay for More than What You Learn!<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2017 DEEM Workshop |
<a href="papers/2017_Nimbus_DEEM.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A System to Manage Deep Learning for Relational Data Analytics<br />
Arun Kumar<br />
CIDR 2017 Abstract |
<a href="papers/2017_Cerebro_CIDR.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>To Join or Not to Join? Thinking Twice about Joins before Feature Selection<br />
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu<br />
ACM SIGMOD 2016 |
<a href="papers/2016_Hamlet_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2016_Hamlet_SIGMOD.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2016_Hamlet.pdf" target=&ldquo;blank&rdquo;>TechReport</a> |
<a href="hamlet.html" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Materialization Optimizations for Feature Selection Workloads<br />
Ce Zhang, Arun Kumar, and Christopher Ré<br />
ACM TODS 2016 (Invited) | <a href="papers/2016_Columbus_TODS.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Model Selection Management Systems: The Next Frontier of Advanced Analytics<br />
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel<br />
ACM SIGMOD Record Dec 2015 Vision Track |
<a href="papers/2015_MSMS_SIGMODRecord.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Santoku: Optimizing Machine Learning over Normalized Data<br />
Arun Kumar, Mona Jalal, Boqun Yan, Jeffrey Naughton, and Jignesh M. Patel<br />
VLDB 2015 Demo |
<a href="papers/2015_Santoku_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="orion.html" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Learning Generalized Linear Models Over Normalized Data<br />
Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel<br />
ACM SIGMOD 2015 |
<a href="papers/2015_Orion_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="orion.html" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Materialization Optimizations for Feature Selection Workloads<br />
Ce Zhang, Arun Kumar, and Christopher Ré<br />
ACM SIGMOD 2014 |
<a href="papers/2014_Columbus_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a><br />
<tt>Best Paper Award; Invited to ACM TODS 2016</tt>
</p>
</li>
</ul>
<ul>
<li><p>Distributed and Scalable PCA in the Cloud<br />
Arun Kumar, Nikos Karampatziakis, Paul Mineiro, Markus Weimer, and Vijay Narayanan<br />
NIPS BigLearn 2013 |
<a href="papers/2013_PCAonREEF_BigLearn.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System<br />
Pradap Konda, Arun Kumar, Christopher Ré, and Vaishnavi Sashikanth<br />
VLDB 2013 Demo |
<a href="papers/2013_Columbus_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Hazy: Making it Easier to Build and Maintain Big-data Analytics<br />
Arun Kumar, Feng Niu, and Christopher Ré<br />
ACM Queue 2013 |
<a href="http://queue.acm.org/detail.cfm?id=2431055" target=&ldquo;blank&rdquo;>Article</a><br />
<tt>Invited to the Communications of the ACM March 2013</tt>
</p>
</li>
</ul>
<ul>
<li><p>Brainwash: A Data System for Feature Engineering<br />
Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang<br />
CIDR 2013 Vision Track |
<a href="papers/2013_Brainwash_CIDR.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards a Unified Architecture for in-RDBMS Analytics<br />
Xixuan Feng*, Arun Kumar*, Benjamin Recht, and Christopher Ré<br />
ACM SIGMOD 2012 |
<a href="papers/2012_Bismarck_SIGMOD.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="papers/TR_2012_Bismarck.pdf" target=&ldquo;blank&rdquo;>TechReport</a> |
<a href="http://i.stanford.edu//hazy/victor/bismarck-download/" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>The MADlib Analytics Library or MAD Skills, the SQL<br />
Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar<br />
VLDB 2012 Industrial Track |
<a href="papers/2012_MADlib_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Probabilistic Management of OCR Data using an RDBMS<br />
Arun Kumar and Christopher Ré<br />
VLDB 2012 |
<a href="papers/2012_Staccato_VLDB.pdf" target=&ldquo;blank&rdquo;>Paper PDF</a> |
<a href="papers/TR_2012_Staccato.pdf" target=&ldquo;blank&rdquo;>TechReport</a> |
<a href="http://i.stanford.edu/hazy/staccato/download/" target=&ldquo;blank&rdquo;>Code and Data</a>
</p>
</li>
</ul>
<h2>Manuscripts and Articles</h2>
<ul>
<li><p>Arun Kumar's contribution to &ldquo;Reminiscences on Influential Papers&rdquo;<br />
Pinar Tozun<br />
ACM SIGMOD Record 2023 | <a href="https://sigmodrecord.org/publications/sigmodRecord/2212/pdfs/06_Reminiscences_Rabl.pdf" target=&ldquo;blank&rdquo;>Article PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Design and Evaluation of an SQL-Based Dialect for Spoken Querying<br />
Kyle Luoma and Arun Kumar<br />
<a href="papers/TR_2023_SpeakQL_Dialect.pdf" target=&ldquo;blank&rdquo;>TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Hydra: A Data System for Large Multi-Model Deep Learning<br />
Kabir Nagrecha and Arun Kumar<br />
<a href="https://arxiv.org/abs/2110.08633" target=&ldquo;blank&rdquo;>TechReport</a> | <a href="https://github.com/knagrecha/hydra" target=&ldquo;blank&rdquo;>Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Integrating Cerebro with Ray<br />
Abhishek Gupta and Rishikesh Ingale<br />
<a href="papers/TR_2022_CSE234_CerebroRay.pdf" target=&ldquo;blank&rdquo;>CSE 234 Project TechReport</a> |
<a href="https://github.com/Abhishek2304/Cerebro-System-Ray" target=&ldquo;blank&rdquo;>Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Integrating Cerebro with Dask<br />
Vignesh Nanda Kumar and Pratik Ratadiya<br />
<a href="papers/TR_2022_CSE234_CerebroDask.pdf" target=&ldquo;blank&rdquo;>CSE 234 Project TechReport</a> |
<a href="https://github.com/VigneshN1997/cerebro-system" target=&ldquo;blank&rdquo;>Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Categorical Data Deduplication<br />
Soham Pachpande and Gehan Chopade<br />
<a href="papers/TR_2022_CSE234_CategDedup.pdf" target=&ldquo;blank&rdquo;>CSE 234 Project TechReport</a> |
<a href="https://github.com/sohampachpande/data-deduplication" target=&ldquo;blank&rdquo;>Code, Data, and Pre-trained Models on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Bringing ML-based Feature Type Inference to OpenML<br />
Ryan Tran and Victor Zhu<br />
<a href="papers/TR_2022_CSE234_OpenML.pdf" target=&ldquo;blank&rdquo;>CSE 234 Project TechReport</a> |
<a href="https://github.com/bobotran/SortingHatLib" target=&ldquo;blank&rdquo;>Code Release on GitHub</a> |
<a href="https://pypi.org/project/sortinghatinf/" target="blank">Package on PyPI</a>
</p>
</li>
</ul>
<ul>
<li><p>Letter from the Rising Star Award Winner<br />
Arun Kumar<br />
IEEE Data Engineering Bulletin, June 2021 | <a href="http://sites.computer.org/debull/A21june/p94.pdf" target=&ldquo;blank&rdquo;>PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Improving Feature Type Inference Accuracy of TFDV with SortingHat<br />
Vraj Shah, Kevin Yang, and Arun Kumar<br />
<a href="papers/TR_2020_TFDV.pdf" target=&ldquo;blank&rdquo;>TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>ML/AI Systems and Applications: Is the SIGMOD/VLDB Community Losing Relevance?<br />
Arun Kumar<br />
Blog post on the official ACM SIGMOD Blog, 2018 |
<a href="http://wp.sigmod.org/?p=2454" target=&ldquo;blank&rdquo;>Webpage</a>
</p>
</li>
</ul>
<ul>
<li><p>Advice from PhD to Early Career<br />
Arun Kumar<br />
ACM SIGMOD 2018 New Researcher Symposium Talk |
<a href="https://sigmod2018.org/nrs_slides/kumar.pdf" target=&ldquo;blank&rdquo;>Slides</a>
</p>
</li>
</ul>
<ul>
<li><p>A Survey of the Existing Landscape of ML Systems<br />
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel<br />
UW-Madison Technical Report TR1827 |
<a href="papers/TR_2015_MSMSSurvey.pdf" target=&ldquo;blank&rdquo;>PDF</a>
</p>
</li>
</ul>
<h2>Theses and Dissertations</h2>
<ul>
<li><p>Simplifying Data Preparation for Machine Learning on Tabular Data<br />
Vraj Shah. PhD Dissertation. UC San Diego. 2022 | 
<a href="papers/Dissertation_VrajShah.pdf" target=&ldquo;blank&rdquo;>PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Query Optimizations for Deep Learning Systems<br />
Supun Nakandala. PhD Dissertation. UC San Diego. 2022 | 
<a href="papers/Dissertation_SupunNakandala.pdf" target=&ldquo;blank&rdquo;>PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Efficient Systems for Advanced Data Analytics<br />
Liangde Li. MS Thesis. UC San Diego. 2022 | 
<a href="papers/Thesis_LiangdeLi.pdf" target=&ldquo;blank&rdquo;>PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning<br />
David Justo. MS Thesis. UC San Diego. 2019 |
<a href="papers/Thesis_DavidJusto.pdf" target=&ldquo;blank&rdquo;>PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Learning Over Joins<br />
Arun Kumar. PhD Dissertation. UW-Madison. 2016 |
<a href="papers/Dissertation_ArunKumar.pdf" target=&ldquo;blank&rdquo;>PDF</a> |
<a href="http://cseweb.ucsd.edu/csevideo/Arun.Kumar.mp4" target=&ldquo;blank&rdquo;>Video of job talk at UCSD</a><br />
<tt>Wisconsin CS 2016 Graduate Student Research Award for best dissertation research</tt>
</p>
</li>
</ul>
<div id="footer">
<div id="footer-text">
Page generated 2024-07-03 22:01:46 PDT, by <a href="https://github.com/wsshin/jemdoc_mathjax" target="blank">jemdoc+MathJax</a>.
</div>
</div>
</td>
</tr>
</table>
</body>
</html>