<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="jemdoc, see http://jemdoc.jaboc.net/" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="jemdoc.css" type="text/css" />
<title>ADA Lab @ UCSD</title>
</head>
<body>
<table summary="Table for page layout." id="tlayout">
<tr valign="top">
<td id="layout-menu">
<div class="menu-item"><a href="index.html">Home</a></div>
<div class="menu-item"><a href="index.html#members">Members</a></div>
<div class="menu-item"><a href="publications.html">Publications</a></div>
<div class="menu-item"><a href="news.html">News</a></div>
<div class="menu-item"><a href="impact.html">Impact</a></div>
<div class="menu-item"><a href="blog.html">Blog/Misc.</a></div>
<div class="menu-item"><a href="projects.html"><br /> Active&nbsp;Projects</a></div>
<div class="menu-item"><a href="cerebro.html">Cerebro</a></div>
<div class="menu-category"><br /> Past Projects</div>
<div class="menu-item"><a href="sortinghat.html">SortingHat</a></div>
<div class="menu-item"><a href="speakql.html">SpeakQL</a></div>
<div class="menu-item"><a href="krypton.html">Krypton</a></div>
<div class="menu-item"><a href="vista.html">Vista</a></div>
<div class="menu-item"><a href="panorama.html">Panorama</a></div>
<div class="menu-item"><a href="morpheus.html">Morpheus</a></div>
<div class="menu-item"><a href="hamlet.html">Hamlet</a></div>
<div class="menu-item"><a href="nimbus.html">Nimbus</a></div>
<div class="menu-item"><a href="slab.html">SLAB</a></div>
<div class="menu-item"><a href="orion.html">Orion</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/columbus/">Columbus</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/bismarck/">Bismarck</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/staccato/">Staccato</a></div>
</td>
<td id="layout-content">
<div id="toptitle">
<h1>ADA Lab @ UCSD</h1>
</div>
<div class="infoblock">
<div class="blockcontent">
<p><b>Note:</b> This umbrella project webpage is now deprecated.
Please see the webpages of the active projects <a href="cerebro.html">Cerebro</a> and <a href="sortinghat.html">SortingHat</a>.
</p>
</div></div>
<table class="imgtable"><tr><td>
<img src="images/triptych.jpg" alt="" width="100px" />&nbsp;</td>
<td align="left"><h2>Project Triptych</h2>
</td></tr></table>
<h3>Overview</h3>
<p>Triptych is an end-to-end <i>model selection management system</i> (MSMS) that aims to simplify
and accelerate the process of sourcing data/features and selecting ML models. Our guiding
principles are to exploit the semantics of the data and the ML task as much as possible
in order to reduce the work of data scientists as well as runtimes and costs. We apply these
principles to remove or mitigate different bottlenecks in this end-to-end process,
eventually unifying the resulting components into an integrated &ldquo;operating system&rdquo; for ML
analytics tasks. Please refer to the ACM SIGMOD Record paper below for more details on
this vision.
</p>
<h3>Active Component Projects</h3>
<table class="imgtable"><tr><td>
<img src="images/cerebro.jpg" alt="" width="80px" />&nbsp;</td>
<td align="left"><p><a href="cerebro.html" target="blank"><b>Cerebro</b></a><br />
Efficient and reproducible model selection on deep learning systems.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/morpheus.jpg" alt="" width="80px" />&nbsp;</td>
<td align="left"><p><a href="morpheus.html" target="blank"><b>Morpheus</b></a><br />
Integrating linear algebra and relational algebra to simplify feature engineering for ML.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/sortinghat.jpg" alt="" width="80px" />&nbsp;</td>
<td align="left"><p><a href="sortinghat.html" target="blank"><b>SortingHat</b></a><br />
ML schema inference and automatic data preparation.
</p>
</td></tr></table>
<h3>Publications</h3>
<ul>
<li><p>Some Damaging Delusions of Deep Learning Practice (and How to Avoid Them)<br />
Arun Kumar, Supun Nakandala, and Yuhao Zhang<br />
KDD 2021 Deep Learning Day | <a href="papers/2021_DLDelusions_KDD.pdf" target="blank">Extended Abstract PDF</a>
| <a href="papers/2021_DLDelusions_KDD_Slides.pdf" target="blank">Talk slides</a>
| <a href="https://www.youtube.com/watch?v=UP9__WsfSuc" target="blank">Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning<br />
Side Li and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Kingpin_VLDB.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Kingpin.pdf" target="blank">TechReport</a> | <a href="https://www.youtube.com/watch?v=OlTknBfBmvM" target="blank">Talk video</a> | <a href="https://github.com/liside/Kingpin" target="blank">Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches<br />
Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Cerebro-DS.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Cerebro-DS.pdf" target="blank">TechReport</a> | <a href="https://youtu.be/SK9wTzO4K7M" target="blank">Talk video</a> | <a href="https://github.com/makemebitter/cerebro-ds/" target="blank">Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration<br />
Liangde Li, Supun Nakandala, and Arun Kumar<br />
VLDB 2021 Demo | <a href="papers/2021_Cerebro_VLDB_Demo.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Intermittent_HIL_MS.pdf" target="blank">TechReport</a> | <a href="https://youtu.be/K3THQy5McXc" target="blank">Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards A Polyglot Framework for Factorized ML<br />
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar<br />
VLDB 2021 (Industrial Track) | <a href="papers/2021_Trinity_VLDB.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Trinity.pdf" target="blank">TechReport</a> | <a href="https://www.youtube.com/watch?v=osvBmZs2MsM" target="blank">Talk video</a> | Code coming soon
</p>
</li>
</ul>
<ul>
<li><p>Towards Benchmarking Feature Type Inference for AutoML Platforms<br />
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar<br />
ACM SIGMOD 2021 | <a href="papers/2021_SortingHat_SIGMOD.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_SortingHat.pdf" target="blank">TechReport</a> | Talk videos: <a href="https://youtu.be/KAs-uU59AEM" target="blank">Short Talk</a> <a href="https://youtu.be/dpx74zQyU3k" target="blank">Long Talk</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference" target="blank">Data, Code, and Pre-trained Models on GitHub</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference/Library" target="blank">Python library</a>
</p>
</li>
</ul>
<ul>
<li><p>The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study<br />
Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan<br />
Medicine and Science in Sports and Exercise Journal, 2021 | Paper PDF coming soon | <a href="https://github.com/ADALabUCSD/DeepPostures" target="blank">Code</a>
</p>
</li>
</ul>
<ul>
<li><p>Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification<br />
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan<br />
Journal for the Measurement of Physical Behaviour, 2021 | <a href="papers/2021_JMPB_CNN.pdf" target="blank">Paper PDF</a> and <a href="papers/2021_JMPB_CNN.txt" target="blank">BibTeX</a> | <a href="https://github.com/ADALabUCSD/DeepPostures" target="blank">Code</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Layered Data Platform for Scalable Deep Learning<br />
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha<br />
CIDR 2021 (Vision paper) | <a href="papers/2021_Cerebro_CIDR.pdf" target="blank">Paper PDF</a> and <a href="papers/2021_Cerebro_CIDR.txt" target="blank">BibTeX</a> | <a href="https://www.youtube.com/watch?v=8QfMvdlmdic" target="blank">Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Data System for Optimized Deep Learning Model Selection<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
VLDB 2020 | <a href="papers/2020_Cerebro_VLDB.pdf" target="blank">Paper PDF</a> and <a href="papers/2020_Cerebro_VLDB.txt" target="blank">BibTeX</a> | <a href="papers/2020_Cerebro_VLDB_Errata.pdf" target="blank">Errata</a> | <a href="papers/TR_2020_Cerebro.pdf" target="blank">TechReport</a>
| Talk videos: <a href="https://www.youtube.com/watch?v=8PJic5FStGs" target="blank">YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=198" target="blank">Bilibili</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target="blank">Blog post</a> | <a href="https://databricks.com/session_na20/resource-efficient-deep-learning-model-selection-on-apache-spark" target="blank">SAIS Talk video</a>
| <a href="https://adalabucsd.github.io/cerebro-system/" target="blank">Source code and documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra<br />
Side Li, Lingjiao Chen, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_MorpheusFI_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_MorpheusFI_SIGMOD.txt" target="blank">BibTeX</a> | <a href="https://github.com/liside/MorpheusFI" target="blank">Code and Data on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent<br />
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu<br />
ACM SIGMOD 2019 | <a href="papers/2019_TOC_SIGMOD.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2019_TOC.pdf" target="blank">TechReport</a> | <a href="https://github.com/fenganli/toc-release-code" target="blank">Code on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_Nimbus_SIGMOD.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2018_Nimbus.pdf" target="blank">TechReport</a> | Code and Data coming soon
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_Cerebro_DEEM.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_Cerebro_DEEM.txt" target="blank">BibTeX</a> | <a href="papers/TR_2019_Cerebro.pdf" target="blank">TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target="blank">Blog post</a>
</p>
</li>
</ul>
<ul>
<li><p>The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML<br />
Vraj Shah and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_DataPrepZoo_DEEM.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_SortingHat_SIGMOD.txt" target="blank">BibTeX</a> | <a href="papers/TR_2019_DataPrepZoo.pdf" target="blank">TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/research/2019/06/21/mldataprepzoo.html" target="blank">Blog post</a>
| <a href="https://github.com/pvn25/ML-Data-Prep-Zoo" target="blank">Data Prep Zoo Repository on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 Demo | <a href="papers/2019_NimbusDemo_SIGMOD.pdf" target="blank">Paper PDF</a> | Video coming soon
</p>
</li>
</ul>
<ul>
<li><p>A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics<br />
Anthony Thomas and Arun Kumar<br />
VLDB 2018/2019 | <a href="papers/2019_SLAB_VLDB.pdf" target="blank">Paper PDF</a> |
<a href="papers/TR_2018_SLAB.pdf" target="blank">TechReport</a> | <a href="slab.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?<br />
Vraj Shah, Arun Kumar, and Xiaojin Zhu<br />
VLDB 2018 |
<a href="papers/2018_Hamlet_VLDB.pdf" target="blank">Paper PDF</a> and <a href="papers/2018_Hamlet_VLDB.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2017_HamletPlusPlus.pdf" target="blank">TechReport</a> |
<a href="hamlet.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards Linear Algebra over Normalized Data<br />
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh Patel<br />
VLDB 2017 |
<a href="papers/2017_Morpheus_VLDB.pdf" target="blank">Paper PDF</a> |
<a href="papers/TR_2017_Morpheus.pdf" target="blank">TechReport</a> |
<a href="morpheus.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing: Do Not Pay for More than What You Learn!<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2017 DEEM Workshop |
<a href="papers/2017_Nimbus_DEEM.pdf" target="blank">Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A System to Manage Deep Learning for Relational Data Analytics<br />
Arun Kumar<br />
CIDR 2017 Abstract |
<a href="papers/2017_Cerebro_CIDR.pdf" target="blank">Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>To Join or Not to Join? Thinking Twice about Joins before Feature Selection<br />
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu<br />
ACM SIGMOD 2016 |
<a href="papers/2016_Hamlet_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2016_Hamlet_SIGMOD.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2016_Hamlet.pdf" target="blank">TechReport</a> |
<a href="hamlet.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Model Selection Management Systems: The Next Frontier of Advanced Analytics<br />
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel<br />
ACM SIGMOD Record Dec 2015 Vision Track |
<a href="papers/2015_MSMS_SIGMODRecord.pdf" target="blank">Paper PDF</a>
</p>
</li>
</ul>
<h3>Technical Reports</h3>
<ul>
<li><p>How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses<br />
Vraj Shah, Thomas Parashos, and Arun Kumar<br />
Under submission | <a href="papers/TR_2021_CategDedup.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets<br />
Supun Nakandala and Arun Kumar<br />
Under submission | <a href="papers/TR_2021_Nautilus.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>SystemX: A Scalable and Optimized Data System for Large Multi-Model Deep Learning<br />
Kabir Nagrecha and Arun Kumar<br />
Under submission | <a href="papers/TR_2021_SystemX.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Improving Feature Type Inference Accuracy of TFDV with SortingHat<br />
Vraj Shah, Kevin Yang, and Arun Kumar<br />
<a href="papers/TR_2020_TFDV.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<h3>Past Projects</h3>
<table class="imgtable"><tr><td>
<img src="images/hamlet.jpg" alt="" width="80px" />&nbsp;</td>
<td align="left"><p><a href="hamlet.html" target="blank"><b>Hamlet</b></a><br />
Exploiting database schema information to simplify data sourcing.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/nimbus.jpg" alt="" width="80px" />&nbsp;</td>
<td align="left"><p><a href="nimbus.html" target="blank"><b>Nimbus</b></a><br />
Enabling the first ML-aware cloud-based commodity market for the new black gold: training data.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/slab.jpg" alt="" width="80px" />&nbsp;</td>
<td align="left"><p><a href="slab.html" target="blank"><b>SLAB</b></a><br />
The first comprehensive benchmark comparison of scalable linear algebra systems.
</p>
</td></tr></table>
<div id="footer">
<div id="footer-text">
Page generated 2024-04-25 10:52:35 PDT, by <a href="https://github.com/wsshin/jemdoc_mathjax" target="blank">jemdoc+MathJax</a>.
</div>
</div>
</td>
</tr>
</table>
</body>
</html>