<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="jemdoc, see http://jemdoc.jaboc.net/" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="jemdoc.css" type="text/css" />
<title>ADA Lab @ UCSD</title>
</head>
<body>
<table summary="Table for page layout." id="tlayout">
<tr valign="top">
<td id="layout-menu">
<div class="menu-item"><a href="index.html">Home</a></div>
<div class="menu-item"><a href="index.html#members">Members</a></div>
<div class="menu-item"><a href="publications.html">Publications</a></div>
<div class="menu-item"><a href="news.html">News</a></div>
<div class="menu-item"><a href="impact.html">Impact</a></div>
<div class="menu-item"><a href="blog.html">Blog/Misc.</a></div>
<div class="menu-item"><a href="projects.html"><br /> Active Projects</a></div>
<div class="menu-item"><a href="cerebro.html">Cerebro</a></div>
<div class="menu-category"><br /> Past Projects</div>
<div class="menu-item"><a href="sortinghat.html">SortingHat</a></div>
<div class="menu-item"><a href="speakql.html">SpeakQL</a></div>
<div class="menu-item"><a href="krypton.html">Krypton</a></div>
<div class="menu-item"><a href="vista.html">Vista</a></div>
<div class="menu-item"><a href="panorama.html">Panorama</a></div>
<div class="menu-item"><a href="morpheus.html">Morpheus</a></div>
<div class="menu-item"><a href="hamlet.html">Hamlet</a></div>
<div class="menu-item"><a href="nimbus.html">Nimbus</a></div>
<div class="menu-item"><a href="slab.html">SLAB</a></div>
<div class="menu-item"><a href="orion.html">Orion</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/columbus/">Columbus</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/bismarck/">Bismarck</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/staccato/">Staccato</a></div>
</td>
<td id="layout-content">
<div id="toptitle">
<h1>ADA Lab @ UCSD</h1>
</div>
<div class="infoblock">
<div class="blockcontent">
<p><b>Note:</b> This umbrella project webpage is now deprecated.
Please see the webpages of the active projects Cerebro and SortingHat.
</p>
</div></div>
<table class="imgtable"><tr><td>
<img src="images/triptych.jpg" alt="" width="100px" /> </td>
<td align="left"><h2>Project Triptych</h2>
</td></tr></table>
<h3>Overview</h3>
<p>Triptych is an end-to-end <i>model selection management system</i> (MSMS) that aims to simplify
and accelerate the process of sourcing data/features and selecting ML models. Our guiding
principles are to exploit the semantics of the data and the ML task as far as possible,
both to reduce work for the data scientist and to reduce runtimes and costs. We apply these
principles to remove or mitigate different bottlenecks in this end-to-end process,
eventually unifying these components to yield an integrated “operating system” for ML
analytics tasks. Please refer to the ACM SIGMOD Record paper below for more details on
this vision.
</p>
<h3>Active Component Projects</h3>
<table class="imgtable"><tr><td>
<img src="images/cerebro.jpg" alt="" width="80px" /> </td>
<td align="left"><p><a href="cerebro.html" target="blank"><b>Cerebro</b></a><br />
Efficient and reproducible model selection on deep learning systems.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/morpheus.jpg" alt="" width="80px" /> </td>
<td align="left"><p><a href="morpheus.html" target="blank"><b>Morpheus</b></a><br />
Integrating linear algebra and relational algebra to simplify feature engineering for ML.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/sortinghat.jpg" alt="" width="80px" /> </td>
<td align="left"><p><a href="sortinghat.html" target="blank"><b>SortingHat</b></a><br />
ML schema inference and automatic data preparation.
</p>
</td></tr></table>
<h3>Publications</h3>
<ul>
<li><p>Some Damaging Delusions of Deep Learning Practice (and How to Avoid Them)<br />
Arun Kumar, Supun Nakandala, and Yuhao Zhang<br />
KDD 2021 Deep Learning Day | <a href="papers/2021_DLDelusions_KDD.pdf" target="blank">Extended Abstract PDF</a>
| <a href="papers/2021_DLDelusions_KDD_Slides.pdf" target="blank">Talk slides</a>
| <a href="https://www.youtube.com/watch?v=UP9__WsfSuc" target="blank">Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning<br />
Side Li and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Kingpin_VLDB.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Kingpin.pdf" target="blank">TechReport</a> | <a href="https://www.youtube.com/watch?v=OlTknBfBmvM" target="blank">Talk video</a> | <a href="https://github.com/liside/Kingpin" target="blank">Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches<br />
Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Cerebro-DS.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Cerebro-DS.pdf" target="blank">TechReport</a> | <a href="https://youtu.be/SK9wTzO4K7M" target="blank">Talk video</a> | <a href="https://github.com/makemebitter/cerebro-ds/" target="blank">Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration<br />
Liangde Li, Supun Nakandala, and Arun Kumar<br />
VLDB 2021 Demo | <a href="papers/2021_Cerebro_VLDB_Demo.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Intermittent_HIL_MS.pdf" target="blank">TechReport</a> | <a href="https://youtu.be/K3THQy5McXc" target="blank">Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards A Polyglot Framework for Factorized ML<br />
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar<br />
VLDB 2021 (Industrial Track) | <a href="papers/2021_Trinity_VLDB.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Trinity.pdf" target="blank">TechReport</a> | <a href="https://www.youtube.com/watch?v=osvBmZs2MsM" target="blank">Talk video</a> | Code coming soon
</p>
</li>
</ul>
<ul>
<li><p>Towards Benchmarking Feature Type Inference for AutoML Platforms<br />
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar<br />
ACM SIGMOD 2021 | <a href="papers/2021_SortingHat_SIGMOD.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_SortingHat.pdf" target="blank">TechReport</a> | Talk Videos: <a href="https://youtu.be/KAs-uU59AEM" target="blank">Short Talk</a> <a href="https://youtu.be/dpx74zQyU3k" target="blank">Long Talk</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference" target="blank">Data, Code, and Pre-trained Models on GitHub</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference/Library" target="blank">Python library</a>
</p>
</li>
</ul>
<ul>
<li><p>The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study<br />
Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan<br />
Medicine and Science in Sports and Exercise Journal, 2021 | Paper PDF coming soon | <a href="https://github.com/ADALabUCSD/DeepPostures" target="blank">Code</a>
</p>
</li>
</ul>
<ul>
<li><p>Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification<br />
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan<br />
Journal for the Measurement of Physical Behaviour, 2021 | <a href="papers/2021_JMPB_CNN.pdf" target="blank">Paper PDF</a> and <a href="papers/2021_JMPB_CNN.txt" target="blank">BibTeX</a> | <a href="https://github.com/ADALabUCSD/DeepPostures" target="blank">Code</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Layered Data Platform for Scalable Deep Learning<br />
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha<br />
CIDR 2021 (Vision paper) | <a href="papers/2021_Cerebro_CIDR.pdf" target="blank">Paper PDF</a> and <a href="papers/2021_Cerebro_CIDR.txt" target="blank">BibTeX</a> | <a href="https://www.youtube.com/watch?v=8QfMvdlmdic" target="blank">Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Data System for Optimized Deep Learning Model Selection<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
VLDB 2020 | <a href="papers/2020_Cerebro_VLDB.pdf" target="blank">Paper PDF</a> and <a href="papers/2020_Cerebro_VLDB.txt" target="blank">BibTeX</a> | <a href="papers/2020_Cerebro_VLDB_Errata.pdf" target="blank">Errata</a> | <a href="papers/TR_2020_Cerebro.pdf" target="blank">TechReport</a>
| Talk videos: <a href="https://www.youtube.com/watch?v=8PJic5FStGs" target="blank">YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=198" target="blank">Bilibili</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target="blank">Blog post</a> | <a href="https://databricks.com/session_na20/resource-efficient-deep-learning-model-selection-on-apache-spark" target="blank">SAIS Talk video</a>
| <a href="https://adalabucsd.github.io/cerebro-system/" target="blank">Source code and documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra<br />
Side Li, Lingjiao Chen, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_MorpheusFI_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_MorpheusFI_SIGMOD.txt" target="blank">BibTeX</a> | <a href="https://github.com/liside/MorpheusFI" target="blank">Code and Data on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent<br />
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu<br />
ACM SIGMOD 2019 | <a href="papers/2019_TOC_SIGMOD.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2019_TOC.pdf" target="blank">TechReport</a> | <a href="https://github.com/fenganli/toc-release-code" target="blank">Code on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_Nimbus_SIGMOD.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2018_Nimbus.pdf" target="blank">TechReport</a> | Code and Data coming soon
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_Cerebro_DEEM.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_Cerebro_DEEM.txt" target="blank">BibTeX</a> | <a href="papers/TR_2019_Cerebro.pdf" target="blank">TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target="blank">Blog post</a>
</p>
</li>
</ul>
<ul>
<li><p>The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML<br />
Vraj Shah and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_DataPrepZoo_DEEM.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_SortingHat_SIGMOD.txt" target="blank">BibTeX</a> | <a href="papers/TR_2019_DataPrepZoo.pdf" target="blank">TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/research/2019/06/21/mldataprepzoo.html" target="blank">Blog post</a>
| <a href="https://github.com/pvn25/ML-Data-Prep-Zoo" target="blank">Data Prep Zoo Repository on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 Demo | <a href="papers/2019_NimbusDemo_SIGMOD.pdf" target="blank">Paper PDF</a> | Video coming soon
</p>
</li>
</ul>
<ul>
<li><p>A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics<br />
Anthony Thomas and Arun Kumar<br />
VLDB 2018/2019 | <a href="papers/2019_SLAB_VLDB.pdf" target="blank">Paper PDF</a> |
<a href="papers/TR_2018_SLAB.pdf" target="blank">TechReport</a> | <a href="slab.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?<br />
Vraj Shah, Arun Kumar, and Xiaojin Zhu<br />
VLDB 2018 |
<a href="papers/2018_Hamlet_VLDB.pdf" target="blank">Paper PDF</a> and <a href="papers/2018_Hamlet_VLDB.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2017_HamletPlusPlus.pdf" target="blank">TechReport</a> |
<a href="hamlet.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards Linear Algebra over Normalized Data<br />
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh Patel<br />
VLDB 2017 |
<a href="papers/2017_Morpheus_VLDB.pdf" target="blank">Paper PDF</a> |
<a href="papers/TR_2017_Morpheus.pdf" target="blank">TechReport</a> |
<a href="morpheus.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing: Do Not Pay for More than What You Learn!<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2017 DEEM Workshop |
<a href="papers/2017_Nimbus_DEEM.pdf" target="blank">Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A System to Manage Deep Learning for Relational Data Analytics<br />
Arun Kumar<br />
CIDR 2017 Abstract |
<a href="papers/2017_Cerebro_CIDR.pdf" target="blank">Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>To Join or Not to Join? Thinking Twice about Joins before Feature Selection<br />
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu<br />
ACM SIGMOD 2016 |
<a href="papers/2016_Hamlet_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2016_Hamlet_SIGMOD.txt" target="blank">BibTeX</a> |
<a href="papers/TR_2016_Hamlet.pdf" target="blank">TechReport</a> |
<a href="hamlet.html" target="blank">Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Model Selection Management Systems: The Next Frontier of Advanced Analytics<br />
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel<br />
ACM SIGMOD Record Dec 2015 Vision Track |
<a href="papers/2015_MSMS_SIGMODRecord.pdf" target="blank">Paper PDF</a>
</p>
</li>
</ul>
<h3>Technical Reports</h3>
<ul>
<li><p>How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses<br />
Vraj Shah, Thomas Parashos, and Arun Kumar<br />
Under submission | <a href="papers/TR_2021_CategDedup.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets<br />
Supun Nakandala and Arun Kumar<br />
Under submission | <a href="papers/TR_2021_Nautilus.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>SystemX: A Scalable and Optimized Data System for Large Multi-Model Deep Learning<br />
Kabir Nagrecha and Arun Kumar<br />
Under submission | <a href="papers/TR_2021_SystemX.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Improving Feature Type Inference Accuracy of TFDV with SortingHat<br />
Vraj Shah, Kevin Yang, and Arun Kumar<br />
<a href="papers/TR_2020_TFDV.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<h3>Past Projects</h3>
<table class="imgtable"><tr><td>
<img src="images/hamlet.jpg" alt="" width="80px" /> </td>
<td align="left"><p><a href="hamlet.html" target="blank"><b>Hamlet</b></a><br />
Exploiting database schema information to simplify data sourcing.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/nimbus.jpg" alt="" width="80px" /> </td>
<td align="left"><p><a href="nimbus.html" target="blank"><b>Nimbus</b></a><br />
Enabling the first ML-aware cloud-based commodity market for the new black gold: training data.
</p>
</td></tr></table>
<table class="imgtable"><tr><td>
<img src="images/slab.jpg" alt="" width="80px" /> </td>
<td align="left"><p><a href="slab.html" target="blank"><b>SLAB</b></a><br />
The first comprehensive benchmark comparison of scalable linear algebra systems.
</p>
</td></tr></table>
<div id="footer">
<div id="footer-text">
Page generated 2024-04-25 10:52:35 PDT, by <a href="https://github.com/wsshin/jemdoc_mathjax" target="blank">jemdoc+MathJax</a>.
</div>
</div>
</td>
</tr>
</table>
</body>
</html>