<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="jemdoc, see http://jemdoc.jaboc.net/" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="jemdoc.css" type="text/css" />
<title>ADA Lab @ UCSD</title>
</head>
<body>
<table summary="Table for page layout." id="tlayout">
<tr valign="top">
<td id="layout-menu">
<div class="menu-item"><a href="index.html">Home</a></div>
<div class="menu-item"><a href="index.html#members">Members</a></div>
<div class="menu-item"><a href="publications.html" class="current">Publications</a></div>
<div class="menu-item"><a href="news.html">News</a></div>
<div class="menu-item"><a href="impact.html">Impact</a></div>
<div class="menu-item"><a href="blog.html">Blog/Misc.</a></div>
<div class="menu-item"><a href="projects.html"><br /> Active Projects</a></div>
<div class="menu-item"><a href="cerebro.html">Cerebro</a></div>
<div class="menu-category"><br /> Past Projects</div>
<div class="menu-item"><a href="sortinghat.html">SortingHat</a></div>
<div class="menu-item"><a href="speakql.html">SpeakQL</a></div>
<div class="menu-item"><a href="krypton.html">Krypton</a></div>
<div class="menu-item"><a href="vista.html">Vista</a></div>
<div class="menu-item"><a href="panorama.html">Panorama</a></div>
<div class="menu-item"><a href="morpheus.html">Morpheus</a></div>
<div class="menu-item"><a href="hamlet.html">Hamlet</a></div>
<div class="menu-item"><a href="nimbus.html">Nimbus</a></div>
<div class="menu-item"><a href="slab.html">SLAB</a></div>
<div class="menu-item"><a href="orion.html">Orion</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/columbus/">Columbus</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/bismarck/">Bismarck</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/staccato/">Staccato</a></div>
</td>
<td id="layout-content">
<div id="toptitle">
<h1>ADA Lab @ UCSD</h1>
</div>
<h2>Peer-reviewed Publications</h2>
<ul>
<li><p>How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses<br />
Vraj Shah, Thomas Parashos, and Arun Kumar<br />
VLDB 2024 | <a href="papers/2024_CategDedup_VLDB.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2023_CategDedup.pdf" target=“blank”>TechReport</a> | Code and Data coming soon
</p>
</li>
</ul>
<ul>
<li><p>Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads<br />
Kabir Nagrecha and Arun Kumar<br />
VLDB 2024 | <a href="papers/2024_Saturn_VLDB.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2023_Saturn.pdf" target=“blank”>TechReport</a> | <a href="https://saturn.readthedocs.io/en/latest/index.html" target=“blank”>Code and Docs Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines<br />
Yuhao Zhang and Arun Kumar<br />
VLDB 2023 | <a href="papers/2023_Lotan_VLDB.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2023_Lotan.pdf" target=“blank”>TechReport</a> | <a href="https://github.com/makemebitter/lotan" target=“blank”>Code Release</a> | <a href="https://adalabucsd.github.io/research-blog/lotan.html" target=“blank”>Blog post</a>
</p>
</li>
</ul>
<ul>
<li><p>Low movement, deep-learned sitting patterns, and sedentary behavior in the International Study of Childhood Obesity, Lifestyle, and the Environment (ISCOLE)<br />
Paul R. Hibbing et al. (12 authors)<br />
International Journal of Obesity 2023 | <a href="papers/2023_ISCOLE_IJO.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Database-Aware ASR Error Correction for Speech-to-SQL Parsing<br />
Yutong Shao, Arun Kumar, and Ndapandula Nakashole<br />
IEEE ICASSP 2023 | <a href="papers/2023_SpeakQL_ICASSP.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>CHAP-child: An open source method for estimating sit-to-stand transitions and sedentary bout patterns from hip accelerometers among children<br />
Jordan A. Carlson et al. (15 authors)<br />
International Journal of Behavioral Nutrition and Physical Activity 2022 | <a href="papers/2022_JBNPA_CHAP.pdf" target=“blank”>Paper PDF</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=“blank”>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Structured Data Representation in Natural Language Interfaces<br />
Yutong Shao, Arun Kumar, and Ndapandula Nakashole<br />
IEEE Data Engineering Bulletin 2022 (Invited) | <a href="papers/2022_SpeakQL_DataEngBulletin.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>CHAP-Adult: A Reliable and Valid Algorithm to Classify Sitting and Measure Sitting Patterns Using Data from Hip-Worn Accelerometers in Adults Aged 35+<br />
John Bellettiere et al. (14 authors)<br />
Journal for the Measurement of Physical Behaviour 2022 | <a href="papers/2022_JMPB_CHAP.pdf" target=“blank”>PDF</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=“blank”>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>VLDB Scalable Data Science Category: The Inaugural Year<br />
Arun Kumar, Alon Halevy, and Nesime Tatbul<br />
ACM SIGMOD Record 2022 | <a href="papers/2022_SDS_SIGMODRecord.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets<br />
Supun Nakandala and Arun Kumar<br />
SIGMOD 2022 | <a href="papers/2022_Nautilus_SIGMOD.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2021_Nautilus.pdf" target=“blank”>TechReport</a> | <a href="https://github.com/ADALabUCSD/Nautilus" target=“blank”>Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>VLDB Panel Summary: “The Future of Data(base) Education: Is the Cow Book Dead?”<br />
Zachary Ives, Johannes Gehrke, Jana Giceva, Arun Kumar, and Rachel Pottinger<br />
ACM SIGMOD Record 2021 | <a href="papers/2021_DBEd_SIGMODRecord.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Some Damaging Delusions of Deep Learning Practice (and How to Avoid Them)<br />
Arun Kumar, Supun Nakandala, and Yuhao Zhang<br />
KDD 2021 Deep Learning Day | <a href="papers/2021_DLDelusions_KDD.pdf" target=“blank”>Extended Abstract PDF</a>
| <a href="papers/2021_DLDelusions_KDD_Slides.pdf" target=“blank”>Talk slides</a>
| <a href="https://www.youtube.com/watch?v=UP9__WsfSuc" target=“blank”>Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning<br />
Side Li and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Kingpin_VLDB.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2021_Kingpin.pdf" target=“blank”>TechReport</a> | <a href="https://www.youtube.com/watch?v=OlTknBfBmvM" target=“blank”>Talk video</a> | <a href="https://github.com/liside/Kingpin" target=“blank”>Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches<br />
Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar<br />
VLDB 2021 | <a href="papers/2021_Cerebro-DS.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2021_Cerebro-DS.pdf" target=“blank”>TechReport</a> | <a href="https://youtu.be/SK9wTzO4K7M" target=“blank”>Talk video</a> | <a href="https://github.com/makemebitter/cerebro-ds/" target=“blank”>Code release</a>
</p>
</li>
</ul>
<ul>
<li><p>Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration<br />
Liangde Li, Supun Nakandala, and Arun Kumar<br />
VLDB 2021 Demo | <a href="papers/2021_Cerebro_VLDB_Demo.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2021_Intermittent_HIL_MS.pdf" target=“blank”>TechReport</a> | <a href="https://youtu.be/K3THQy5McXc" target=“blank”>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards A Polyglot Framework for Factorized ML<br />
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar<br />
VLDB 2021 (Industrial Track) | <a href="papers/2021_Trinity_VLDB.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2021_Trinity.pdf" target=“blank”>TechReport</a> | <a href="https://www.youtube.com/watch?v=osvBmZs2MsM" target=“blank”>Talk video</a> | Code coming soon
</p>
</li>
</ul>
<ul>
<li><p>Towards Benchmarking Feature Type Inference for AutoML Platforms<br />
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar<br />
ACM SIGMOD 2021 | <a href="papers/2021_SortingHat_SIGMOD.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2021_SortingHat.pdf" target=“blank”>TechReport</a> | Talk Videos: <a href="https://youtu.be/KAs-uU59AEM" target=“blank”>Short Talk</a> <a href="https://youtu.be/dpx74zQyU3k" target=“blank”>Long Talk</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference" target=“blank”>Data, Code, and Pre-trained Models on GitHub</a> | <a href="https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference/Library" target=“blank”>Python library</a>
</p>
</li>
</ul>
<ul>
<li><p>Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?<br />
Arun Kumar<br />
ACM SIGMOD 2021 Panel | <a href="papers/2021_Panel_SIGMOD.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study<br />
Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan<br />
Medicine and Science in Sports and Exercise Journal, 2021 | <a href="papers/2021_MSSE_CHAP.pdf" target=“blank”>Paper PDF</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=“blank”>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification<br />
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan<br />
Journal for the Measurement of Physical Behaviour, 2021 | <a href="papers/2021_JMPB_CNN.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2021_JMPB_CNN.txt" target=“blank”>BibTeX</a> | <a href="https://adalabucsd.github.io/DeepPostures" target=“blank”>Code, Models, and Documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Layered Data Platform for Scalable Deep Learning<br />
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha<br />
CIDR 2021 (Vision paper) | <a href="papers/2021_Cerebro_CIDR.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2021_Cerebro_CIDR.txt" target=“blank”>BibTeX</a> | <a href="https://www.youtube.com/watch?v=8QfMvdlmdic" target=“blank”>Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A Data System for Optimized Deep Learning Model Selection<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
VLDB 2020 | <a href="papers/2020_Cerebro_VLDB.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2020_Cerebro_VLDB.txt" target=“blank”>BibTeX</a> | <a href="papers/2020_Cerebro_VLDB_Errata.pdf" target=“blank”>Errata</a> | <a href="papers/TR_2020_Cerebro.pdf" target=“blank”>TechReport</a>
| Talk videos: <a href="https://www.youtube.com/watch?v=8PJic5FStGs" target=“blank”>YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=198" target=“blank”>Bilibili</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target=“blank”>Blog post</a> | <a href="https://databricks.com/session_na20/resource-efficient-deep-learning-model-selection-on-apache-spark" target=“blank”>SAIS Talk video</a>
| <a href="https://adalabucsd.github.io/cerebro-system/" target=“blank”>Source code and documentation</a>
</p>
</li>
</ul>
<ul>
<li><p>Panorama: A Data System for Unbounded Vocabulary Querying over Video<br />
Yuhao Zhang and Arun Kumar<br />
VLDB 2020 | <a href="http://www.vldb.org/pvldb/vol13/p477-zhang.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_Panorama_VLDB.txt" target=“blank”>BibTeX</a> |
<a href="papers/TR_2019_Panorama.pdf" target=“blank”>TechReport</a>
| <a href="https://docs.google.com/presentation/d/1a9xHmfP1Gwg03CnVP8OWWf20v1IZ9O5eIhfa0dEdkcc/edit?usp=sharing" target=“blank”>Talk slides</a> | Talk videos: <a href="https://www.youtube.com/watch?v=gAGOp0fbUcU" target=“blank”>YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=109" target=“blank”>Bilibili</a>
| <a href="https://adalabucsd.github.io/research-blog/panorama.html" target=“blank”>Blog post</a>
| <a href="https://github.com/makemebitter/Panorama-UCSD" target=“blank”>Source code on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Understanding and Benchmarking the Impact of GDPR on Database Systems<br />
Supreeth Shastri, Vinay Banakar, Melissa Wasserman, Arun Kumar, and Vijay Chidambaram<br />
VLDB 2020 | <a href="http://www.vldb.org/pvldb/vol13/p1064-shastri.pdf" target=“blank”>Paper PDF</a> | <a href="https://04e19274-9945-4166-b1be-95d42dc718a3.filesusr.com/ugd/13b079_1e10e6be8e7045ee9b26afdcdae6f60b.pdf" target=“blank”>TechReport</a> | <a href="https://www.gdprbench.org/" target=“blank”>Webpage</a>
| Talk videos: <a href="https://www.youtube.com/watch?v=1O8_fVmzUUc" target=“blank”>YouTube</a> <a href="https://www.bilibili.com/video/av329339128?p=188" target=“blank”>Bilibili</a>
</p>
</li>
</ul>
<ul>
<li><p>Query Optimization for Faster Deep CNN Explanations<br />
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou<br />
ACM SIGMOD Record 2020 | <a href="papers/2020_Krypton_SIGMODRecord.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2020_Krypton_SIGMODRecord.txt" target=“blank”>BibTeX</a> <br />
<tt>ACM SIGMOD Research Highlights Award</tt>
</p>
</li>
</ul>
<ul>
<li><p>Incremental and Approximate Computations for Accelerating Deep CNN Inference<br />
Supun Nakandala, Kabir Nagrecha, Arun Kumar, and Yannis Papakonstantinou<br />
ACM TODS 2020 | <a href="papers/2020_Krypton_TODS.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2020_Krypton_TODS.txt" target=“blank”>BibTeX</a> <br />
<tt>Invited Paper</tt>
</p>
</li>
</ul>
<ul>
<li><p>Vista: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale<br />
Supun Nakandala and Arun Kumar<br />
ACM SIGMOD 2020 | <a href="papers/2020_Vista_SIGMOD.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2020_Vista_SIGMOD.txt" target=“blank”>BibTeX</a> |
<a href="papers/TR_2020_Vista.pdf" target=“blank”>TechReport</a> | <a href="https://adalabucsd.github.io/research-blog/research/2020/06/14/vista.html" target=“blank”>Blog post</a> | <a href="https://www.youtube.com/watch?v=nmfUFCDthAo&amp;feature=youtu.be" target=“blank”>Talk Video</a> | <a href="https://github.com/ADALabUCSD/Vista" target=“blank”>Code</a>
</p>
</li>
</ul>
<ul>
<li><p>SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data<br />
Vraj Shah, Side Li, Arun Kumar, and Lawrence Saul<br />
ACM SIGMOD 2020 | <a href="papers/2020_SpeakQL_SIGMOD.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2020_SpeakQL_SIGMOD.txt" target=“blank”>BibTeX</a> |
<a href="papers/TR_2020_SpeakQL.pdf" target=“blank”>TechReport</a> |
<a href="https://adalabucsd.github.io/research-blog/research/2020/06/14/speakql.html" target=“blank”>Blog post</a> |
<a href="https://drive.google.com/drive/folders/1tSxUTu2A7qy8fPtB81RnwkyakgykZ3iw?usp=sharing" target=“blank”>Dataset on Drive</a>
</p>
</li>
</ul>
<ul>
<li><p>Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations<br />
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou<br />
ACM SIGMOD 2019 | <a href="papers/2019_Krypton_SIGMOD.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_Krypton_SIGMOD.txt" target=“blank”>BibTeX</a> | <a href="papers/TR_2019_Krypton.pdf" target=“blank”>TechReport</a> | <a href="https://adalabucsd.github.io/research-blog/research/2019/06/07/krypton.html" target=“blank”>Blog post</a> | <a href="https://av.tib.eu/media/42901" target=“blank”>Talk Video</a> <br />
<tt>Honorable Mention for Best Paper Award</tt>
</p>
</li>
</ul>
<ul>
<li><p>Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra<br />
Side Li, Lingjiao Chen, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_MorpheusFI_SIGMOD.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_MorpheusFI_SIGMOD.txt" target=“blank”>BibTeX</a> | <a href="https://github.com/liside/MorpheusFI" target=“blank”>Code and Data on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent<br />
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu<br />
ACM SIGMOD 2019 | <a href="papers/2019_TOC_SIGMOD.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2019_TOC.pdf" target=“blank”>TechReport</a> | <a href="https://github.com/fenganli/toc-release-code" target=“blank”>Code on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_Nimbus_SIGMOD.pdf" target=“blank”>Paper PDF</a> | <a href="papers/TR_2018_Nimbus.pdf" target=“blank”>TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems<br />
Supun Nakandala, Yuhao Zhang, and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_Cerebro_DEEM.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_Cerebro_DEEM.txt" target=“blank”>BibTeX</a> | <a href="papers/TR_2019_Cerebro.pdf" target=“blank”>TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/cerebro.html" target=“blank”>Blog post</a>
</p>
</li>
</ul>
<ul>
<li><p>The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML<br />
Vraj Shah and Arun Kumar<br />
ACM SIGMOD 2019 DEEM Workshop | <a href="papers/2019_DataPrepZoo_DEEM.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_SortingHat_SIGMOD.txt" target=“blank”>BibTeX</a> | <a href="papers/TR_2019_DataPrepZoo.pdf" target=“blank”>TechReport</a>
| <a href="https://adalabucsd.github.io/research-blog/research/2019/06/21/mldataprepzoo.html" target=“blank”>Blog post</a>
| <a href="https://github.com/pvn25/ML-Data-Prep-Zoo" target=“blank”>Data Prep Zoo Repository on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data<br />
Vraj Shah, Side Li, Kevin Yang, Arun Kumar, and Lawrence Saul<br />
ACM SIGMOD 2019 Demo | <a href="papers/2019_SpeakQL_SIGMOD.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_SpeakQL_SIGMOD.txt" target=“blank”>BibTeX</a> | <a href="https://vimeo.com/295693078" target=“blank”>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace<br />
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2019 Demo | <a href="papers/2019_NimbusDemo_SIGMOD.pdf" target=“blank”>Paper PDF</a> | Video coming soon
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations<br />
Allen Ordookhanians, Xin Li, Supun Nakandala, and Arun Kumar<br />
VLDB 2019 | <a href="http://www.vldb.org/pvldb/vol12/p1894-ordookhanians.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2019_Krypton_VLDB.txt" target=“blank”>BibTeX</a> | <a href="https://www.youtube.com/watch?v=1OWddbd4n6Y&amp;feature=youtu.be" target=“blank”>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Krypton: Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations<br />
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou<br />
SysML 2019 Demo | <a href="papers/2019_Krypton_SysML.pdf" target=“blank”>Paper PDF</a> | <a href="https://www.youtube.com/watch?v=1OWddbd4n6Y&amp;feature=youtu.be" target=“blank”>Video</a>
</p>
</li>
</ul>
<ul>
<li><p>Data Management in Machine Learning Systems<br />
Matthias Boehm, Arun Kumar, and Jun Yang<br />
Synthesis Lectures on Data Management, Morgan &amp; Claypool Publishers (Book), 2019 |
<a href="https://www.morganclaypool.com/doi/10.2200/S00895ED1V01Y201901DTM057" target=“blank”>PDF</a> |
<a href="https://link.springer.com/book/10.1007/978-3-031-01869-5" target=“blank”>Order hard copy</a>
</p>
</li>
</ul>
<ul>
<li><p>Hierarchical and Distributed Machine Learning Inference Beyond the Edge<br />
Anthony Thomas, Yunhui Guo, Yeseong Kim, Baris Aksanli, Arun Kumar, and Tajana Rosing<br />
IEEE ICNSC 2019 | <a href="papers/2019_ICNSC.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Predicting Eating Events in Free Living Individuals<br />
Jiayi Wang, Jiue-An Yang, Supun Nakandala, Arun Kumar, and Marta M. Jankowska<br />
eScience 2019 Conference (Poster)
</p>
</li>
</ul>
<ul>
<li><p>A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics<br />
Anthony Thomas and Arun Kumar<br />
VLDB 2018/2019 | <a href="papers/2019_SLAB_VLDB.pdf" target=“blank”>Paper PDF</a> |
<a href="papers/TR_2018_SLAB.pdf" target=“blank”>TechReport</a> | <a href="slab.html" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>In-RDBMS Hardware Acceleration of Advanced Analytics<br />
Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, and Hadi Esmaeilzadeh<br />
VLDB 2018 | <a href="papers/2018_DANA_VLDB.pdf" target=“blank”>Paper PDF</a> |
<a href="http://act-lab.org/artifacts/dana/addendum.pdf" target=“blank”>Addendum</a>
</p>
</li>
</ul>
<ul>
<li><p>Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?<br />
Vraj Shah, Arun Kumar, and Xiaojin Zhu<br />
VLDB 2018 |
<a href="papers/2018_Hamlet_VLDB.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2018_Hamlet_VLDB.txt" target=“blank”>BibTeX</a> |
<a href="papers/TR_2017_HamletPlusPlus.pdf" target=“blank”>TechReport</a> |
<a href="hamlet.html" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Materialization Trade-offs for Feature Transfer from Deep CNNs for Multimodal Data Analytics<br />
Supun Nakandala and Arun Kumar<br />
SysML 2018 Short paper/poster | <a href="papers/2018_Vista_SysML.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards Linear Algebra over Normalized Data<br />
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh Patel<br />
VLDB 2017 |
<a href="papers/2017_Morpheus_VLDB.pdf" target=“blank”>Paper PDF</a> |
<a href="papers/TR_2017_Morpheus.pdf" target=“blank”>TechReport</a> |
<a href="morpheus.html" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics<br />
Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey Naughton<br />
ACM SIGMOD 2017 |
<a href="papers/2017_BismarckBoltOnDP_SIGMOD.pdf" target=“blank”>Paper PDF</a> |
<a href="papers/TR_2017_BismarckBoltOnDP.pdf" target=“blank”>TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Data Management in Machine Learning: Challenges, Techniques, and Systems<br />
Arun Kumar, Matthias Boehm, and Jun Yang<br />
ACM SIGMOD 2017 Tutorial |
<a href="papers/2017_Tutorial_SIGMOD.pdf" target=“blank”>Paper PDF</a> |
<a href="papers/Slides_2017_Tutorial_SIGMOD.pdf" target=“blank”>Slidedeck PDF</a> |
<a href="https://www.youtube.com/watch?v=U8J0Dd_Z5wo" target=“blank”>Video of tutorial on YouTube</a>
</p>
</li>
</ul>
<ul>
<li><p>SpeakQL: Towards Speech-driven Multi-modal Querying<br />
Dharmil Chandarana, Vraj Shah, Arun Kumar, and Lawrence Saul<br />
ACM SIGMOD 2017 HILDA Workshop |
<a href="papers/2017_SpeakQL_HILDA.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2017_SpeakQL_SIGMOD.txt" target=“blank”>BibTeX</a>
</p>
</li>
</ul>
<ul>
<li><p>Model-based Pricing: Do Not Pay for More than What You Learn!<br />
Lingjiao Chen, Paraschos Koutris, and Arun Kumar<br />
ACM SIGMOD 2017 DEEM Workshop |
<a href="papers/2017_Nimbus_DEEM.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Cerebro: A System to Manage Deep Learning for Relational Data Analytics<br />
Arun Kumar<br />
CIDR 2017 Abstract |
<a href="papers/2017_Cerebro_CIDR.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>To Join or Not to Join? Thinking Twice about Joins before Feature Selection<br />
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu<br />
ACM SIGMOD 2016 |
<a href="papers/2016_Hamlet_SIGMOD.pdf" target=“blank”>Paper PDF</a> and <a href="papers/2016_Hamlet_SIGMOD.txt" target=“blank”>BibTeX</a> |
<a href="papers/TR_2016_Hamlet.pdf" target=“blank”>TechReport</a> |
<a href="hamlet.html" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Materialization Optimizations for Feature Selection Workloads<br />
Ce Zhang, Arun Kumar, and Christopher Ré<br />
ACM TODS 2016 (Invited) | <a href="papers/2016_Columbus_TODS.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Model Selection Management Systems: The Next Frontier of Advanced Analytics<br />
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel<br />
ACM SIGMOD Record Dec 2015 Vision Track |
<a href="papers/2015_MSMS_SIGMODRecord.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Demonstration of Santoku: Optimizing Machine Learning over Normalized Data<br />
Arun Kumar, Mona Jalal, Boqun Yan, Jeffrey Naughton, and Jignesh M. Patel<br />
VLDB 2015 Demo |
<a href="papers/2015_Santoku_VLDB.pdf" target=“blank”>Paper PDF</a> |
<a href="orion.html" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Learning Generalized Linear Models Over Normalized Data<br />
Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel<br />
ACM SIGMOD 2015 |
<a href="papers/2015_Orion_SIGMOD.pdf" target=“blank”>Paper PDF</a> |
<a href="orion.html" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>Materialization Optimizations for Feature Selection Workloads<br />
Ce Zhang, Arun Kumar, and Christopher Ré<br />
ACM SIGMOD 2014 |
<a href="papers/2014_Columbus_SIGMOD.pdf" target=“blank”>Paper PDF</a><br />
<tt>Best Paper Award; Invited to ACM TODS 2016</tt>
</p>
</li>
</ul>
<ul>
<li><p>Distributed and Scalable PCA in the Cloud<br />
Arun Kumar, Nikos Karampatziakis, Paul Mineiro, Markus Weimer, and Vijay Narayanan<br />
NIPS BigLearn 2013 |
<a href="papers/2013_PCAonREEF_BigLearn.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System<br />
Pradap Konda, Arun Kumar, Christopher Ré, and Vaishnavi Sashikanth<br />
VLDB 2013 Demo |
<a href="papers/2013_Columbus_VLDB.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Hazy: Making it Easier to Build and Maintain Big-data Analytics<br />
Arun Kumar, Feng Niu, and Christopher Ré<br />
ACM Queue 2013 |
<a href="http://queue.acm.org/detail.cfm?id=2431055" target=“blank”>Article</a><br />
<tt>Invited to the Communications of the ACM March 2013</tt>
</p>
</li>
</ul>
<ul>
<li><p>Brainwash: A Data System for Feature Engineering<br />
Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang<br />
CIDR 2013 Vision Track |
<a href="papers/2013_Brainwash_CIDR.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards a Unified Architecture for in-RDBMS Analytics<br />
Xixuan Feng*, Arun Kumar*, Benjamin Recht, and Christopher Ré<br />
ACM SIGMOD 2012 |
<a href="papers/2012_Bismarck_SIGMOD.pdf" target=“blank”>Paper PDF</a> |
<a href="papers/TR_2012_Bismarck.pdf" target=“blank”>TechReport</a> |
<a href="http://i.stanford.edu//hazy/victor/bismarck-download/" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<ul>
<li><p>The MADlib Analytics Library or MAD Skills, the SQL<br />
Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar<br />
VLDB 2012 Industrial Track |
<a href="papers/2012_MADlib_VLDB.pdf" target=“blank”>Paper PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Probabilistic Management of OCR Data using an RDBMS<br />
Arun Kumar and Christopher Ré<br />
VLDB 2012 |
<a href="papers/2012_Staccato_VLDB.pdf" target=“blank”>Paper PDF</a> |
<a href="papers/TR_2012_Staccato.pdf" target=“blank”>TechReport</a> |
<a href="http://i.stanford.edu/hazy/staccato/download/" target=“blank”>Code and Data</a>
</p>
</li>
</ul>
<h2>Manuscripts and Articles</h2>
<ul>
<li><p>Arun Kumar's contribution to “Reminiscences on Influential Papers”<br />
Pinar Tozun<br />
ACM SIGMOD Record 2023 | <a href="https://sigmodrecord.org/publications/sigmodRecord/2212/pdfs/06_Reminiscences_Rabl.pdf" target=“blank”>Article PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Design and Evaluation of an SQL-Based Dialect for Spoken Querying<br />
Kyle Luoma and Arun Kumar<br />
<a href="papers/TR_2023_SpeakQL_Dialect.pdf" target=“blank”>TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>Hydra: A Data System for Large Multi-Model Deep Learning<br />
Kabir Nagrecha and Arun Kumar<br />
<a href="https://arxiv.org/abs/2110.08633" target="blank">TechReport</a> | <a href="https://github.com/knagrecha/hydra" target="blank">Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Integrating Cerebro with Ray<br />
Abhishek Gupta and Rishikesh Ingale<br />
<a href="papers/TR_2022_CSE234_CerebroRay.pdf" target="blank">CSE 234 Project TechReport</a> |
<a href="https://github.com/Abhishek2304/Cerebro-System-Ray" target="blank">Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Integrating Cerebro with Dask<br />
Vignesh Nanda Kumar and Pratik Ratadiya<br />
<a href="papers/TR_2022_CSE234_CerebroDask.pdf" target="blank">CSE 234 Project TechReport</a> |
<a href="https://github.com/VigneshN1997/cerebro-system" target="blank">Code Release</a>
</p>
</li>
</ul>
<ul>
<li><p>Categorical Data Deduplication<br />
Soham Pachpande and Gehan Chopade<br />
<a href="papers/TR_2022_CSE234_CategDedup.pdf" target="blank">CSE 234 Project TechReport</a> |
<a href="https://github.com/sohampachpande/data-deduplication" target="blank">Code, Data, and Pre-trained Models on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Bringing ML-based Feature Type Inference to OpenML<br />
Ryan Tran and Victor Zhu<br />
<a href="papers/TR_2022_CSE234_OpenML.pdf" target="blank">CSE 234 Project TechReport</a> |
<a href="https://github.com/bobotran/SortingHatLib" target="blank">Code Release on GitHub</a> |
<a href="https://pypi.org/project/sortinghatinf/" target="blank">Package on PyPI</a>
</p>
</li>
</ul>
<ul>
<li><p>Letter from the Rising Star Award Winner<br />
Arun Kumar<br />
IEEE Data Engineering Bulletin, June 2021 | <a href="http://sites.computer.org/debull/A21june/p94.pdf" target="blank">PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Improving Feature Type Inference Accuracy of TFDV with SortingHat<br />
Vraj Shah, Kevin Yang, and Arun Kumar<br />
<a href="papers/TR_2020_TFDV.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>ML/AI Systems and Applications: Is the SIGMOD/VLDB Community Losing Relevance?<br />
Arun Kumar<br />
Blog post on the official ACM SIGMOD Blog, 2018 |
<a href="http://wp.sigmod.org/?p=2454" target="blank">Webpage</a>
</p>
</li>
</ul>
<ul>
<li><p>Advice from PhD to Early Career<br />
Arun Kumar<br />
ACM SIGMOD 2018 New Researcher Symposium Talk |
<a href="https://sigmod2018.org/nrs_slides/kumar.pdf" target="blank">Slides</a>
</p>
</li>
</ul>
<ul>
<li><p>A Survey of the Existing Landscape of ML Systems<br />
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel<br />
UW-Madison Technical Report TR1827 |
<a href="papers/TR_2015_MSMSSurvey.pdf" target="blank">PDF</a>
</p>
</li>
</ul>
<h2>Theses and Dissertations</h2>
<ul>
<li><p>Simplifying Data Preparation for Machine Learning on Tabular Data<br />
Vraj Shah. PhD Dissertation. UC San Diego. 2022 |
<a href="papers/Dissertation_VrajShah.pdf" target="blank">PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Query Optimizations for Deep Learning Systems<br />
Supun Nakandala. PhD Dissertation. UC San Diego. 2022 |
<a href="papers/Dissertation_SupunNakandala.pdf" target="blank">PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Efficient Systems for Advanced Data Analytics<br />
Liangde Li. MS Thesis. UC San Diego. 2022 |
<a href="papers/Thesis_LiangdeLi.pdf" target="blank">PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning<br />
David Justo. MS Thesis. UC San Diego. 2019 |
<a href="papers/Thesis_DavidJusto.pdf" target="blank">PDF</a>
</p>
</li>
</ul>
<ul>
<li><p>Learning Over Joins<br />
Arun Kumar. PhD Dissertation. UW-Madison. 2016 |
<a href="papers/Dissertation_ArunKumar.pdf" target="blank">PDF</a> |
<a href="http://cseweb.ucsd.edu/csevideo/Arun.Kumar.mp4" target="blank">Video of job talk at UCSD</a><br />
<tt>Wisconsin CS 2016 Graduate Student Research Award for best dissertation research</tt>
</p>
</li>
</ul>
<div id="footer">
<div id="footer-text">
Page generated 2024-07-03 22:01:46 PDT, by <a href="https://github.com/wsshin/jemdoc_mathjax" target="blank">jemdoc+MathJax</a>.
</div>
</div>
</td>
</tr>
</table>
</body>
</html>