# jemdoc: menu{MENU2}{publications.html}
= ADA Lab @ UCSD
== Peer-reviewed Publications
- How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses\n
Vraj Shah, Thomas Parashos, and Arun Kumar\n
VLDB 2024 | [papers/2024_CategDedup_VLDB.pdf Paper PDF] | [papers/TR_2023_CategDedup.pdf TechReport] | Code and Data coming soon
- Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads\n
Kabir Nagrecha and Arun Kumar\n
VLDB 2024 | [papers/2024_Saturn_VLDB.pdf Paper PDF] | [papers/TR_2023_Saturn.pdf TechReport] | [https://saturn.readthedocs.io/en/latest/index.html Code and Docs Release]
- Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines\n
Yuhao Zhang and Arun Kumar\n
VLDB 2023 | [papers/2023_Lotan_VLDB.pdf Paper PDF] | [papers/TR_2023_Lotan.pdf TechReport] | [https://github.com/makemebitter/lotan Code Release] | [https://adalabucsd.github.io/research-blog/lotan.html Blog post]
- Low movement, deep-learned sitting patterns, and sedentary behavior in the International Study of Childhood Obesity, Lifestyle, and the Environment (ISCOLE)\n
Paul R. Hibbing et al. (12 authors)\n
International Journal of Obesity 2023 | [papers/2023_ISCOLE_IJO.pdf Paper PDF]
- Database-Aware ASR Error Correction for Speech-to-SQL Parsing\n
Yutong Shao, Arun Kumar, and Ndapandula Nakashole\n
IEEE ICASSP 2023 | [papers/2023_SpeakQL_ICASSP.pdf Paper PDF]
- CHAP-child: An open source method for estimating sit-to-stand transitions and sedentary bout patterns from hip accelerometers among children\n
Jordan A. Carlson et al. (15 authors)\n
International Journal of Behavioral Nutrition and Physical Activity 2022 | [papers/2022_JBNPA_CHAP.pdf Paper PDF] | [https://adalabucsd.github.io/DeepPostures Code, Models, and Documentation]
- Structured Data Representation in Natural Language Interfaces\n
Yutong Shao, Arun Kumar, and Ndapandula Nakashole\n
IEEE Data Engineering Bulletin 2022 (Invited) | [papers/2022_SpeakQL_DataEngBulletin.pdf Paper PDF]
- CHAP-Adult: A Reliable and Valid Algorithm to Classify Sitting and Measure Sitting Patterns Using Data from Hip-Worn Accelerometers in Adults Aged 35\+\n
John Bellettiere et al. (14 authors)\n
Journal for the Measurement of Physical Behaviour 2022 | [papers/2022_JMPB_CHAP.pdf PDF] | [https://adalabucsd.github.io/DeepPostures Code, Models, and Documentation]
- VLDB Scalable Data Science Category: The Inaugural Year\n
Arun Kumar, Alon Halevy, and Nesime Tatbul\n
ACM SIGMOD Record 2022 | [papers/2022_SDS_SIGMODRecord.pdf Paper PDF]
- Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets\n
Supun Nakandala and Arun Kumar\n
SIGMOD 2022 | [papers/2022_Nautilus_SIGMOD.pdf Paper PDF] | [papers/TR_2021_Nautilus.pdf TechReport] | [https://github.com/ADALabUCSD/Nautilus Code Release]
- VLDB Panel Summary: "The Future of Data(base) Education: Is the Cow Book Dead?"\n
Zachary Ives, Johannes Gehrke, Jana Giceva, Arun Kumar, and Rachel Pottinger\n
ACM SIGMOD Record 2021 | [papers/2021_DBEd_SIGMODRecord.pdf Paper PDF]
- Some Damaging Delusions of Deep Learning Practice (and How to Avoid Them)\n
Arun Kumar, Supun Nakandala, and Yuhao Zhang\n
KDD 2021 Deep Learning Day | [papers/2021_DLDelusions_KDD.pdf Extended Abstract PDF]
| [papers/2021_DLDelusions_KDD_Slides.pdf Talk slides]
| [https://www.youtube.com/watch?v=UP9__WsfSuc Talk video]
- Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning\n
Side Li and Arun Kumar\n
VLDB 2021 | [papers/2021_Kingpin_VLDB.pdf Paper PDF] | [papers/TR_2021_Kingpin.pdf TechReport] | [https://www.youtube.com/watch?v=OlTknBfBmvM Talk video] | [https://github.com/liside/Kingpin Code Release]
- Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches\n
Yuhao Zhang, Frank McQuillan, Nandish Jayaram, Nikhil Kak, Ekta Khanna, Orhan Kislal, Domino Valdano, and Arun Kumar\n
VLDB 2021 | [papers/2021_Cerebro-DS.pdf Paper PDF] | [papers/TR_2021_Cerebro-DS.pdf TechReport] | [https://youtu.be/SK9wTzO4K7M Talk video] | [https://github.com/makemebitter/cerebro-ds/ Code release]
- Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration\n
Liangde Li, Supun Nakandala, and Arun Kumar\n
VLDB 2021 Demo | [papers/2021_Cerebro_VLDB_Demo.pdf Paper PDF] | [papers/TR_2021_Intermittent_HIL_MS.pdf TechReport] | [https://youtu.be/K3THQy5McXc Video]
- Towards A Polyglot Framework for Factorized ML\n
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar\n
VLDB 2021 (Industrial Track) | [papers/2021_Trinity_VLDB.pdf Paper PDF] | [papers/TR_2021_Trinity.pdf TechReport] | [https://www.youtube.com/watch?v=osvBmZs2MsM Talk video] | Code coming soon
- Towards Benchmarking Feature Type Inference for AutoML Platforms\n
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar\n
ACM SIGMOD 2021 | [papers/2021_SortingHat_SIGMOD.pdf Paper PDF] | [papers/TR_2021_SortingHat.pdf TechReport] | Talk Videos: [https://youtu.be/KAs-uU59AEM Short Talk] [https://youtu.be/dpx74zQyU3k Long Talk] | [https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference Data, Code, and Pre-trained Models on GitHub] | [https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference/Library Python library]
- Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?\n
Arun Kumar\n
ACM SIGMOD 2021 Panel | [papers/2021_Panel_SIGMOD.pdf Paper PDF]
- The CNN Hip Accelerometer Posture (CHAP) Method for Classifying Sitting Patterns from Hip Accelerometers: A Validation Study\n
Mikael Anne Greenwood-Hickman, Supun Nakandala, Marta M. Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Paul R. Hibbing, Jingjing Zou, Andrea Z. LaCroix, Arun Kumar, and Loki Natarajan\n
Medicine and Science in Sports and Exercise Journal, 2021 | [papers/2021_MSSE_CHAP.pdf Paper PDF] | [https://adalabucsd.github.io/DeepPostures Code, Models, and Documentation]
- Application of Convolutional Neural Network Algorithms for Advancing Sedentary and Activity Bout Classification\n
Supun Nakandala, Marta Jankowska, Fatima Tuz-Zahra, John Bellettiere, Jordan Carlson, Andrea LaCroix, Sheri Hartman, Dori Rosenberg, Jingjing Zou, Arun Kumar, and Loki Natarajan\n
Journal for the Measurement of Physical Behaviour, 2021 | [papers/2021_JMPB_CNN.pdf Paper PDF] and [papers/2021_JMPB_CNN.txt BibTeX] | [https://adalabucsd.github.io/DeepPostures Code, Models, and Documentation]
- Cerebro: A Layered Data Platform for Scalable Deep Learning\n
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha\n
CIDR 2021 (Vision paper) | [papers/2021_Cerebro_CIDR.pdf Paper PDF] and [papers/2021_Cerebro_CIDR.txt BibTeX] | [https://www.youtube.com/watch?v=8QfMvdlmdic Talk video]
- Cerebro: A Data System for Optimized Deep Learning Model Selection\n
Supun Nakandala, Yuhao Zhang, and Arun Kumar\n
VLDB 2020 | [papers/2020_Cerebro_VLDB.pdf Paper PDF] and [papers/2020_Cerebro_VLDB.txt BibTeX] | [papers/2020_Cerebro_VLDB_Errata.pdf Errata] | [papers/TR_2020_Cerebro.pdf TechReport]
| Talk videos: [https://www.youtube.com/watch?v=8PJic5FStGs YouTube] [https://www.bilibili.com/video/av329339128?p=198 Bilibili]
| [https://adalabucsd.github.io/research-blog/cerebro.html Blog post] | [https://databricks.com/session_na20/resource-efficient-deep-learning-model-selection-on-apache-spark SAIS Talk video]
| [https://adalabucsd.github.io/cerebro-system/ Source code and documentation]
- Panorama: A Data System for Unbounded Vocabulary Querying over Video\n
Yuhao Zhang and Arun Kumar\n
VLDB 2020 | [http://www.vldb.org/pvldb/vol13/p477-zhang.pdf Paper PDF] and [papers/2019_Panorama_VLDB.txt BibTeX] |
[papers/TR_2019_Panorama.pdf TechReport]
| [https://docs.google.com/presentation/d/1a9xHmfP1Gwg03CnVP8OWWf20v1IZ9O5eIhfa0dEdkcc/edit?usp=sharing Talk slides] | Talk videos: [https://www.youtube.com/watch?v=gAGOp0fbUcU YouTube] [https://www.bilibili.com/video/av329339128?p=109 Bilibili]
| [https://adalabucsd.github.io/research-blog/panorama.html Blog post]
| [https://github.com/makemebitter/Panorama-UCSD Source code on GitHub]
- Understanding and Benchmarking the Impact of GDPR on Database Systems\n
Supreeth Shastri, Vinay Banakar, Melissa Wasserman, Arun Kumar, and Vijay Chidambaram\n
VLDB 2020 | [http://www.vldb.org/pvldb/vol13/p1064-shastri.pdf Paper PDF] | [https://04e19274-9945-4166-b1be-95d42dc718a3.filesusr.com/ugd/13b079_1e10e6be8e7045ee9b26afdcdae6f60b.pdf TechReport] | [https://www.gdprbench.org/ Webpage]
| Talk videos: [https://www.youtube.com/watch?v=1O8_fVmzUUc YouTube] [https://www.bilibili.com/video/av329339128?p=188 Bilibili]
- Query Optimization for Faster Deep CNN Explanations\n
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou\n
ACM SIGMOD Record 2020 | [papers/2020_Krypton_SIGMODRecord.pdf Paper PDF] and [papers/2020_Krypton_SIGMODRecord.txt BibTeX] \n
+ACM SIGMOD Research Highlights Award+
- Incremental and Approximate Computations for Accelerating Deep CNN Inference\n
Supun Nakandala, Kabir Nagrecha, Arun Kumar, and Yannis Papakonstantinou\n
ACM TODS 2020 | [papers/2020_Krypton_TODS.pdf Paper PDF] and [papers/2020_Krypton_TODS.txt BibTeX] \n
+Invited Paper+
- Vista: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale\n
Supun Nakandala and Arun Kumar\n
ACM SIGMOD 2020 | [papers/2020_Vista_SIGMOD.pdf Paper PDF] and [papers/2020_Vista_SIGMOD.txt BibTeX] |
[papers/TR_2020_Vista.pdf TechReport] | [https://adalabucsd.github.io/research-blog/research/2020/06/14/vista.html Blog post] | [https://www.youtube.com/watch?v=nmfUFCDthAo&feature=youtu.be Talk Video] | [https://github.com/ADALabUCSD/Vista Code]
- SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data\n
Vraj Shah, Side Li, Arun Kumar, and Lawrence Saul\n
ACM SIGMOD 2020 | [papers/2020_SpeakQL_SIGMOD.pdf Paper PDF] and [papers/2020_SpeakQL_SIGMOD.txt BibTeX] |
[papers/TR_2020_SpeakQL.pdf TechReport] |
[https://adalabucsd.github.io/research-blog/research/2020/06/14/speakql.html Blog post] |
[https://drive.google.com/drive/folders/1tSxUTu2A7qy8fPtB81RnwkyakgykZ3iw?usp=sharing Dataset on Drive]
- Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations\n
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou\n
ACM SIGMOD 2019 | [papers/2019_Krypton_SIGMOD.pdf Paper PDF] and [papers/2019_Krypton_SIGMOD.txt BibTeX] | [papers/TR_2019_Krypton.pdf TechReport] | [https://adalabucsd.github.io/research-blog/research/2019/06/07/krypton.html Blog post] | [https://av.tib.eu/media/42901 Talk Video] \n
+Honorable Mention for Best Paper Award+
- Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra\n
Side Li, Lingjiao Chen, and Arun Kumar\n
ACM SIGMOD 2019 | [papers/2019_MorpheusFI_SIGMOD.pdf Paper PDF] and [papers/2019_MorpheusFI_SIGMOD.txt BibTeX] | [https://github.com/liside/MorpheusFI Code and Data on GitHub]
- Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent\n
Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey Naughton, Jignesh Patel, and Xi Wu\n
ACM SIGMOD 2019 | [papers/2019_TOC_SIGMOD.pdf Paper PDF] | [papers/TR_2019_TOC.pdf TechReport] | [https://github.com/fenganli/toc-release-code Code on GitHub]
- Model-based Pricing for Machine Learning in a Data Marketplace\n
Lingjiao Chen, Paraschos Koutris, and Arun Kumar\n
ACM SIGMOD 2019 | [papers/2019_Nimbus_SIGMOD.pdf Paper PDF] | [papers/TR_2018_Nimbus.pdf TechReport]
- Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems\n
Supun Nakandala, Yuhao Zhang, and Arun Kumar\n
ACM SIGMOD 2019 DEEM Workshop | [papers/2019_Cerebro_DEEM.pdf Paper PDF] and [papers/2019_Cerebro_DEEM.txt BibTeX] | [papers/TR_2019_Cerebro.pdf TechReport]
| [https://adalabucsd.github.io/research-blog/cerebro.html Blog post]
- The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML\n
Vraj Shah and Arun Kumar\n
ACM SIGMOD 2019 DEEM Workshop | [papers/2019_DataPrepZoo_DEEM.pdf Paper PDF] and [papers/2019_SortingHat_SIGMOD.txt BibTeX] | [papers/TR_2019_DataPrepZoo.pdf TechReport]
| [https://adalabucsd.github.io/research-blog/research/2019/06/21/mldataprepzoo.html Blog post]
| [https://github.com/pvn25/ML-Data-Prep-Zoo Data Prep Zoo Repository on GitHub]
- Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data\n
Vraj Shah, Side Li, Kevin Yang, Arun Kumar, and Lawrence Saul\n
ACM SIGMOD 2019 Demo | [papers/2019_SpeakQL_SIGMOD.pdf Paper PDF] and [papers/2019_SpeakQL_SIGMOD.txt BibTeX] | [https://vimeo.com/295693078 Video]
- Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace\n
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, and Arun Kumar\n
ACM SIGMOD 2019 Demo | [papers/2019_NimbusDemo_SIGMOD.pdf Paper PDF] | Video coming soon
- Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations\n
Allen Ordookhanians, Xin Li, Supun Nakandala, and Arun Kumar\n
VLDB 2019 | [http://www.vldb.org/pvldb/vol12/p1894-ordookhanians.pdf Paper PDF] and [papers/2019_Krypton_VLDB.txt BibTeX] | [https://www.youtube.com/watch?v=1OWddbd4n6Y&feature=youtu.be Video]
- Demonstration of Krypton: Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations\n
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou\n
SysML 2019 Demo | [papers/2019_Krypton_SysML.pdf Paper PDF] | [https://www.youtube.com/watch?v=1OWddbd4n6Y&feature=youtu.be Video]
- Data Management in Machine Learning Systems\n
Matthias Boehm, Arun Kumar, and Jun Yang\n
Synthesis Lectures on Data Management, Morgan \& Claypool Publishers (Book), 2019 |
[https://www.morganclaypool.com/doi/10.2200/S00895ED1V01Y201901DTM057 PDF] |
[https://link.springer.com/book/10.1007/978-3-031-01869-5 Order hard copy]
- Hierarchical and Distributed Machine Learning Inference Beyond the Edge\n
Anthony Thomas, Yunhui Guo, Yeseong Kim, Baris Aksanli, Arun Kumar, and Tajana Rosing\n
IEEE ICNSC 2019 | [papers/2019_ICNSC.pdf Paper PDF]
- Predicting Eating Events in Free Living Individuals\n
Jiayi Wang, Jiue-An Yang, Supun Nakandala, Arun Kumar, and Marta M. Jankowska\n
eScience 2019 Conference (Poster)
- A Comparative Evaluation of Systems for Scalable Linear Algebra-based Analytics\n
Anthony Thomas and Arun Kumar\n
VLDB 2018/2019 | [papers/2019_SLAB_VLDB.pdf Paper PDF] |
[papers/TR_2018_SLAB.pdf TechReport] | [slab.html Code and Data]
- In-RDBMS Hardware Acceleration of Advanced Analytics\n
Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, and Hadi Esmaeilzadeh\n
VLDB 2018 | [papers/2018_DANA_VLDB.pdf Paper PDF] |
[http://act-lab.org/artifacts/dana/addendum.pdf Addendum]
- Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?\n
Vraj Shah, Arun Kumar, and Xiaojin Zhu\n
VLDB 2018 |
[papers/2018_Hamlet_VLDB.pdf Paper PDF] and [papers/2018_Hamlet_VLDB.txt BibTeX] |
[papers/TR_2017_HamletPlusPlus.pdf TechReport] |
[hamlet.html Code and Data]
- Materialization Trade-offs for Feature Transfer from Deep CNNs for Multimodal Data Analytics\n
Supun Nakandala and Arun Kumar\n
SysML 2018 Short paper/poster | [papers/2018_Vista_SysML.pdf Paper PDF]
- Towards Linear Algebra over Normalized Data\n
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh Patel\n
VLDB 2017 |
[papers/2017_Morpheus_VLDB.pdf Paper PDF] |
[papers/TR_2017_Morpheus.pdf TechReport] |
[morpheus.html Code and Data]
- Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics\n
Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey Naughton\n
ACM SIGMOD 2017 |
[papers/2017_BismarckBoltOnDP_SIGMOD.pdf Paper PDF] |
[papers/TR_2017_BismarckBoltOnDP.pdf TechReport]
- Data Management in Machine Learning: Challenges, Techniques, and Systems\n
Arun Kumar, Matthias Boehm, and Jun Yang\n
ACM SIGMOD 2017 Tutorial |
[papers/2017_Tutorial_SIGMOD.pdf Paper PDF] |
[papers/Slides_2017_Tutorial_SIGMOD.pdf Slidedeck PDF] |
[https://www.youtube.com/watch?v=U8J0Dd_Z5wo Video of tutorial on YouTube]
- SpeakQL: Towards Speech-driven Multi-modal Querying\n
Dharmil Chandarana, Vraj Shah, Arun Kumar, and Lawrence Saul\n
ACM SIGMOD 2017 HILDA Workshop |
[papers/2017_SpeakQL_HILDA.pdf Paper PDF] and [papers/2017_SpeakQL_SIGMOD.txt BibTeX]
- Model-based Pricing: Do Not Pay for More than What You Learn!\n
Lingjiao Chen, Paraschos Koutris, and Arun Kumar\n
ACM SIGMOD 2017 DEEM Workshop |
[papers/2017_Nimbus_DEEM.pdf Paper PDF]
- Cerebro: A System to Manage Deep Learning for Relational Data Analytics\n
Arun Kumar\n
CIDR 2017 Abstract |
[papers/2017_Cerebro_CIDR.pdf Paper PDF]
- To Join or Not to Join? Thinking Twice about Joins before Feature Selection\n
Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, and Xiaojin Zhu\n
ACM SIGMOD 2016 |
[papers/2016_Hamlet_SIGMOD.pdf Paper PDF] and [papers/2016_Hamlet_SIGMOD.txt BibTeX] |
[papers/TR_2016_Hamlet.pdf TechReport] |
[hamlet.html Code and Data]
- Materialization Optimizations for Feature Selection Workloads\n
Ce Zhang, Arun Kumar, and Christopher Re\n
ACM TODS 2016 (Invited) | [papers/2016_Columbus_TODS.pdf Paper PDF]
- Model Selection Management Systems: The Next Frontier of Advanced Analytics\n
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel\n
ACM SIGMOD Record Dec 2015 Vision Track |
[papers/2015_MSMS_SIGMODRecord.pdf Paper PDF]
- Demonstration of Santoku: Optimizing Machine Learning over Normalized Data\n
Arun Kumar, Mona Jalal, Boqun Yan, Jeffrey Naughton, and Jignesh M. Patel\n
VLDB 2015 Demo |
[papers/2015_Santoku_VLDB.pdf Paper PDF] |
[orion.html Code and Data]
- Learning Generalized Linear Models Over Normalized Data\n
Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel\n
ACM SIGMOD 2015 |
[papers/2015_Orion_SIGMOD.pdf Paper PDF] |
[orion.html Code and Data]
- Materialization Optimizations for Feature Selection Workloads\n
Ce Zhang, Arun Kumar, and Christopher Re\n
ACM SIGMOD 2014 |
[papers/2014_Columbus_SIGMOD.pdf Paper PDF]\n
+Best Paper Award; Invited to ACM TODS 2016+
- Distributed and Scalable PCA in the Cloud\n
Arun Kumar, Nikos Karampatziakis, Paul Mineiro, Markus Weimer, and Vijay Narayanan\n
NIPS BigLearn 2013 |
[papers/2013_PCAonREEF_BigLearn.pdf Paper PDF]
- Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System\n
Pradap Konda, Arun Kumar, Christopher Ré, and Vaishnavi Sashikanth\n
VLDB 2013 Demo |
[papers/2013_Columbus_VLDB.pdf Paper PDF]
- Hazy: Making it Easier to Build and Maintain Big-data Analytics\n
Arun Kumar, Feng Niu, and Christopher Re\n
ACM Queue 2013 |
[http://queue.acm.org/detail.cfm?id=2431055 Article]\n
+Invited to the Communications of the ACM March 2013+
- Brainwash: A Data System for Feature Engineering\n
Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Re, and Ce Zhang\n
CIDR 2013 Vision Track |
[papers/2013_Brainwash_CIDR.pdf Paper PDF]
- Towards a Unified Architecture for in-RDBMS Analytics\n
Xixuan Feng\*, Arun Kumar\*, Benjamin Recht, and Christopher Re\n
ACM SIGMOD 2012 |
[papers/2012_Bismarck_SIGMOD.pdf Paper PDF] |
[papers/TR_2012_Bismarck.pdf TechReport] |
[http://i.stanford.edu//hazy/victor/bismarck-download/ Code and Data]
- The MADlib Analytics Library or MAD Skills, the SQL\n
Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar\n
VLDB 2012 Industrial Track |
[papers/2012_MADlib_VLDB.pdf Paper PDF]
- Probabilistic Management of OCR Data using an RDBMS\n
Arun Kumar and Christopher Re\n
VLDB 2012 |
[papers/2012_Staccato_VLDB.pdf Paper PDF] |
[papers/TR_2012_Staccato.pdf TechReport] |
[http://i.stanford.edu/hazy/staccato/download/ Code and Data]
== Manuscripts and Articles
- Arun Kumar's contribution to "Reminiscences on Influential Papers"\n
Pinar Tozun\n
ACM SIGMOD Record 2023 | [https://sigmodrecord.org/publications/sigmodRecord/2212/pdfs/06_Reminiscences_Rabl.pdf Article PDF]
- Design and Evaluation of an SQL-Based Dialect for Spoken Querying\n
Kyle Luoma and Arun Kumar\n
[papers/TR_2023_SpeakQL_Dialect.pdf TechReport]
- Hydra: A Data System for Large Multi-Model Deep Learning\n
Kabir Nagrecha and Arun Kumar\n
[https://arxiv.org/abs/2110.08633 TechReport] | [https://github.com/knagrecha/hydra Code release]
- Integrating Cerebro with Ray\n
Abhishek Gupta and Rishikesh Ingale\n
[papers/TR_2022_CSE234_CerebroRay.pdf CSE 234 Project TechReport] |
[https://github.com/Abhishek2304/Cerebro-System-Ray Code Release]
- Integrating Cerebro with Dask\n
Vignesh Nanda Kumar and Pratik Ratadiya\n
[papers/TR_2022_CSE234_CerebroDask.pdf CSE 234 Project TechReport] |
[https://github.com/VigneshN1997/cerebro-system Code Release]
- Categorical Data Deduplication\n
Soham Pachpande and Gehan Chopade\n
[papers/TR_2022_CSE234_CategDedup.pdf CSE 234 Project TechReport] |
[https://github.com/sohampachpande/data-deduplication Code, Data, and Pre-trained Models on GitHub]
- Bringing ML-based Feature Type Inference to OpenML\n
Ryan Tran and Victor Zhu\n
[papers/TR_2022_CSE234_OpenML.pdf CSE 234 Project TechReport] |
[https://github.com/bobotran/SortingHatLib Code Release on GitHub] |
[https://pypi.org/project/sortinghatinf/ Package on PyPI]
- Letter from the Rising Star Award Winner\n
Arun Kumar\n
IEEE Data Engineering Bulletin, June 2021 | [http://sites.computer.org/debull/A21june/p94.pdf PDF]
- Improving Feature Type Inference Accuracy of TFDV with SortingHat\n
Vraj Shah, Kevin Yang, and Arun Kumar\n
[papers/TR_2020_TFDV.pdf TechReport]
- ML\/AI Systems and Applications: Is the SIGMOD\/VLDB Community Losing Relevance?\n
Arun Kumar\n
Blog post on the official ACM SIGMOD Blog, 2018 |
[http://wp.sigmod.org/?p=2454 Webpage]
- Advice from PhD to Early Career\n
Arun Kumar\n
ACM SIGMOD 2018 New Researcher Symposium Talk |
[https://sigmod2018.org/nrs_slides/kumar.pdf Slides]
- A Survey of the Existing Landscape of ML Systems\n
Arun Kumar, Robert McCann, Jeffrey Naughton, and Jignesh M. Patel\n
UW-Madison Technical Report TR1827 |
[papers/TR_2015_MSMSSurvey.pdf PDF]
== Theses and Dissertations
- Simplifying Data Preparation for Machine Learning on Tabular Data\n
Vraj Shah. PhD Dissertation. UC San Diego. 2022 |
[papers/Dissertation_VrajShah.pdf PDF]
- Query Optimizations for Deep Learning Systems\n
Supun Nakandala. PhD Dissertation. UC San Diego. 2022 |
[papers/Dissertation_SupunNakandala.pdf PDF]
- Efficient Systems for Advanced Data Analytics\n
Liangde Li. MS Thesis. UC San Diego. 2022 |
[papers/Thesis_LiangdeLi.pdf PDF]
- Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning\n
David Justo. MS Thesis. UC San Diego. 2019 |
[papers/Thesis_DavidJusto.pdf PDF]
- Learning Over Joins\n
Arun Kumar. PhD Dissertation. UW-Madison. 2016 |
[papers/Dissertation_ArunKumar.pdf PDF] |
[http://cseweb.ucsd.edu/csevideo/Arun.Kumar.mp4 Video of job talk at UCSD]\n
+Wisconsin CS 2016 Graduate Student Research Award for best dissertation research+