<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="jemdoc, see http://jemdoc.jaboc.net/" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<link rel="stylesheet" href="jemdoc.css" type="text/css" />
<title>ADA Lab @ UCSD</title>
</head>
<body>
<table summary="Table for page layout." id="tlayout">
<tr valign="top">
<td id="layout-menu">
<div class="menu-item"><a href="index.html">Home</a></div>
<div class="menu-item"><a href="index.html#members">Members</a></div>
<div class="menu-item"><a href="publications.html">Publications</a></div>
<div class="menu-item"><a href="news.html">News</a></div>
<div class="menu-item"><a href="impact.html">Impact</a></div>
<div class="menu-item"><a href="blog.html">Blog/Misc.</a></div>
<div class="menu-item"><a href="projects.html"><br /> Active Projects</a></div>
<div class="menu-item"><a href="cerebro.html">Cerebro</a></div>
<div class="menu-category"><br /> Past Projects</div>
<div class="menu-item"><a href="sortinghat.html">SortingHat</a></div>
<div class="menu-item"><a href="speakql.html">SpeakQL</a></div>
<div class="menu-item"><a href="krypton.html">Krypton</a></div>
<div class="menu-item"><a href="vista.html">Vista</a></div>
<div class="menu-item"><a href="panorama.html">Panorama</a></div>
<div class="menu-item"><a href="morpheus.html" class="current">Morpheus</a></div>
<div class="menu-item"><a href="hamlet.html">Hamlet</a></div>
<div class="menu-item"><a href="nimbus.html">Nimbus</a></div>
<div class="menu-item"><a href="slab.html">SLAB</a></div>
<div class="menu-item"><a href="orion.html">Orion</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/columbus/">Columbus</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/victor/bismarck/">Bismarck</a></div>
<div class="menu-item"><a href="http://i.stanford.edu/hazy/staccato/">Staccato</a></div>
</td>
<td id="layout-content">
<div id="toptitle">
<h1>ADA Lab @ UCSD</h1>
</div>
<table class="imgtable"><tr><td>
<img src="images/morpheus.jpg" alt="" height="80px" /> </td>
<td align="left"><h2>Project Morpheus</h2>
</td></tr></table>
<h3>Overview</h3>
<p>Applying ML to structured data often involves performing relational operations as part of feature and data engineering. For instance, joins before ML are ubiquitous, since many real-world datasets are multi-table, while almost all ML toolkits expect single-table inputs. This forces data scientists to join those tables and <i>materialize</i> a single table, which leads to data redundancy and wasted runtime. In recent work (<a href="orion.html" target="blank">Project Orion</a>), we introduced the paradigm of “factorized” ML to mitigate this issue for a few specific ML algorithms by showing how to push ML computations through joins. But that approach requires manually rewriting each ML implementation, and such a piecemeal approach creates a massive development overhead when extending factorized ML to other ML algorithms.
</p>
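<p>To make the redundancy concrete, the following sketch (in Python with NumPy, chosen here only for illustration) builds a toy two-table dataset and materializes its key-foreign key join. The table sizes, the names <tt>S</tt>, <tt>R</tt>, and <tt>fk</tt>, and the random data are all hypothetical and are not taken from the project's datasets. Each row of the small table <tt>R</tt> is copied into every row of <tt>S</tt> that references it, so in this toy case the joined table holds roughly 15 times as many values as the base tables together.</p>
<div class="codeblock">
<div class="blockcontent"><pre>
```python
import numpy as np

# Hypothetical toy schema (not from the project's datasets):
# S: table with n rows and d_S features, plus a foreign key column fk.
# R: referenced table with n_R rows and d_R features.
rng = np.random.default_rng(0)
n, n_R, d_S, d_R = 1000, 10, 2, 50

S = rng.standard_normal((n, d_S))
R = rng.standard_normal((n_R, d_R))
fk = rng.integers(0, n_R, size=n)  # row i of S refers to row fk[i] of R

# Key-foreign key join, materialized as a single table T = [S | R[fk]].
T = np.hstack([S, R[fk]])

base_cells = S.size + fk.size + R.size  # values stored in the base tables
joined_cells = T.size                   # values stored after materialization
print(T.shape)                   # (1000, 52)
print(base_cells, joined_cells)  # 3500 52000
```
</pre></div></div>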
<p>In this project, we mitigate the above overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra (LA). We introduce a new logical data type, the normalized matrix, to represent multi-table data, and devise a framework of algebraic rewrite rules that convert a large set of LA operations over the denormalized (joined) data into operations over the base tables. This enables us to automatically factorize several popular ML algorithms, thus unifying and generalizing prior work. Experiments with real-world multi-table datasets show that our approach also yields significant runtime speed-ups in multiple ML system environments.
</p>
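<p>As a minimal sketch of what such rewrite rules look like, assume a single key-foreign key join written as <tt>T = [S, K R]</tt>, where <tt>K</tt> is a hypothetical 0/1 indicator matrix with <tt>K[i, fk[i]] = 1</tt>. The Python/NumPy functions below (illustrative names of our own, not the Morpheus API) evaluate three LA operations over <tt>T</tt> using only the base tables: left matrix multiplication, column sums, and the Gram matrix <tt>T^T T</tt>, which relies on <tt>K^T K</tt> being diagonal with entries counting how often each row of <tt>R</tt> is referenced. Each result is checked against the materialized <tt>T</tt>.</p>
<div class="codeblock">
<div class="blockcontent"><pre>
```python
import numpy as np

# Toy normalized data with hypothetical sizes: table S, referenced table R,
# and foreign keys fk linking each row of S to one row of R.
rng = np.random.default_rng(1)
n, n_R, d_S, d_R = 500, 8, 3, 20
S = rng.standard_normal((n, d_S))
R = rng.standard_normal((n_R, d_R))
fk = rng.integers(0, n_R, size=n)

# Indicator matrix K (n x n_R) with K[i, fk[i]] = 1, so the join is T = [S, K R].
# K is dense here only for readability; in practice it would be kept sparse.
K = np.zeros((n, n_R))
K[np.arange(n), fk] = 1.0
counts = K.sum(axis=0)  # colSums(K): number of S rows referencing each R row


def lmm(X):
    """T X = S X_S + K (R X_R); R X_R is computed over n_R rows, not n."""
    return S @ X[:d_S] + K @ (R @ X[d_S:])


def col_sums():
    """colSums(T) = [colSums(S), colSums(K) R]."""
    return np.concatenate([S.sum(axis=0), counts @ R])


def crossprod():
    """T^T T as a 2x2 block matrix, using K^T K = diag(counts)."""
    top_left = S.T @ S
    top_right = (K.T @ S).T @ R  # S^T K R without forming K R
    bottom_right = R.T @ (counts[:, None] * R)  # R^T diag(counts) R
    return np.block([[top_left, top_right], [top_right.T, bottom_right]])


# Materialize T only to check that the rewrites give the same results.
T = np.hstack([S, K @ R])
X = rng.standard_normal((d_S + d_R, 2))
print(np.allclose(lmm(X), T @ X))              # True
print(np.allclose(col_sums(), T.sum(axis=0)))  # True
print(np.allclose(crossprod(), T.T @ T))       # True
```
</pre></div></div>
<p>A practical implementation would avoid the dense <tt>K</tt> entirely, for example by using the foreign keys directly as a row index.</p>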
<p>We have prototyped Morpheus in the popular R environment. Versions in Python and TensorFlow, as well as Apache SystemML, are in the works. This project sets the stage for a holistic unification of relational algebra-based feature and data engineering with LA-based ML to help accelerate ML workloads over structured data.
</p>
<p>The ideas from this work have been prototyped and/or adopted for applications at LogicBlox, Microsoft, and Avito.
</p>
<h3>Downloads (Paper, Code, Data, etc.)</h3>
<ul>
<li><p>Towards A Polyglot Framework for Factorized ML<br />
David Justo, Shaoqing Yi, Lukas Stadler, Nadia Polikarpova, and Arun Kumar<br />
VLDB 2021 (Industrial Track; to appear) | <a href="papers/2021_Trinity_VLDB.pdf" target="blank">Paper PDF</a> | <a href="papers/TR_2021_Trinity.pdf" target="blank">TechReport</a> | <a href="https://www.youtube.com/watch?v=osvBmZs2MsM" target="blank">Talk video</a>
</p>
</li>
</ul>
<ul>
<li><p>Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra<br />
Side Li, Lingjiao Chen, and Arun Kumar<br />
ACM SIGMOD 2019 | <a href="papers/2019_MorpheusFI_SIGMOD.pdf" target="blank">Paper PDF</a> and <a href="papers/2019_MorpheusFI_SIGMOD.txt" target="blank">BibTeX</a> | <a href="https://github.com/liside/MorpheusFI" target="blank">Code and Data on GitHub</a>
</p>
</li>
</ul>
<ul>
<li><p>Towards Linear Algebra over Normalized Data<br />
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh Patel<br />
VLDB 2017 |
<a href="papers/2017_Morpheus_VLDB.pdf" target="blank">Paper PDF</a> |
<a href="papers/TR_2017_Morpheus.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>MorpheusPy: Factorized Machine Learning with NumPy<br />
Side Li and Arun Kumar<br />
<a href="papers/TR_2018_MorpheusPy.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p>MorpheusFlow: a case study of learning over joins with TensorFlow<br />
Side Li and Arun Kumar<br />
<a href="papers/TR_2018_MorpheusFlow.pdf" target="blank">TechReport</a>
</p>
</li>
</ul>
<ul>
<li><p><a href="https://github.com/lchen001/Morpheus" target="blank">MorpheusR Code on GitHub</a> (R library for normalized matrix and factorized LA; example factorized ML algorithms)
</p>
</li>
<li><p><a href="https://github.com/ADALabUCSD/MorpheusPy" target="blank">MorpheusPy Code on GitHub</a> (Python library for normalized matrix and factorized LA; example factorized ML algorithms)
</p>
</li>
<li><p><a href="https://github.com/ADALabUCSD/MorpheusFlow" target="blank">MorpheusFlow Code on GitHub</a> (TensorFlow extension for lazy joins over relational data)
</p>
</li>
<li><p><a href="https://github.com/lchen001/Morpheus/blob/master/Data.zip" target="blank">Real-world datasets</a> (based on the <a href="hamlet.html" target="blank">Project Hamlet</a> datasets)
</p>
</li>
</ul>
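<p>The libraries above expose a normalized matrix together with factorized LA operators. The sketch below illustrates the idea with a hypothetical, minimal Python class; <tt>NormalizedMatrix</tt>, its <tt>tmatmul</tt> method, and the toy data are our own assumptions, not the MorpheusR or MorpheusPy API. Because the class overloads <tt>@</tt> and <tt>.T</tt>, a batch gradient descent routine for linear regression, written once against ordinary NumPy arrays, runs unchanged on the factorized representation and returns the same weights as on the materialized join.</p>
<div class="codeblock">
<div class="blockcontent"><pre>
```python
import numpy as np


class NormalizedMatrix:
    """Hypothetical normalized matrix for one key-foreign key join.

    Logically represents T = [S, R[fk]] but stores only S, fk, and R.
    """

    def __init__(self, S, fk, R):
        self.S, self.fk, self.R = S, fk, R
        self.d_S = S.shape[1]
        self.shape = (S.shape[0], S.shape[1] + R.shape[1])

    def __matmul__(self, X):
        # T @ X = S @ X_S + K @ (R @ X_R); multiplying by the indicator K
        # is a row gather with the foreign keys.
        return self.S @ X[: self.d_S] + (self.R @ X[self.d_S:])[self.fk]

    def tmatmul(self, Y):
        # T^T @ Y = [S^T Y ; R^T (K^T Y)]; K^T Y is a group-by sum of the
        # rows of Y that share a foreign key.
        KtY = np.zeros((self.R.shape[0], Y.shape[1]))
        np.add.at(KtY, self.fk, Y)
        return np.vstack([self.S.T @ Y, self.R.T @ KtY])

    @property
    def T(self):
        return _Transposed(self)


class _Transposed:
    """Lazy transpose so that algorithms can write T.T @ Y."""

    def __init__(self, nm):
        self.nm = nm

    def __matmul__(self, Y):
        return self.nm.tmatmul(Y)


def linreg_gd(T, y, steps=100, lr=0.1):
    """Batch gradient descent for least squares; T is an ndarray or NormalizedMatrix."""
    w = np.zeros((T.shape[1], 1))
    for _ in range(steps):
        grad = T.T @ (T @ w - y) / T.shape[0]
        w -= lr * grad
    return w


# Hypothetical toy data.
rng = np.random.default_rng(2)
n, n_R = 400, 5
S = rng.standard_normal((n, 2))
R = rng.standard_normal((n_R, 3))
fk = rng.integers(0, n_R, size=n)
y = rng.standard_normal((n, 1))

w_fact = linreg_gd(NormalizedMatrix(S, fk, R), y)  # never materializes the join
w_mat = linreg_gd(np.hstack([S, R[fk]]), y)        # runs on the materialized join
print(np.allclose(w_fact, w_mat))  # True
```
</pre></div></div>
<p>The design choice illustrated here is that the ML algorithm only touches the data through LA operators, so swapping the operand type is enough to factorize it.</p>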
<h3>Student Contacts</h3>
<ul>
<li><p>Side Li: s7li [at] eng [dot] ucsd [dot] edu
</p>
</li>
</ul>
<h3>Acknowledgments</h3>
<p>This project is supported in part by Faculty Research Awards from Google Research and Oracle Labs.
</p>
<div id="footer">
<div id="footer-text">
Page generated 2024-04-25 10:52:35 PDT, by <a href="https://github.com/wsshin/jemdoc_mathjax" target="blank">jemdoc+MathJax</a>.
</div>
</div>
</td>
</tr>
</table>
</body>
</html>