<!DOCTYPE html>
<html>
<head>
<title>DuckDB UDF Function Test</title>
<style>
pre { white-space: pre-wrap; font-family: Consolas, monospace; font-size: 14px; background-color: #f5f5f5; padding: 10px; border-radius: 5px; }
</style>
</head>
<body>
<h1>Function Examples</h1>
<h2>Function 1: dot product of two vectors, both passed as DOUBLE[]</h2>
<pre id="function"><code>
import numpy as np

def dot_product(vector1, vector2):
    # Both arguments arrive as Python lists of floats (DOUBLE[] in DuckDB).
    return float(np.dot(np.array(vector1), np.array(vector2)))
</code></pre>
<h2>Function 2: dot product of one vector passed as DOUBLE[] and one passed as a comma-separated VARCHAR</h2>
<pre id="function2"><code>
import numpy as np

def dot_product2(vector1, vector2):
    # vector2 is a comma-separated string such as "0.1,0.2,0.3"; parse it into floats.
    d_vector = [float(x) for x in vector2.split(',')]
    return float(np.dot(np.array(vector1), np.array(d_vector)))
</code></pre>
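<p>The two functions above should return the same value when the VARCHAR argument encodes the same numbers as the DOUBLE[] argument. The snippet below is a minimal sketch of that check, not the page's own code: <code>parse_vector</code> and <code>dot</code> are hypothetical helper names, plain Python stands in for NumPy so it runs without dependencies, and the length check is an added assumption about how mismatched inputs should be handled.</p>

```python
def parse_vector(text):
    """Parse a comma-separated string such as '1.0, 2.5, 3' into floats."""
    return [float(x) for x in text.split(',')]


def dot(vector1, vector2):
    """Dot product of two equal-length sequences of numbers."""
    if len(vector1) != len(vector2):
        raise ValueError("vectors must have the same length")
    return float(sum(a * b for a, b in zip(vector1, vector2)))


as_list = [1.0, 2.0, 3.0]
as_text = "4.0, 5.0, 6.0"

# Both input forms describe the same computation: 1*4 + 2*5 + 3*6.
result_from_list = dot(as_list, [4.0, 5.0, 6.0])
result_from_text = dot(as_list, parse_vector(as_text))
print(result_from_list, result_from_text)
```

<p>If the two printed values differ, the string encoding of the stored vector is the first thing to inspect.</p>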
<h2>Register UDFs: register both dot product functions</h2>
<pre id="function3"><code>
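import duckdb
from duckdb.typing import VARCHAR, DOUBLE

# Open a connection (in-memory here); the UDFs are registered on it.
con = duckdb.connect()
# list_type builds the variable-length DOUBLE[] type. Recent DuckDB releases
# reserve array_type for fixed-size arrays, where it also takes a size.
con.create_function("dot_product", dot_product,
                    [duckdb.list_type(DOUBLE),
                     duckdb.list_type(DOUBLE)],
                    DOUBLE, side_effects=True)
con.create_function("dot_product2", dot_product2,
                    [duckdb.list_type(DOUBLE),
                     VARCHAR],
                    DOUBLE, side_effects=True)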
</code></pre>
<h2>Run UDFs: execute a query with each dot product function</h2>
<pre id="function3"><code>
import time

# Assumes con is the connection the UDFs were registered on (for example
# con = duckdb.connect()), vectors is the query vector as a list of floats,
# and select and tablename are set by the caller.
vector1 = ', '.join(str(value) for value in vectors)
# The stored vector is read from 384 columns named "0" to "383".
vector_columns = ', '.join(f'"{i}"' for i in range(384))

# Query 1: stored vector spread over 384 numeric columns.
start_time = time.time()
con.execute(f"""SELECT {select},
    dot_product([{vector1}], [{vector_columns}]) AS similarity
    FROM {tablename} ORDER BY similarity DESC""").fetchdf()
end_time = time.time()
print(f"Time {end_time - start_time}")

# Query 2: stored vector held as a comma-separated string in vectors_col_name.
start_time = time.time()
con.execute(f"""SELECT {select},
    dot_product2([{vector1}], vectors_col_name) AS similarity
    FROM {tablename} ORDER BY similarity DESC""").fetchdf()
end_time = time.time()
print(f"Time {end_time - start_time}")
</code></pre>
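<p>The start/stop timing pattern above is repeated once per query. One way to factor it out is sketched below. The <code>timed</code> helper is a hypothetical name, not part of the page's code; it uses <code>time.perf_counter</code> rather than <code>time.time</code> because it is a monotonic clock intended for measuring intervals. The workload here is a stand-in so the snippet runs on its own.</p>

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print the elapsed time,
    and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed


# Stand-in workload. In the page's setting this would be a call such as
# lambda: con.execute(query).fetchdf(), where con and query are assumed to exist.
total, seconds = timed("sum of squares",
                       lambda: sum(i * i for i in range(100_000)))
```

<p>Running each query several times through such a helper and comparing the fastest runs tends to give a steadier comparison than a single measurement.</p>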
</body>
</html>