Deduplication in database queries
Links implements SQL queries using a normaliser that targets the multiset fragment of SQL. We have now introduced a new, experimental normaliser (the mixing normaliser) which supports queries mixing set and multiset operations, enabling the user to freely use deduplication within a query.
The mixing normaliser is enabled in one of two ways:
- globally, by adding a mixing_norm=on statement to the Links configuration file: this will cause all flat queries to be processed by the mixing normaliser (nested queries introduced by the query nested statement are unaffected)
- on a per-query basis, for queries specified in a query mixing { ... } block.
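As a minimal sketch of the per-query form, the block below wraps a comprehension over a deduplicated subquery in query mixing. The database name "links", the column types in the declaration of factorials, and the function name are assumptions made for illustration; only the table name and its field names i and f come from the example queries on this page.

```links
# Assumed setup: a database called "links" containing a table factorials(i, f).
var db = database "links";
var factorials = table "factorials" with (i : Int, f : Int) from db;

# Per-query opt-in: only this query goes through the mixing normaliser.
# The inner comprehension is deduplicated (set semantics); the outer
# comprehension then filters it as an ordinary multiset query.
fun distinctLargeFactorials() {
  query mixing {
    for (x <- dedup(for (y <-- factorials) [(f = y.f)]))
    where (x.f > 1)
      [(f = x.f)]
  }
}
```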
When the mixing normaliser is enabled, the functions dedup and distinct can be used within queries. The former takes a list as input and returns the same list with duplicate elements removed.
links> dedup;
fun : ([a]) -> [a]
The function distinct is similar, except that its input is a table handle (concretely, distinct(t) is defined as dedup(asList(t))).
links> distinct;
fun : (TableHandle((|a::Base),(|_::Base),(|_::Base))) {}-> [(|a::Base)]
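The relationship between distinct and dedup can be sketched as two queries that are expected to return the same rows. The table declaration and the function names are hypothetical; the equivalence itself is the one stated above.

```links
# Assumed setup: a database called "links" containing a table factorials(i, f).
var db = database "links";
var factorials = table "factorials" with (i : Int, f : Int) from db;

# Deduplicated rows of the table, using the table-handle shorthand.
fun rowsViaDistinct() {
  query mixing { distinct(factorials) }
}

# The same query spelled out with dedup over the table's row list.
fun rowsViaDedup() {
  query mixing { dedup(asList(factorials)) }
}
```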
Unlike homogeneous multiset queries, queries mixing sets and multisets can be defined using 'lateral' dependencies between inputs that cannot be simplified away using standard techniques (see [1]). The mixing normaliser will generate SQL queries using the keyword lateral when this is needed in order for a query to be legal SQL. For example, the query
query mixing {
  for (x <-- factorials)
    for (z <- dedup(for (y <- distinct(factorials))
                    where (x.i == y.f)
                      [(f = x.f)]))
      [(f = z.f)]
}
after normalisation, yields the following SQL:
select z.f as f from
factorials as x,
lateral (select distinct x.f as f
from (select distinct * from factorials) as y
where x.i = y.f) as z
The lateral keyword is an SQL:1999 feature and is not supported on older DBMSs; furthermore, in some cases it may be executed inefficiently. The mixing normaliser implements a query transformation ('delateralisation') that produces only SQL queries that do not use lateral. This transformation is optional and is enabled on a per-query basis, using a query delat { ... } block. Delateralisation enables the use of older DBMSs that are not SQL:1999 compliant, and may in some cases lead to more efficient execution of queries.
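As a sketch of how delateralisation is requested, the earlier example query can be placed in a query delat block instead of query mixing. The table declaration and the function name are assumptions for illustration. The SQL produced by the transformation is not shown here, since its exact shape depends on the normaliser.

```links
# Assumed setup: a database called "links" containing a table factorials(i, f).
var db = database "links";
var factorials = table "factorials" with (i : Int, f : Int) from db;

# Same query body as the lateral example, but with delateralisation requested,
# so the generated SQL should not contain the lateral keyword.
fun factorialsWithoutLateral() {
  query delat {
    for (x <-- factorials)
      for (z <- dedup(for (y <- distinct(factorials))
                      where (x.i == y.f)
                        [(f = x.f)]))
        [(f = z.f)]
  }
}
```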
- In order for a query to be translatable to SQL, dedup should only be used on expressions whose type is a flat list (e.g. lists of base types, or lists of tuples of base types, but not lists of lists of base types or lists of functions). If dedup is applied to other types, the evaluation will fail even if the type of the whole query is a flat list ([1] and [3] provide some insight on this limitation).
- Presently, the mixing normaliser provides no support for ranges:
  links> query [2,1] mixing { for (f <-- factorials) [f] };
  ***: Error: Links_core.EvalMixingQuery.EvalMixingUnimplemented("Range is not (yet) supported by the new mixing normaliser")
- The mixing normaliser is also unable to deal with nested queries: query nested will always use the standard normaliser, even when mixing_norm=on is specified in the configuration file.
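The restriction of dedup to flat lists can be illustrated with the following sketch, which contrasts an acceptable use with a problematic one. The table declaration and both function names are hypothetical. Going by the limitation described above, the second query is expected to fail at evaluation time even though its overall result type is a flat list; the exact error is not documented here and so is not reproduced.

```links
# Assumed setup: a database called "links" containing a table factorials(i, f).
var db = database "links";
var factorials = table "factorials" with (i : Int, f : Int) from db;

# Acceptable shape: dedup over a list of records whose fields are base types.
fun flatDedup() {
  query mixing {
    dedup(for (x <-- factorials) [(f = x.f)])
  }
}

# Problematic shape: dedup over a list of lists. The outer comprehension
# flattens the result back to a flat list, but dedup itself is still
# applied at a non-flat type.
fun nestedDedup() {
  query mixing {
    for (zs <- dedup(for (x <-- factorials) [[(f = x.f)]]))
      zs
  }
}
```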