-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support database window functions #12747
Comments
1.How do u Imagine window functions ? Like.addcolumn(WF,sum,[doc_id,id]) 2.What if wf gonna be few ? |
Regarding the API, it's a good question and ideally there would be a generic (non-database!) LINQ way of expressing this. Here's a totally incomplete API idea: Employees.PartitionBy(emp => emp.Department, (emp, dept) => new { Id=emp.Id, Salary=emp.Salary, SalarySum=Math.Sum(dept.Select(emp => emp.Salary)) }) Basically you define two lambdas: the partition lambda (given an employee, returns the partition to which it belongs), and the projection lambda (given an employee and their department, projects with the new partition-calculated lambda). So the above would return an IEnumerable of an anonymous type containing each employee's Id, Salary and also their departments salary sum. One issue is that an in-memory implementation would recalculate the sum for each and every employee in the partition, which isn't very efficient. This definitely does not cover everything that can be done with (PostgreSQL) window functions, but it's a start :) Hopefully if the API is done correctly, there won't be a problematic limit on which window functions are supported. |
hmmmm. what about order by ? ms sql
|
@pmiddleton oh, interesting, I don't know SQL Server's PIVOT functionality at all, is there a lot of overlap? I wonder if it's worth looking at the two together... |
@Diaskhan I haven't looked into it yet, but maybe simply an overload with a third lambda parameter that provides the ORDER BY clause? This definitely needs to be thought out more thoroughly than my quick API proposals here. |
@roji - There is no overlap in end user functionality, but I believe the implementation in the EF backend will be similar. |
Just thinking about this and #13805 (comment), I came to realize that many scenarios in which LINQ's GroupBy does not translate well to SQL's GROUP BY because both rows and some computation of an aggregate or raking functions need to be returned are probably equivalent to SQL window functions. |
@divega Is this something still on progress or no action will be taken? |
This feature is on the backlog, which means it's not going to be included in 3.1. Planning for the version after that hasn't started yet, so we don't know yet if this feature will be included. However, as there's relatively little interest this is unlikely to be prioritized. |
Just to give a use case: In GraphQL this is used a lot. When you have batching in combination with pagination this becomes essential. |
At least context
.Employees
// extract group key to window's partition by when pattern matches the one below
.GroupBy(emp => emp.Department)
.SelectMany
(
group => group
// extract ordering to window's order by
.OrderByDescending(emp = emp.Salary)
.Select((emp, i) => new
{
emp.Department,
emp.Number,
emp.Salary,
Rank = i
})
) |
The window functions are actually fully supported, though via a 3rd party extension. Example of SQL Server var years = new int?[] { 2016, 2017 };
DbContext.Set<StaffSalesPercentile>()
.Query((VwStaffSales sales, Staffs staffs, StaffSalesPercentile alias) =>
{
var fullName = CONCAT_WS(" ", staffs.FirstName, staffs.LastName);
var cumeDist = AggregateBy(PERCENT_RANK())
.OVER(PARTITION(BY(sales.Year))
.ORDER(BY(sales.NetSales).DESC));
var r = SELECT<StaffSalesPercentile>(fullName.@as(alias.FullName),
sales.NetSales.@as(alias.NetSales),
sales.Year.@as(alias.Year),
cumeDist.@as(alias.Percentile));
FROM(sales).JOIN(staffs).ON(staffs.StaffId == sales.StaffId);
WHERE(years.Contains(sales.Year));
return r;
}); Apparently this is a direction efcore may consider as well |
Any update on this? |
@chamikasandamal This issue is in the Backlog milestone. This means that it is not planned for the next release (EF Core 6.0). We will re-assess the backlog following the this release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources. |
There are examples of window functions with working code in the Linq To DB ORM. |
syntax suggestion: var query = db.Set<Table>()
.UseWindowFunction(x=> EFWindowFunctions.RowNumber( ) , partition: new { x.Alpha }, orderby: new []
{
new OrderByDefine{ Column = x.CreateDate, Sort = Sort.Asc }
}, (x, i)=> new { x, rowNumber = i } )
.ToList();
static TResult UseWindowFunction<T,TFuncResult,TResult>(this IQuerable<T> query, Expression<Func<IQuerable<T>, TFuncResult>> windowFunction, object? partition, IEnumerable<OrderByDefine> orderby,
Expresstion<Func<T,TFuncResult,TResult>> resultSelector)
{
}
static int RowNumber<T>(this EFWindowFunctions _)
{
throw new InvalidOperation("****")
} sql tempalte: select A,B,C, <filed name>
FROM
(
SELECT A,B,C, <windowFunction> OVER ( <partition> , <orderby>) as <filed name>
FROM
(
<Original Query>
) AS ___t
) AS __t
|
Library availableHi all, in an attempt to get rid of ugly SQL in a project for my client, I've written a library that does provides window functions and binary functions as well. It has been used in production since January of this year, so battle tested in at least this one project. I hope anyone interested can check it out, use it, harshly criticize it and contribute to it. It is currently available for SQL Server, SQLite and Postgres and is multitargeted to frameworks 6 and 7-preview. https://github.com/zompinc/efcore-extensions Previously discussed here |
@virzak suggest you do a PR to the tools list in the docs |
Please support window functions in efcore 9, i think this is the best time to add this feature. |
An important scenario for this is paging, when the total number of rows (without the limit/offset applied) needs to be (efficiently) computed. See usage of |
Is there any interest in an EF Core community standup featuring https://github.com/zompinc/efcore-extensions ? Would love to come on and present it. It recently implemented Window Functions inside a Where clause, multiple levels nested window functions and as of today it has count(*) over() Here is the breakdown I had in mind: Approximate breakdown could be:
|
@virzak definitely could be! Let us discuss and we'll reach out as time allows etc. |
@virzak are they on the tools list in the docs? |
Recent community standup video discussing windows function and @virzak's EF extension for supporting them. |
Dumping some thoughts I had on the LINQ way to express windows functions... Note that this doesn't mean we plan to implement this any time soon (although it's definitely on my radar). Most variations we've seen so far e.g. in the Zomp extension and in the above proposals) go in the direction of allowing users to directly express e.g. In the context of window functions, that would mean pattern-matching LINQ subquery patterns that match window functions. For example, consider the following query: _ = await context.Blogs
.Select(b => new { Blog = b, TotalCount = context.Blogs.Count() })
.ToListAsync(); EF currently evaluates the Instead, EF could translate this to a window function: SELECT [b].[Id], [b].[Name], COUNT(*) OVER () AS TotalCount
FROM [Blogs] AS [b] (this would require changing the funcletizer to not evaluate for this kind of case) Going further a PARTITION BY clause could be expressed by adding a Where operator: _ = await context.Blogs
.Select(b => new { Blog = b, TotalCount = context.Blogs.Count(b2 => b2.Category == b.Category) })
.ToListAsync(); This adds a count of all Blogs in the each Blog's category. It is currently translated via an inefficient correlated subquery: SELECT [b].[Id], [b].[Category], [b].[Name], (
SELECT COUNT(*)
FROM [Blogs] AS [b0]
WHERE [b0].[Category] = [b].[Category]) AS [TotalCount]
FROM [Blogs] AS [b]
WHERE [b].[Id] > 8 ... but could be identified and transformed to the following using window functions: SELECT [b].[Id], [b].[Name], COUNT(*) OVER (PARTITION BY [b].[Category]) AS TotalCount
FROM [Blogs] AS [b]; A window function ORDER BY could be expressed as follows: _ = await context.Blogs
.Select(b => new { Blog = b, TotalCount = context.Blogs.Select(b2 => b.Id).OrderBy(i => i).Where(i => i < b.Id).Min() })
.Where(x => x.Blog.Id > 8)
.ToListAsync(); This calculates the "minimum ID so far" for each Blog, and would translate to: SELECT [b].[Id], [b].[Name], MIN([b].[Id]) OVER (ORDER BY [b].[Id]) AS TotalCount
FROM [Blogs] AS [b]; Some notes:
@virzak @ajcvickers very interested to hear any thoughts you have on the above! |
@roji, I really like this approach. In retrospect, I should have started implementing the extension by detecting these LINQ patterns. There is no disadvantage to replacing an inefficient subquery in favour or window function where it makes sense. Add RowNumber to the solution: // Row Number
var query = context.Blogs
.Select((b, i) => new { Blog = b, RowNumber = i + 1 }); The test cases of the Zomp library actually use LINQ to get the expected result and compares them to the actual results. Some of those LINQ expressions are complex and not user friendly (perhaps they could be simplified). Here is an attempt at Percent_Rank. expectedSequence = TestRows
.Select(r => r.Col1)
.OrderBy(x => x, comparer)
.Select(v => groups
.Where(g => comparer.Compare(g.Key, v) < 0)
.Select(g => g.Count())
.Sum() / (double)(TestRows.Length - 1)); Another huge advantage of LINQ: there are differences between implementations. Percent_Rank differs between Postgres and SQL Server by whether nulls come first or not. So even though LINQ expression is more complex, it provides more determinism than the OVER() approach. When you have the data:
Sql Server produces: SELECT
Percent_rank = Percent_rank() Over( order by "Col1" )
,[Id]
,[Col1]
FROM [TestRows]
Postgres produces: SELECT Percent_rank() Over( order by "Col1" ) as "Percent_rank", "Id", "Col1" FROM public."TestRows"
ORDER BY "Id" ASC
Perhaps a hybrid approach is good also, if some functionality cannot be covered by LINQ, but first stage of implementation can cover as much as possible without exposing any new API. Adding "Moving Window" into Enumerable and System.Linq.Async's AsyncEnumerable would be a great addition for purposes other than EF Core. In my client project I have an implementation, which I would dump if LINQ had these functions. Of course it would need to cover cases where window size doesn't remain the same on each iteration (ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING). I also agree with the upsides of Over() approach as well. .PartitionBy(b.Category) is friendlier and less error prone than .Count(b2 => b2.Category == b.Category). Though not introducing new APIs seems like a far bigger advantage. As for getting the filters "just right", perhaps we could encourage an approach where a query is constructed in multiple statements: var filteredQuery = context.Blogs.Where(z => z.Id > 10);
var finalQuery = filteredQuery
.Select(b => new { Blog = b, TotalCount = filteredQuery.Count(z => z.Id < b.Id) }); |
Absolutely. This is something that has been discussed before (#24218) and decided against, but I think we should reconsider and implement it.
Yeah, I suspected this may be the case (that was the "just right" comment above). We can try to think about this and see if we can come up with something friendlier; if not, I think it may be OK to also expose an explicit window function LINQ construct as well.
Exactly - plain old LINQ is great (where possible) also because it's an abstraction that shields the user from database-specific details.
Yeah, that's a very general difference between PG and SQL Server that affects everything (including just plain ORDER BY). That's one example of cases where we don't attempt to force the database into matching the .NET client behavior (or the behavior of ther databases) - similar to how SQL Server is case-insensitive by default (though note that PG ORDER BY allows specifying NULLS FIRST/LAST, and npgsql/efcore.pg#627 tracks exposing that to users; not sure if there's something similar for Percent_Rank). To summarize, I think a behavioral discrepancy specifically on NULL ordering would be acceptable if there's no good way around it.
Yep, I think that's a good approach.
Yes, that would indeed avoid the repetition that's otherwise needed. That would unfortunately clash with precompiled queries/NativeAOT, since those are only planned to work for statically-analyzable queries - and once you start breaking up the query into multiple IQueryable variables, that's no longer a simple static query. Though if the composed queryables are localized in a single function, statically analyzing them may become possible. In any case, we'll have the same problem with features other than window functions, e.g. the CTE support planned in #29918. |
Ok time for me to chime in since this conversation has taken off. I started implementing this feature in ef core back at xmas. I currently have things at a state where I have an api fully designed and can run queries. I am now adding in support for more aggregates, adding unit tests, and flushing out all the plethora of bugs. For the api I went with a set of new fluent functions, similar to what others have done for a number of reasons. I at first investigated adding in some custom linq operators. This lead to a bunch of dead ends due to being limited to hanging extension methods of the existing interfaces. You can't sufficiently control where the api surfaces itself when writing linq queries this way. You would really have to sit down with the linq team and modify the base interfaces/methods to make this approach work. The fluent api approach is more intuitive from the average end developers standpoint. You have a well defined set of methods that show you exactly how to write a windowing function in a query. You are not relying on behind the scenes ef core magic to hopefully translate things into what you want. While there is merit in having ef core "just figure things out" in some cases I don't think it would be the best approach for that to be the main way to define windowing functions. Here is an example of what I am currently doing. MaxDiscount = EF.Functions.Over().PartitionBy(od.ProductID, od.OrderID).OrderBy(od.OrderID).Rows(RowsPreceding.CurrentRow, 5).Max(od.Discount), I felt this format lent it self to the best usability and easy of implementation. It also allows for extension points for the various providers. For example different providers have different needs for row/range/groups in the sliding window, while others support adding in an excludable clause. I currently was only shooting for support in select/order by, but based on this thread adding in support for where shouldn't be too difficult. You can see my current working branch here. https://github.com/pmiddleton/EntityFramework/tree/windowing Let me know what you think. Paul |
Exciting to have you back! |
Here I assume you mean using special LINQ constructs ( _ = await context.Blogs
.Select(b => new { Blog = b, TotalCount = context.Blogs.Count(b2 => b2.Category == b.Category) })
.ToListAsync(); ... whereas using a special construct would instead include something like: EF.Functions.Over().PartitionBy(b.Category).Count(), I of course chose the "easy" example (see discussion below), but for this category of cases, at least IMHO the 1st way is a much more intuitive, easy and discoverable way to express things without needing to understand windows functions etc.
It's worth noting that to a certain extent, all of EF's LINQ translation is basically that; that is, uses don't express SQL concepts (either via strings or via SQL-conceptual operators, as many other ORMs do), but rather use C# code - and EF is responsible for figuring out how to best translate that. I do agree that there's an extremity where a very specific combination of LINQ operators need to be composed just right for EF's pattern matching to kick in and do the right thing - at that point a custom operator maybe makes more sense. But I think we may be reaching the limit of where the conversation can lead without diving into the concrete details. We know that there's at least quite a few window function constructs that can be expressed quite naturally without any explicit SQL concepts (like OVER); we suspect there may be others where that may not be the case. A case-by-base analysis of the different window functions scenarios and what natural LINQ constructs are necessariy to express them would IMHO provide the information we need here (I did a tiny bit of that in my comment above but that's clearly not enough). And once again, even if there are scenarios where a custom LINQ operator (OVER) is more appropriate, that still doesn't mean we shouldn't tackle the pattern-matching approach for the cases where we a nice, expressive LINQ construct exists without it. Finally, the specific proposed syntax looks good to me (I'd need a bit more time to think about it in depth). One thing that may be a bit odd, is that the operator chain rooted over EF.Functions.Over() isn't typed to Employee in any way - and therefore the operators (PartitionBy, OrderBy) accept any argument (and also a non-lambda parameter in constrast to e.g. regular OrderBy). Making EF.Functions.Over typed ( |
To me, this is the key. If the developer writes a LINQ query that would be best translated using a window function, then EF Core should do this. This is what we normally do for database functions like |
@roji don't get me wrong. I'm all for finding patterns such as this and automatically doing the right thing behind the scenes. _ = await context.Blogs
.Select(b => new { Blog = b, TotalCount = context.Blogs.Count(b2 => b2.Category == b.Category) })
.ToListAsync(); Your right that this is the most intuitive way to write this particular query in linq. However the number of these scenarios is limited. What I meant by ef magic is that you could never hope to teach your average dev that we could transform something like this into a window function. expectedSequence = TestRows
.Select(r => r.Col1)
.OrderBy(x => x, comparer)
.Select(v => groups
.Where(g => comparer.Compare(g.Key, v) < 0)
.Select(g => g.Count())
.Sum() / (double)(TestRows.Length - 1)); Once you get beyond the simples cases, where honestly the average dev isn't even thinking about using a window function in the first place, then you end up with better usability from a well defined api. |
@roji - good catch on the Over. In my various linq extension method approaches things were typed to I am trying to think if there is ever a use case where you need to bind outside of the scope of |
Right, I think we're more or less aligned on this... I do want to spend a bit more time to think about the problematic scenarios: in at least some cases, there may be opportunities for proposing general improvements to LINQ that make sense (not just for databases), and that may also simplify cases where currently the LINQ expression isn't good. But I know that's very abstract (it's mainly about me having to spend more time thinking about LINQ & window functions to understand things better).
I don't think that can ever happen in SQL, but I'm no expert on window functions. Ideally we'd have had the IQueryable for Employee where we want to specify a window function invocation, and we'd just hang Over() on top of that - but we don't, only the Select's element parameter. That's what led me to go in the direction of extracting the IQueryable out and referencing it twice as a simplification (@virzak also mentioned this possibility), e.g.: var filteredBlogs = context.Blogs.Where(b => b.Id > 8);
_ = await filteredBlogs
.Select(b => new { Blog = b, TotalCount = filteredBlogs.Count(b2 => b2.Category == b.Category) })
.ToListAsync(); Count() here can be replaced with an Over() extension method over IQueryable, which flows the type forward like any normal LINQ operator. That would be the ideal way to express things IMHO, but unfortunately it does depend on extracting the IQueryable out, which may be a bit much (and is also a bit problematic for NativeAOT/static queries)... |
Adding an Over method to IQueryable was one of my original ideas. One of the main issues I ran into is that you now have access to that Over method in places where it does not make any logical sense. I also tried hanging it off of Select elements parameter but that required binding an extension method to object so yea worse than hanging it off of IQueryable :) What you need is an overload for Select that passes in both the entity and some new IWindow into the lambda which you can then Over on. Though at the end of the day that is just a slight variation of syntax vs the fluent api off of EF.Functions. |
Yep, that all makes sense to me. FWIW I don't think it would be terrible to have Over() on IQueryable - there are other queryable operators which don't make sense everywhere. For example, most LINQ operators don't translate after GroupBy, and that's OK (we'd throw an informative error). Also, at least this would be a queryable extension, which generally restricts it to EF queries only. This is why we allow ourselves to add queryable extensions, but not enumerable extensions (which would appear in totally non-EF contexts). My main issue with the Over-over-queryable approach is that it requires breaking trhe query apart, to allow reusing the IQueryable as a local variable... Maybe that's not too terrible either, I don't know.
That's a nice idea (though note that this Select overload would also be an IQueryable extension, like the Over() extension discussed above). Also, note that window functions are also supported by OrderBy. And in fact, in @virzak extension you can use them everywhere (e.g. in a Where clause), and the extension detects that and does a subquery pushdown to have the function in the subquery projection. If we want to support this powerful high-level kind of query translation (which I think is a good idea), we'd need to basically allow window functions anywhere, so you also need overloads for all queryable operators - not great. |
I took at quick look at this tonight. You can root the chain via the over method but it needs to be done via a passed parameter. The input to the parent linq method will quite probably be an anonymous type so it's the only option. You then end up with a query that looks like this. MaxDiscount = EF.Functions.Over(od).PartitionBy(pb => new { pb.Quantity, pb.UnitPrice }).OrderBy(ob => ob.Quantity).Rows(RowsPreceding.CurrentRow, 5).Max(m => m.Discount), vs what I currently have MaxDiscount = EF.Functions.Over().PartitionBy(od.ProductID, od.OrderID).OrderBy(od.OrderID).Rows(RowsPreceding.CurrentRow, 5).Max(od.Discount), While this does look more like a more typical linq operator, at the end of the day all I don't think it buys you anything. The use of lambdas still won't stop you from including references to any random in scope variable, the same as with the untyped approach. Using a fluent api approach means the window call already lives inside of a lambda from the parent select, orderby, etc. and is meant to use it's parameter. An entire query looks like this. var results = context.Employees.Select(e => new
{
e.Id,
e.Name,
MaxSalary = EF.Functions.Over().Max(e.Salary),
MinSalary = EF.Functions.Over().Min(e.Salary)
}).ToList(); Now I'm hardly a linq expert so I might be missing something? @virzak - was there any reason your api went with one approach over the other? Were there any inherent advantages you found by doing it the way you did? |
I agree there's not huge value to having lambdas - though there's also not huge value to not having them... Yeah, the main point is to align the operators with what LINQ operators typically look like, especially given that sometimes the exact same operators are used here (OrderBy, Max), and in other contexts these accepts lambdas. So it's a consistency thing. Another similar thing is that it's generally odd for the operator chain to not be rooted to anything; this is why I was trying to explore options where Over() is composed over IQueryable, rather than a root-less EF.Functions.Over(). Passing the Select() parameter to Over() as least provides that root (albeit in a slightly weird way). I also think that LINQ users are used to dotting on the operator parameter, so having a lambda may better guide them to reference properties on I don't pretend that the above are very strong arguments - but then I don't think there are strong arguments to the other side (the syntax is slightly less verbose?). |
Might not be intuitive with query syntax: from a in SomeTable
from b in SomeRelatedTable
select new
{
a.Number,
b.Number,
Diff = EF.Functions.Over(???).Max(a.Number - b.Number)
} |
@pmiddleton Not sure I understand what would be an advantage of having a lambda inside a lambda. The |
I have the first cut of the windowing functions code ready. I would like you two to take a look at the api and give me your thoughts. The code is not ready for a PR yet. I have not yet documented any of the new apis. I don't want to do that until I know things are set. I also rebased last weekend and see that @roji added a Quote method on SqlExpression so I need to figure out what that needs to do. Overall I have added the following features for window functions
The following windows functions are supported
I don't yet have where clause support via a pushdown done. That might not make it for this first cut. The code is here here https://github.com/pmiddleton/EntityFramework/tree/windowing3 You can look in the WindowFunctionTestBase.cs file for the unit tests to see what the api looks like. Let me know what you think. Paul |
Hey @pmiddleton, thanks for working on this! I can't commit to thorough reviewing soon (there's lots going on), but I do suggest submitting your work as a draft PR, simply because it's easier to have a conversation/review like that (this doesn't at all only that what you have is ready etc.) |
I created the PR here #34258 Since I rebased last week I am having build issues in Visual Studio. Have you guys run into this issue Error (active) CS9057 The analyzer assembly 'C:\code\efcore.dotnet\packs\Microsoft.AspNetCore.App.Ref\9.0.0-preview.5.24306.11\analyzers\dotnet\cs\Microsoft.AspNetCore.App.Analyzers.dll' references version '4.11.0.0' of the compiler, which is newer than the currently running version '4.10.0.0'. EFCore.AspNet.Specification.Tests C:\code\efcore\test\EFCore.AspNet.Specification.Tests\CSC 1 I'm getting that on 4 projects. Everything builds fine at the command line. |
Both SQL Server and PostgreSQL seem to support window functions in a similar way - in at least some cases the syntax even seems to be identical. It could be nice to add support for this to EF Core.
A main question is how this would be expressed in LINQ, AFAIK there's no way to do these kinds of operations in memory.
The text was updated successfully, but these errors were encountered: