
Lazy statistics for columns #1492

@Jolanrensen


Let's say you write:

df.filter { someValue > myColumn.max() }

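This is way faster, because the max is computed once up front instead of being recomputed inside the filter lambda for every row: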
val myColumnMax = df.myColumn.max()

df.filter { someValue > myColumnMax }
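
To make the difference concrete, here is a minimal plain-Kotlin sketch that counts how often the max gets scanned in both variants. `CountingColumn` is a hypothetical stand-in over a `List<Int>`, not the DataFrame API, and it assumes the predicate runs once per row:

```kotlin
// Hypothetical stand-in for a column; NOT the DataFrame API.
class CountingColumn(private val values: List<Int>) {
    // Number of full scans performed so far.
    var maxCalls = 0
        private set

    // Full O(n) scan on every call; nothing is cached.
    fun max(): Int {
        maxCalls++
        return values.maxOf { it }
    }
}

fun main() {
    val someValues = listOf(5, 1, 9, 3)

    // max() inside the predicate: one scan per row.
    val column = CountingColumn(listOf(4, 8, 2, 6))
    val inside = someValues.filter { it > column.max() }
    println("inside:  $inside, scans = ${column.maxCalls}") // inside:  [9], scans = 4

    // max() hoisted out of the predicate: a single scan in total.
    val hoistedColumn = CountingColumn(listOf(4, 8, 2, 6))
    val columnMax = hoistedColumn.max()
    val hoisted = someValues.filter { it > columnMax }
    println("hoisted: $hoisted, scans = ${hoistedColumn.maxCalls}") // hoisted: [9], scans = 1
}
```

Both variants keep the same rows; only the number of scans changes, from one per row to one overall.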

Maybe we could solve this by having lazily calculated stats stored inside ValueColumns. Columns are immutable after all, so it would be safe to do so and the performance gain should be significant!

Of course, this wouldn't work when you write:

df.filter { someValue > (myColumn + 1).max() }
// or
df.filter { someValue > myColumn.maxOf { it + 1 } }

but that's okay I think. We can't have it all :)
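
A rough sketch of what I have in mind, in plain Kotlin. `StatsColumn`, `scans`, and the way the cache is wired up are all hypothetical and only illustrate the idea; this is not how `ValueColumn` is implemented:

```kotlin
// Hypothetical immutable column with a lazily cached statistic.
// All names here are made up for illustration.
class StatsColumn<T : Comparable<T>>(source: List<T>) {
    // Defensive copy, so the cached value can never go stale.
    private val values: List<T> = source.toList()

    // Counts full scans over the data, for demonstration only.
    var scans = 0
        private set

    // Computed on first access, then reused (lazy is thread-safe by default).
    private val cachedMax: T? by lazy {
        scans++
        values.maxOrNull()
    }

    fun max(): T = cachedMax ?: throw NoSuchElementException("Column is empty")

    // A derived max depends on the selector, so it is rescanned on every call.
    fun <R : Comparable<R>> maxOf(selector: (T) -> R): R {
        scans++
        return values.maxOf(selector)
    }
}

fun main() {
    val column = StatsColumn(listOf(4, 8, 2, 6))
    val someValues = listOf(5, 1, 10, 3)

    // max() is called once per row, but only the first call scans.
    val kept = someValues.filter { it > column.max() }
    println("kept = $kept, scans = ${column.scans}") // kept = [10], scans = 1

    // maxOf { ... } cannot use the cache: four more scans, one per row.
    val keptDerived = someValues.filter { it > column.maxOf { v -> v + 1 } }
    println("keptDerived = $keptDerived, scans = ${column.scans}") // keptDerived = [10], scans = 5
}
```

The cache is only safe because the column never changes after construction, which is the immutability argument above. The `maxOf` case still pays one scan per row, matching the limitation described.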

Labels: enhancement (New feature or request), performance (Something related to how fast the library can handle data)
