
Improve performance of REPEAT functions #11990

Closed
@alamb

Description

I think you could make this perform much better by avoiding the intermediate `String` allocations and instead building the output directly with `StringViewBuilder`:

https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html

Here is an example of how to use it: apache/arrow-rs#6240

I realize this just follows the same model as the existing code. However, since we are modifying this code anyway, it would be nice to make it faster at the same time.

Originally posted by @alamb in #11962 (comment)
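To show why the intermediate `String` is unnecessary, here is a std-only model of the buffers a `StringViewBuilder` fills. `ViewBuilderModel` and its methods are hypothetical and are not arrow-rs APIs. The model follows the Arrow StringView layout: each row is a 16-byte view, results of up to 12 bytes are stored inline, and longer results go into a data buffer with a 4-byte prefix, buffer index and offset. As a simplification it uses a single data buffer.

```rust
use std::convert::TryFrom;

/// Hypothetical std-only model of the buffers a `StringViewBuilder` fills.
/// Each view is 16 bytes, little-endian:
///   bytes 0..4            length (u32)
///   if len <= 12:         bytes 4..16 hold the string itself, zero-padded
///   if len  > 12:         bytes 4..8 prefix, 8..12 buffer index, 12..16 offset
pub struct ViewBuilderModel {
    pub views: Vec<u128>,
    /// Single data buffer (index 0) for out-of-line values; a simplification.
    pub data: Vec<u8>,
}

impl ViewBuilderModel {
    pub fn new() -> Self {
        ViewBuilderModel { views: Vec::new(), data: Vec::new() }
    }

    /// Append `s` repeated `n` times, writing bytes straight into the view or
    /// the data buffer instead of materializing `s.repeat(n)` first.
    pub fn append_repeated(&mut self, s: &str, n: usize) {
        let len = s.len().checked_mul(n).expect("repeated length overflows usize");
        let len32 = u32::try_from(len).expect("view length must fit in u32");
        let mut view = [0u8; 16];
        view[0..4].copy_from_slice(&len32.to_le_bytes());
        if len <= 12 {
            // Short result: inline the repetitions into bytes 4..4+len.
            let mut pos = 4;
            for _ in 0..n {
                view[pos..pos + s.len()].copy_from_slice(s.as_bytes());
                pos += s.len();
            }
        } else {
            // Long result: append to the data buffer and keep a 4-byte prefix.
            let start = self.data.len();
            let offset = u32::try_from(start).expect("offset must fit in u32");
            for _ in 0..n {
                self.data.extend_from_slice(s.as_bytes());
            }
            view[4..8].copy_from_slice(&self.data[start..start + 4]);
            view[8..12].copy_from_slice(&0u32.to_le_bytes()); // buffer index 0
            view[12..16].copy_from_slice(&offset.to_le_bytes());
        }
        self.views.push(u128::from_le_bytes(view));
    }

    /// Decode row `i` back into a `String` (used only to check the model).
    pub fn value(&self, i: usize) -> String {
        let view = self.views[i].to_le_bytes();
        let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]) as usize;
        let bytes = if len <= 12 {
            view[4..4 + len].to_vec()
        } else {
            let offset = u32::from_le_bytes([view[12], view[13], view[14], view[15]]) as usize;
            self.data[offset..offset + len].to_vec()
        };
        String::from_utf8(bytes).expect("repeating valid UTF-8 yields valid UTF-8")
    }
}
```

In arrow-rs the builder manages this layout itself. The point of the model is only that each repeated value can be written directly into its final location, so no per-row heap allocation is needed.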

The idea would be to:

  1. Create a benchmark for the REPEAT function covering `StringArray`, `LargeStringArray` and `StringViewArray` inputs.
  2. Optimize the performance of REPEAT, likely by not creating intermediate `String`s and instead writing the values directly into a `StringBuilder` / `StringViewBuilder`.
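For the offset-based arrays, step 2 can be illustrated with a std-only model of the offsets and values buffers behind `StringArray` (`i32` offsets) and `LargeStringArray` (`i64` offsets). `OffsetBuilderModel` is a hypothetical name and not the arrow-rs `StringBuilder` API. The sketch only shows the technique: each repetition is appended straight into one shared values buffer, and the row's end offset is recorded afterwards.

```rust
use std::convert::TryFrom;
use std::fmt::Debug;

/// Hypothetical std-only model of the offsets + values buffers behind
/// `StringArray` (O = i32) and `LargeStringArray` (O = i64).
pub struct OffsetBuilderModel<O> {
    pub offsets: Vec<O>,
    pub values: Vec<u8>,
}

impl<O> OffsetBuilderModel<O>
where
    O: TryFrom<usize> + Into<i64> + Copy,
    <O as TryFrom<usize>>::Error: Debug,
{
    pub fn new() -> Self {
        let zero = O::try_from(0usize).unwrap();
        OffsetBuilderModel { offsets: vec![zero], values: Vec::new() }
    }

    /// Write `s` repeated `n` times directly into the shared values buffer,
    /// then record the end offset. No per-row `String` is allocated.
    pub fn append_repeated(&mut self, s: &str, n: usize) {
        let len = s.len().checked_mul(n).expect("repeated length overflows usize");
        self.values.reserve(len);
        for _ in 0..n {
            self.values.extend_from_slice(s.as_bytes());
        }
        // With i32 offsets this fails once the buffer exceeds i32::MAX bytes.
        let end = O::try_from(self.values.len()).expect("offset overflow");
        self.offsets.push(end);
    }

    /// Read row `i` back as a `&str`.
    pub fn value(&self, i: usize) -> &str {
        let start: i64 = self.offsets[i].into();
        let end: i64 = self.offsets[i + 1].into();
        std::str::from_utf8(&self.values[start as usize..end as usize]).expect("valid UTF-8")
    }
}
```

Reserving `len` bytes up front avoids repeated reallocation of the values buffer while a long result is being written.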

Benchmarks:

Benchmarks would likely go into https://github.com/apache/datafusion/blob/main/datafusion/functions/benches and follow the model of an existing one (e.g. `ltrim.rs`).
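Here is a rough std-only stand-in for step 1 that compares the per-row `String` approach against direct writes. The function names and input shapes are hypothetical, and the sketch covers only the `i32`-offset layout. The existing files in that directory appear to use criterion, so a real benchmark would follow `ltrim.rs` rather than this hand-rolled `Instant` loop, and would also cover `LargeStringArray` and `StringViewArray`.

```rust
use std::convert::TryFrom;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Baseline: allocate a fresh `String` per row, then copy it into the output.
fn repeat_via_strings(inputs: &[&str], n: usize) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for s in inputs {
        let owned: String = s.repeat(n); // per-row heap allocation
        values.extend_from_slice(owned.as_bytes());
        offsets.push(i32::try_from(values.len()).expect("offset overflow"));
    }
    (offsets, values)
}

/// Candidate: pre-size the buffer and write repetitions directly into it.
fn repeat_direct(inputs: &[&str], n: usize) -> (Vec<i32>, Vec<u8>) {
    let total: usize = inputs.iter().map(|s| s.len() * n).sum();
    let mut offsets = Vec::with_capacity(inputs.len() + 1);
    offsets.push(0i32);
    let mut values = Vec::with_capacity(total);
    for s in inputs {
        for _ in 0..n {
            values.extend_from_slice(s.as_bytes());
        }
        offsets.push(i32::try_from(values.len()).expect("offset overflow"));
    }
    (offsets, values)
}

/// Run `f` `iters` times (iters > 0) and return the mean time per run.
fn time_it<F: FnMut() -> (Vec<i32>, Vec<u8>)>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

/// Hypothetical input generator: `rows` ASCII strings of `len` bytes each.
fn make_inputs(rows: usize, len: usize) -> Vec<String> {
    (0..rows)
        .map(|i| ((b'a' + (i % 26) as u8) as char).to_string().repeat(len))
        .collect()
}
```

Both functions must produce identical buffers, so checking equality first guards against benchmarking an incorrect optimization.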

Metadata


Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
