
Set execution options in a way more consistent with DataFusion and DataFusion Python #72

@robtandy


Right now, DFRayContext accepts options that govern its runtime behavior as constructor arguments:

    ctx = DFRayContext(
        batch_size=batch_size,
        partitions_per_worker=partitions_per_worker,
        prefetch_buffer_size=prefetch_buffer_size,
        worker_pool_min=worker_pool_min,
    )
We cannot accept a SessionConfig from datafusion-python because Rust has no stable ABI, but we should still follow the DataFusion Python API as closely as possible.

To that end, DataFusion configuration options can already be set via DFRayContext.set.

Setting options in two places is confusing. I think it would be clearer to stop passing these options to __init__ and set them via DFRayContext.set instead.

We could name them:

  • datafusion.ray.execution.batch_size
  • datafusion.ray.execution.partitions_per_processor (using the updated name "processor" instead of "worker")
  • datafusion.ray.execution.prefetch_buffer_size
  • datafusion.ray.execution.processor_pool_min (using the updated name "processor")
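A minimal sketch of the proposed pattern, assuming a plain Python stand-in class (`SketchRayContext`, hypothetical) rather than the real DFRayContext. The constructor seeds its defaults through the same `set` method users call, so the four keys above become the single way to configure runtime behavior. The default values are placeholders chosen for illustration, not datafusion-ray's actual defaults.

```python
# Hypothetical stand-in for DFRayContext; NOT the datafusion-ray implementation.
# It only illustrates routing every runtime option through one `set` method.


class SketchRayContext:
    # Placeholder defaults for illustration only.
    DEFAULTS = {
        "datafusion.ray.execution.batch_size": "8192",
        "datafusion.ray.execution.partitions_per_processor": "1",
        "datafusion.ray.execution.prefetch_buffer_size": "0",
        "datafusion.ray.execution.processor_pool_min": "1",
    }

    def __init__(self) -> None:
        self._config: dict[str, str] = {}
        # __init__ applies defaults through the same path users call,
        # so there is only one place where options are stored.
        for key, value in self.DEFAULTS.items():
            self.set(key, value)

    def set(self, option: str, value: str) -> None:
        # Store values as strings, in the key/value style of DataFusion options.
        self._config[option] = str(value)

    def get_int(self, option: str) -> int:
        # Convert a stored option back to an integer when it is needed.
        return int(self._config[option])


ctx = SketchRayContext()
ctx.set("datafusion.ray.execution.batch_size", "4096")
ctx.set("datafusion.ray.execution.processor_pool_min", "4")
print(ctx.get_int("datafusion.ray.execution.batch_size"))  # 4096
```

Because defaults and user overrides share one code path, any later validation or type conversion only has to live in `set`.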
