Right now, the DFRayContext accepts options that govern its runtime behavior, like:
ctx = DFRayContext(
    batch_size=batch_size,
    partitions_per_worker=partitions_per_worker,
    prefetch_buffer_size=prefetch_buffer_size,
    worker_pool_min=worker_pool_min,
)
While we cannot accept a SessionConfig from datafusion-python due to the lack of ABI stability in Rust, we should still try to adhere to the DataFusion Python API as much as possible.
So, we allow setting DataFusion configuration options via DFRayContext.set.
Setting options in two places is confusing. I think it would be clearer if the options currently passed to the init were set via DFRayContext.set instead.
We can call them:
datafusion.ray.execution.batch_size
datafusion.ray.execution.partitions_per_processor (using the updated name "processor" instead of "worker" here)
datafusion.ray.execution.prefetch_buffer_size
datafusion.ray.execution.processor_pool_min (using the updated name "processor")
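
To make the proposal concrete, here is a rough sketch of how a single set method could route both kinds of options. SketchRayContext is a hypothetical stand-in written only for illustration, not the real DFRayContext. The set(key, value) signature with string values is an assumption modeled on SessionConfig.set in datafusion-python, and the values shown are placeholders rather than actual defaults.

```python
class SketchRayContext:
    """Hypothetical stand-in for DFRayContext, used only to illustrate the proposal."""

    # Prefix and option names proposed in this issue.
    RAY_PREFIX = "datafusion.ray.execution."
    RAY_KEYS = {
        "batch_size",
        "partitions_per_processor",
        "prefetch_buffer_size",
        "processor_pool_min",
    }

    def __init__(self):
        # No runtime options in the constructor; everything goes through set().
        self.ray_options = {}
        self.datafusion_options = {}

    def set(self, key: str, value: str) -> None:
        """Store a Ray-specific option or pass a DataFusion option through."""
        if key.startswith(self.RAY_PREFIX):
            name = key[len(self.RAY_PREFIX):]
            if name not in self.RAY_KEYS:
                raise KeyError(f"unknown option: {key}")
            self.ray_options[name] = value
        else:
            # Anything else is treated as a regular DataFusion setting.
            self.datafusion_options[key] = value


ctx = SketchRayContext()
ctx.set("datafusion.ray.execution.batch_size", "8192")  # placeholder value
ctx.set("datafusion.ray.execution.partitions_per_processor", "2")  # placeholder value
ctx.set("datafusion.execution.target_partitions", "8")  # regular DataFusion option
```

With this shape, a caller configures everything in one place, and a misspelled datafusion.ray.execution.* key fails loudly instead of being silently ignored.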