spark connect fails when performing a `.show()` #3498

universalmind303 · 2024-12-05T20:56:53Z

Describe the bug

# %%
import daft
from daft.daft import connect_start
from pyspark.sql import SparkSession


server = connect_start()
url = f"sc://localhost:{server.port()}"
session = SparkSession.builder.appName("DaftConfigTest").remote(url).getOrCreate()


session.createDataFrame([("cory", 100)], ["name", "age"]).show()

This results in "Error in Daft server: Unsupported relation type: ShowString".

To Reproduce

No response

Expected behavior

No response

Component(s)

Other

Additional context

It appears that our show logic currently exists purely in Python. As a prerequisite, we'll need to refactor that logic into Rust so that it can be used from Spark Connect.


andrewgazelka · 2024-12-06T00:15:57Z

Also note that `createDataFrame` with strings is currently bugged (I am working on fixing it).

Can we just use Display rust impl for now?

universalmind303 · 2024-12-06T00:55:13Z

> Can we just use Display rust impl for now?

No, the logical plan's `Display` output is equivalent to `df.explain()`.

`df.show()` shows a small sample of the materialized dataset, similar to `df.limit(10).collect()`.
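To make the distinction concrete, here is a toy sketch in plain Python. It does not use Daft's classes: `ToyFrame` and its `filter`, `explain`, and `show` methods are hypothetical stand-ins, assumed only for illustration. Printing the plan describes the steps without reading any data, while showing runs the plan and returns the first few rows.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Callable, Iterable, List, Tuple

Row = Tuple[object, ...]
Step = Tuple[str, Callable[[Iterable[Row]], Iterable[Row]]]


@dataclass
class ToyFrame:
    """Hypothetical lazy frame: a row source plus a list of named steps."""

    source: Callable[[], Iterable[Row]]
    steps: List[Step] = field(default_factory=list)

    def filter(self, name: str, pred: Callable[[Row], bool]) -> "ToyFrame":
        # Record the step; nothing is executed here.
        step: Step = (f"Filter({name})", lambda rows: (r for r in rows if pred(r)))
        return ToyFrame(self.source, self.steps + [step])

    def explain(self) -> str:
        # Plan display: describes the steps only and never calls the source.
        return "\n".join(["Source"] + [name for name, _ in self.steps])

    def show(self, n: int = 10) -> List[Row]:
        # Preview: runs the plan and keeps at most the first n rows.
        rows: Iterable[Row] = self.source()
        for _, apply_step in self.steps:
            rows = apply_step(rows)
        return list(islice(rows, n))
```

In this sketch, `explain()` can be called any number of times without touching the data, whereas every `show()` call invokes the source. That is why a plan `Display` cannot stand in for `show`.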

universalmind303 · 2024-12-06T20:06:33Z

@andrewgazelka I can take on this one.

Most of this is ported from the Python implementation inside `daft/runners/partitioning.py`.

### Note for reviewer

For context on why this is needed: the `DataFrame` class uses `PartitionSet` extensively for common operations such as `show` and `collect`. To add this functionality to our Spark Connect implementation, we need a similar construct in Rust.

Ideally, I'd like to port the Python implementation over to use this new Rust one, but there are still a few things that I'm not entirely sure how to implement (such as `RayPartitionSet`).

Not all of the methods inside `partitioning.rs` are used yet, but I intend to follow up this PR with an implementation for #3498, and this is a prerequisite because `show` relies on `get_preview_micropartitions`.
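As a rough illustration of the kind of preview logic that `show` depends on, the sketch below collects leading slices of partitions until a row budget is met. This is an assumption about the general idea only: `preview_partitions` is a hypothetical helper, partitions are modeled as plain lists of tuples, and none of this reflects the actual signature or behavior of `get_preview_micropartitions`.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]
Partition = List[Row]


def preview_partitions(partitions: Sequence[Partition], num_rows: int) -> List[Partition]:
    """Return leading slices of `partitions` that together hold at most `num_rows` rows.

    Hypothetical helper for illustration; not Daft's API.
    """
    preview: List[Partition] = []
    remaining = num_rows
    for part in partitions:
        if remaining <= 0:
            break  # Budget met: later partitions are never inspected.
        head = part[:remaining]
        if head:  # Skip empty partitions.
            preview.append(head)
            remaining -= len(head)
    return preview
```

The point of slicing per partition rather than concatenating everything first is that a preview only needs to look at as many partitions as it takes to fill the row budget.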

universalmind303 added the bug and needs triage labels, then removed the needs triage label Dec 5, 2024

samster25 assigned andrewgazelka Dec 6, 2024

universalmind303 assigned universalmind303 and unassigned andrewgazelka Dec 6, 2024

universalmind303 mentioned this issue Dec 6, 2024

refactor: create a rust based PartitionSet #3515

Merged

universalmind303 mentioned this issue Dec 16, 2024

EPIC: spark connect #3581

Open

50 tasks
