Troubleshoot dask and pandas (#847)
janet-can committed Aug 8, 2024
1 parent d87c31f commit 855ff61
Showing 3 changed files with 19 additions and 0 deletions.
2 changes: 2 additions & 0 deletions soda-cl/sample-datasets.md
@@ -96,6 +96,8 @@ Note that you cannot use an `exclude_columns` configuration to disable sample ro

## Specify columns for failed row sampling

{% include banner-upgrade.md %}

Beyond collecting samples of data from datasets, you can also add a `samples columns` configuration to an individual check to specify the columns for which Soda must implicitly collect failed row sample values. Soda only collects the check's failed row samples for the columns you specify in the list, as in the `duplicate_count` example below.

Soda implicitly collects failed row samples for the following checks:
9 changes: 9 additions & 0 deletions soda/connect-dask.md
@@ -88,6 +88,8 @@ scan.set_verbose(True)
scan.execute()
```

<br />

### Load JSON file into Dataframe

{% include code-header.html %}
@@ -103,6 +105,13 @@ df = pd.read_json('your_file.json')

...
```
<br />

## Troubleshoot

**Problem:** You encounter errors when trying to install `soda-dask-pandas` in an environment that uses Python 3.11. This may manifest as an issue with dependencies or as an error that reads, `Pre-scan validation failed, see logs for details.`

**Workaround:** Uninstall the `soda-dask-pandas` package, then downgrade the version of Python your environment uses to Python 3.9. Install the `soda-dask-pandas` package again.
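
Before reinstalling, it may help to confirm which Python version the environment actually runs. The following is a minimal sketch, not part of Soda: the function name `python_status` and the constants are hypothetical, and the version values are taken only from the problem and workaround described above, not from an official compatibility matrix.

```python
import sys

# Assumption: these values come from the troubleshooting text above,
# not from any official Soda compatibility list.
KNOWN_PROBLEM = (3, 11)      # version the problem statement mentions
WORKAROUND_TARGET = (3, 9)   # version the workaround downgrades to


def python_status(version=None):
    """Classify an interpreter version against the workaround above.

    `version` is any tuple starting with (major, minor); it defaults
    to the interpreter that runs this script.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor == KNOWN_PROBLEM:
        return "problem"
    if major_minor == WORKAROUND_TARGET:
        return "ok"
    # The text above says nothing about other versions.
    return "unknown"


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current}: {python_status()}")
```

Running the script inside the target environment reports whether the interpreter matches the version named in the problem, the version named in the workaround, or neither.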

<br />
<br />
8 changes: 8 additions & 0 deletions soda/connect-troubleshoot.md
@@ -41,6 +41,14 @@ Last modified on {% last_modified_at %}

<br />

## Scan error with Soda Dask and Pandas

**Problem:** You encounter errors when trying to install `soda-dask-pandas` in an environment that uses Python 3.11. This may manifest as an issue with dependencies or as an error that reads, `Pre-scan validation failed, see logs for details.`

**Workaround:** Uninstall the `soda-dask-pandas` package, then downgrade the version of Python your environment uses to Python 3.9. Install the `soda-dask-pandas` package again.

<br />

## Go further

* Access [Troubleshoot SodaCL]({% link soda-cl/troubleshoot.md %}) for help resolving issues running scans with SodaCL.