Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pancreas dataset #741

Merged
merged 15 commits into from
Sep 12, 2024
Merged

add pancreas dataset #741

merged 15 commits into from
Sep 12, 2024

Conversation

MUCDK
Copy link
Collaborator

@MUCDK MUCDK commented Aug 26, 2024

No description provided.

@MUCDK MUCDK requested a review from selmanozleyen August 26, 2024 10:25
@selmanozleyen
Copy link
Collaborator

Before going more into the fixes @MUCDK, I think there is a problem with this function _load_from_url

def _load_dataset_from_url(
.

In its code force_download is not being used at all. I also noticed it locally, that even if I set it to true it uses the cache

def _load_dataset_from_url(
    fpath: PathLike,
    type: Literal["h5ad", "h5mu"],
    *,
    backup_url: str,
    expected_shape: Tuple[int, int],
    force_download: bool = False,
    **kwargs: Any,
) -> Union[ad.AnnData, mu.MuData]:
    # TODO: make nicer once https://github.com/scverse/mudata/issues/76 resolved
    fpath = os.path.expanduser(fpath)
    if type == "h5ad" and not fpath.endswith(".h5ad"):
        fpath += ".h5ad"

    if type == "h5mu" and not fpath.endswith(".h5mu"):
        fpath += ".h5mu"

    if not os.path.exists(fpath):
        with tempfile.TemporaryDirectory() as tmpdir:
            tmp = pathlib.Path(tmpdir) / f"data.{type}"
            urllib.request.urlretrieve(backup_url, tmp)
            if type == "h5ad":
                data = ad.read_h5ad(filename=tmp, **kwargs)
            if type == "h5mu":
                data = mudata.read(tmp, **kwargs)
            with contextlib.suppress(FileNotFoundError):
                os.remove(fpath)
            shutil.move(tmp, fpath)
    else:
        if type == "h5ad":
            data = ad.read_h5ad(filename=fpath, **kwargs)
        else:
            raise NotImplementedError("MuData download only available with `force_download=True`.")

    if data.shape != expected_shape:
        data_str = "MuData" if type == "h5mu" else "AnnData"
        raise ValueError(f"Expected {data_str} object to have shape `{expected_shape}`, found `{data.shape}`.")

    return data

but I think the problem with the tests are still a different issue

@MUCDK
Copy link
Collaborator Author

MUCDK commented Sep 9, 2024

Yeah true, can we adapt it such that it's used again? It used to be passed to scanpy.read(), but force_download is not supported by mudata

@selmanozleyen
Copy link
Collaborator

@MUCDK I used scanpy's download function (it does more checks and better error handling) and it seems to work. However I think due to MuData addition pytest 3.9 tests are failing (MuData is trying to import a class that is removed from AnnData AlignedViewMixin). Also the mypy errors are also in main branch.

@MUCDK
Copy link
Collaborator Author

MUCDK commented Sep 9, 2024

Yeah this is why I changed the implementation.

I actually opened an issue, shouuld we continue on that side? wdyt? scverse/mudata#76

@selmanozleyen
Copy link
Collaborator

selmanozleyen commented Sep 9, 2024

What I meant is I actually download with scanpy.readwrite._check_datafile_present_and_download because it seems like it handles url requests better. It works for the notebooks and the tests above 3.10.

I think the CI got an update and now the MuData doesn't support AnnData in python 3.9. I will check the versions if they fit. But I agree that this should be added to MuData so that it has a similar interface to scanpy

Update: Yes, it turns out now importing MuData is enough for 3.9 tests to fail. I ran the tests again to a commit that passed the tests before my changes https://github.com/theislab/moscot/actions/runs/10558788448/job/29875691253

@selmanozleyen
Copy link
Collaborator

@MUCDK I also realized that sparseargument was being ignored. Because scanpy doesn't have that option and it just ignores it.

@MUCDK
Copy link
Collaborator Author

MUCDK commented Sep 9, 2024

we can deprecate python=3.9, then it should work ,right?

@MUCDK MUCDK closed this Sep 9, 2024
@MUCDK MUCDK reopened this Sep 9, 2024
@selmanozleyen
Copy link
Collaborator

Yes, I checked MuData commits and thats the case.

@MUCDK
Copy link
Collaborator Author

MUCDK commented Sep 9, 2024

ok then let's deprecate python 3.9, was planning to do that anyways, see #740

@selmanozleyen selmanozleyen merged commit b516bd9 into main Sep 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants