Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dataset file handling tests in test_datasets.py::test_file_handling #280

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

christian-monch
Copy link
Contributor

This PR modifies test_datasets.py::test_file_handling to be able to test with remote paths that contain directories.

There are still possible improvements: support files with non ".txt" suffix. Those are currently not supported because replace file fails due to "changing file types".

@adswa
Copy link
Member

adswa commented Mar 17, 2023

I ran the test that fails on windows locally because Appveyor is booked out, and saw this:

====================================================== FAILURES =======================================================
_________________________________________________ test_file_handling __________________________________________________

tmp_path = WindowsPath('C:/Users/adina/AppData/Local/Temp/pytest-of-adina/pytest-20/test_file_handling0')
dataverse_admin_api = <pyDataverse.api.NativeApi object at 0x0000023D2D1B69B0>
dataverse_dataaccess_api = <pyDataverse.api.DataAccessApi object at 0x0000023D2D1B63B0>
dataverse_dataset = 'doi:10.70122/FK2/FHE2XK'

    def test_file_handling(
            tmp_path,
            dataverse_admin_api,
            dataverse_dataaccess_api,
            dataverse_dataset,
    ):
        odd = ODD(dataverse_admin_api, dataverse_dataset)

        paths = (
            tmp_path / '.Ö-dot-Ö-in-front' / '!-üö-.txt',
            tmp_path / '.dot-in-front' / 'c1.txt',
            tmp_path / ' space-in-front' / 'c2.txt',
            tmp_path / '-minus-in-front' / 'c3.txt',
            tmp_path / 'Ö-in-front' / 'überflüssiger_fuß.txt',
            tmp_path / ' Ö-space-Ö-in-front' / 'a.txt',
            tmp_path / 'dummy.txt',
        )

        path_info = dict()
        for path in paths:
            if path.parent != tmp_path:
                path.parent.mkdir()
            relative_path = path.relative_to(tmp_path)
            fcontent = 'content of: ' + str(relative_path)
            path.write_text(fcontent)
            src_md5 = md5sum(path)
            fileid = check_upload(odd, fcontent, relative_path, tmp_path, src_md5)
            path_info[path] = (src_md5, fileid)

        for path, (src_md5, fileid) in path_info.items():
            relative_path = path.relative_to(tmp_path)
            check_download(odd, fileid, tmp_path / 'downloaded.txt', src_md5)
            check_file_metadata_update(
                dataverse_admin_api, dataverse_dataset, odd, fileid, path
            )
            path_to_check = relative_path.parent / 'mykey'
            fileid = check_replace_file(odd, fileid, path_to_check, tmp_path)
>           check_rename_file(
                odd, fileid, name=str(relative_path.parent / ('ren' + [path.name](http://path.name/)))
            )

datalad_dataverse\tests\test_dataset.py:53:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
datalad_dataverse\tests\test_dataset.py:64: in check_rename_file
    odd.rename_file(new_path, fileid)
datalad_dataverse\[dataset.py:280](http://dataset.py:280/): in rename_file
    response.raise_for_status()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Response [500]>

    def raise_for_status(self):
        """Raises :class:`HTTPError`, if one occurred."""

        http_error_msg = ""
        if isinstance(self.reason, bytes):
            # We attempt to decode utf-8 first because some servers
            # choose to localize their reason strings. If the string
            # isn't utf-8, we fall back to iso-8859-1 for all other
            # encodings. (See PR #3538)
            try:
                reason = self.reason.decode("utf-8")
            except UnicodeDecodeError:
                reason = self.reason.decode("iso-8859-1")
        else:
            reason = self.reason

        if 400 <= self.status_code < 500:
            http_error_msg = (
                f"{self.status_code} Client Error: {reason} for url: {self.url}"
            )

        elif 500 <= self.status_code < 600:
            http_error_msg = (
                f"{self.status_code} Server Error: {reason} for url: {self.url}"
            )

        if http_error_msg:
>           raise HTTPError(http_error_msg, response=self)
E           requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://demo.dataverse.org/api/v1/files/2049179/metadata

..\..\env\gooyey\lib\site-packages\requests\[models.py:1021](http://models.py:1021/): HTTPError
================================================== warnings summary ===================================================
..\..\env\gooyey\lib\site-packages\_pytest\config\__init__.py:747
  C:\Users\adina\env\gooyey\lib\site-packages\_pytest\config\__init__.py:747: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: datalad_next.tests.fixtures
    self.import_plugin(import_spec)

..\..\env\gooyey\lib\site-packages\boto\[plugin.py:40](http://plugin.py:40/)
  C:\Users\adina\env\gooyey\lib\site-packages\boto\[plugin.py:40](http://plugin.py:40/): DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

This commit modifies
test_datasets.test_file_handling to
be able to test with remote paths that
contain directories.

There is still a possible improvment:
support files with non ".txt" suffix.
Those are currently not supported
because replace file fails due to
"changing file types".
co-authored by @adswa, @bpoldrack
Use suggestions by Ben and Adina to
improve the code

Rebased on main
@christian-monch christian-monch force-pushed the tests-with-directories branch from 1e954de to fae7605 Compare March 17, 2023 20:11
@christian-monch christian-monch requested a review from adswa March 20, 2023 10:31
@codecov-commenter
Copy link

codecov-commenter commented Oct 5, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (ffc3908) 93.78% compared to head (0874e88) 93.81%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #280      +/-   ##
==========================================
+ Coverage   93.78%   93.81%   +0.03%     
==========================================
  Files          17       17              
  Lines         965      954      -11     
==========================================
- Hits          905      895      -10     
+ Misses         60       59       -1     
Files Coverage Δ
datalad_dataverse/tests/test_dataset.py 100.00% <100.00%> (+0.81%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@adswa adswa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is a bit dated and seemingly got lost after the development sprint, but I rebased it and all the tests pass, and I can't see anything wrong with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants