Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve usability of Directory datatype #17614

Draft
wants to merge 15 commits into
base: dev
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 6 additions & 5 deletions lib/galaxy/config/sample/datatypes_conf.xml.sample
Original file line number Diff line number Diff line change
Expand Up @@ -288,13 +288,14 @@
<display file="qiime/qiime2/q2view.xml"/>
</datatype>
<datatype extension='qiime2.tabular' type="galaxy.datatypes.qiime2:QIIME2Metadata" display_in_upload="true"/>
<datatype extension="zip" type="galaxy.datatypes.binary:CompressedZipArchive" display_in_upload="true"/>
<datatype extension="ncbi_genome_dataset.zip" type="galaxy.datatypes.binary:CompressedZipArchive" subclass="true" display_in_upload="true"/>
<datatype extension="tar" type="galaxy.datatypes.binary:CompressedArchive" subclass="true" display_in_upload="true">
<converter file="tar_to_directory.xml" target_datatype="directory"/>
<datatype extension="zip" type="galaxy.datatypes.binary:CompressedZipArchive" display_in_upload="true">
<converter file="archive_to_directory.xml" target_datatype="directory" />
</datatype>
<datatype extension="directory" type="galaxy.datatypes.data:Directory">
<datatype extension="ncbi_genome_dataset.zip" type="galaxy.datatypes.binary:CompressedZipArchive" subclass="true" display_in_upload="true"/>
<datatype extension="tar" auto_compressed_types="gz,bz2" type="galaxy.datatypes.binary:CompressedArchive" subclass="true" display_in_upload="true">
<converter file="archive_to_directory.xml" target_datatype="directory"/>
</datatype>
<datatype extension="directory" type="galaxy.datatypes.data:Directory"/>
<datatype extension="yaml" type="galaxy.datatypes.text:Yaml" display_in_upload="true" />
<!-- Proteomics Datatypes -->
<datatype extension="mrm" type="galaxy.datatypes.tabular:Tabular" display_in_upload="true" subclass="true"/>
Expand Down
37 changes: 37 additions & 0 deletions lib/galaxy/datatypes/converters/archive_to_directory.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
<tool id="CONVERTER_archive_to_directory" name="Unpack archive to directory" version="1.0.0" profile="21.09">
<!-- Use compression_utils instead of shell commands (tar/unzip) so we can verify safety of results -->
<requirements>
<requirement type="package" version="23.2.1">galaxy-util</requirement>
</requirements>
<command><![CDATA[
mkdir '$output1.files_path' &&
cd '$output1.files_path' &&
python -c "from galaxy.util.compression_utils import CompressedFile; CompressedFile('$input1').extract('.')"
]]></command>
<configfiles>
<configfile filename="metadata_json"><![CDATA[{
"output1": {
"name": "$input1.name unpacked to $__target_datatype__",
"ext": "$__target_datatype__"
}
}]]></configfile>
davelopez marked this conversation as resolved.
Show resolved Hide resolved
</configfiles>
<inputs>
<param format="tar,zip" name="input1" type="data"/>
wm75 marked this conversation as resolved.
Show resolved Hide resolved
<param name="__target_datatype__" type="select" label="Target data type">
<option value="directory">directory</option>
</param>
</inputs>
<outputs provided_metadata_file="metadata_json">
<data format="auto" name="output1"/>
davelopez marked this conversation as resolved.
Show resolved Hide resolved
</outputs>
<tests>
<test>
<param name="input1" ftype="tar" value="testdir1.tar"/>
<param name="__target_datatype__" value="directory"/>
<output name="output1" ftype="directory" value="testdir1.tar.directory"/>
</test>
</tests>
<help>
</help>
</tool>
2 changes: 1 addition & 1 deletion lib/galaxy/datatypes/converters/tar_to_directory.xml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
<tool id="CONVERTER_tar_to_directory" name="Convert tar to directory" version="1.0.1" profile="17.05">
<!-- Don't use tar directly so we can verify safety of results - tar -xzf '$input1'; -->
<requirements>
<requirement type="package" version="23.1">galaxy-util</requirement>
<requirement type="package" version="23.2.1">galaxy-util</requirement>
</requirements>
<command>
mkdir '$output1.files_path';
Expand Down
12 changes: 12 additions & 0 deletions lib/galaxy/datatypes/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -1212,6 +1212,18 @@ def regex_line_dataprovider(
class Directory(Data):
mvdbeek marked this conversation as resolved.
Show resolved Hide resolved
"""Class representing a directory of files."""

file_ext = "directory"

def _archive_main_file(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for roundtripping a directory via the API ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can try!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test, but I'm not sure why I'm getting ModuleNotFoundError: No module named 'galaxy' when running the converter tool... using the UI seems to work fine 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the API tests are setting up dependency resolution. I am mostly interested in verifying that the structure of the tar archive is the same pre-and-post upload. In fact I think even the checksum should match if we're not compressing the archive. In either case if you upload a tar file and download it again that should be sufficient. The converter is tested in the test framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see, so you mean something like this fc7e959#diff-a6ab1700bcef9e1585a2bb0f84e8888470a770fb81c3e0337930e7cad573093fR662

I'll do that 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this:

    def test_fetch_directory(self, history_id):
        testdir = TestDataResolver().get_filename("testdir.tar")
        with open(testdir, "rb") as fh:
            details = self._upload_and_get_details(
                fh, api="fetch", history_id=history_id, ext="directory", assert_ok=True
            )
        assert details["file_ext"] == "directory"
        assert details["file_size"] == 3584
        content = self.dataset_populator.get_history_dataset_content(
            history_id, dataset=details, to_ext="directory", type="bytes"
        )
        dir_path = decompress_bytes_to_directory(cast(bytes, content))
        assert dir_path.endswith("testdir")
        for path, entry_class in EXPECTED_CONTENTS.items():
            path = os.path.join(dir_path, os.path.pardir, path)
            if entry_class == "Directory":
                assert os.path.isdir(path)
            else:
                assert os.path.isfile(path)

But if I don't run the converter manually instead of the to_ext="directory" the extra_files_path is empty, I guess that is why you have more changes persisting extra files in the object store in your referenced branch dev...mvdbeek:galaxy:directory_datatype_improvements#diff-8640d91ef47bca302b00039012979f4b1b79f5dbffbe2431bc9a05f19fb4c7d0R132

Should we merge your branch instead? Is something still missing in your branch or should that be how to do it?

Sorry, I'm a bit lost 😅

self, archive: ZipstreamWrapper, display_name: str, data_filename: str
) -> Tuple[bool, str, str]:
"""Overwrites the method to not do anything.

No main file gets added to a directory archive.
"""
error, msg, messagetype = False, "", ""
return error, msg, messagetype


class GenericAsn1(Text):
"""Class for generic ASN.1 text format"""
Expand Down
Loading