Commit

update rnacentral variants name
Signed-off-by: Zhiyuan Chen <[email protected]>
ZhiyuanChen committed Oct 25, 2024
1 parent 0681c31 commit f09d83a
Showing 3 changed files with 8 additions and 9 deletions.
@@ -48,8 +48,7 @@ def convert_dataset_(df: pd.DataFrame):

 def convert_dataset(convert_config):
     df = dl.load_pandas(convert_config.dataset_path)
-    fd = convert_dataset_(df)
-    save_dataset(convert_config, {"test": fd})
+    save_dataset(convert_config, convert_dataset_(df), filename="test.parquet")


class ConvertConfig(ConvertConfig_):
12 changes: 6 additions & 6 deletions multimolecule/datasets/rnacentral/README.md
@@ -93,14 +93,14 @@ This is an UNOFFICIAL release of the [RNAcentral](https://rnacentral.org) by the

## Variations

-This dataset is available in five variants:
+This dataset is available in five additional variants:

 - [rnacentral](https://huggingface.co/datasets/multimolecule/rnacentral): The main RNAcentral dataset.
-- [rnacentral-512](https://huggingface.co/datasets/multimolecule/rnacentral-1024): RNAcentral dataset with all sequences truncated to 512 nucleotides.
-- [rnacentral-1024](https://huggingface.co/datasets/multimolecule/rnacentral-1024): RNAcentral dataset with all sequences truncated to 1024 nucleotides.
-- [rnacentral-2048](https://huggingface.co/datasets/multimolecule/rnacentral-2048): RNAcentral dataset with all sequences truncated to 2048 nucleotides.
-- [rnacentral-4096](https://huggingface.co/datasets/multimolecule/rnacentral-4096): RNAcentral dataset with all sequences truncated to 4096 nucleotides.
-- [rnacentral-8192](https://huggingface.co/datasets/multimolecule/rnacentral-8192): RNAcentral dataset with all sequences truncated to 8192 nucleotides.
+- [rnacentral.512](https://huggingface.co/datasets/multimolecule/rnacentral.512): RNAcentral dataset with all sequences truncated to 512 nucleotides.
+- [rnacentral.1024](https://huggingface.co/datasets/multimolecule/rnacentral.1024): RNAcentral dataset with all sequences truncated to 1024 nucleotides.
+- [rnacentral.2048](https://huggingface.co/datasets/multimolecule/rnacentral.2048): RNAcentral dataset with all sequences truncated to 2048 nucleotides.
+- [rnacentral.4096](https://huggingface.co/datasets/multimolecule/rnacentral.4096): RNAcentral dataset with all sequences truncated to 4096 nucleotides.
+- [rnacentral.8192](https://huggingface.co/datasets/multimolecule/rnacentral.8192): RNAcentral dataset with all sequences truncated to 8192 nucleotides.

## Derived Datasets

2 changes: 1 addition & 1 deletion multimolecule/datasets/rnacentral/rnacentral.py
@@ -49,7 +49,7 @@ class ConvertConfig(ConvertConfig_):

     def post(self):
         if self.max_seq_len is not None:
-            self.output_path = f"{self.output_path}-{self.max_seq_len}"
+            self.output_path = f"{self.output_path}.{self.max_seq_len}"
         super().post()
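
The naming rule changed in `post` can be sketched in isolation. This is a minimal illustration, not the repository's code: `variant_name` and its `sep` parameter are hypothetical names introduced here, and the only behavior taken from the diff is that the maximum sequence length is appended to the output path with `.` instead of `-`, and only when `max_seq_len` is set.

```python
from typing import Optional


def variant_name(base: str, max_seq_len: Optional[int] = None, sep: str = ".") -> str:
    """Build a variant name the way the updated `post` hook appears to.

    `variant_name` and `sep` are hypothetical; they are not part of the commit.
    """
    if max_seq_len is None:
        # No truncation length configured: the base name is left unchanged.
        return base
    # Append the truncation length with the chosen separator.
    return f"{base}{sep}{max_seq_len}"


# Names for the main dataset and the five truncated variants listed in the README.
names = [variant_name("rnacentral", n) for n in (None, 512, 1024, 2048, 4096, 8192)]
```

Passing `sep="-"` reproduces the naming used before this commit, which can be handy when mapping old variant names to new ones.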


