Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement class-based readers/writers #176

Open
14 tasks
dalonsoa opened this issue Dec 20, 2024 · 0 comments
Open
14 tasks

Implement class-based readers/writers #176

dalonsoa opened this issue Dec 20, 2024 · 0 comments
Labels
complex This issue is a complex one that might require some thought enhancement New feature or request umbrella Issue made of other issues
Milestone

Comments

@dalonsoa
Copy link
Collaborator

dalonsoa commented Dec 20, 2024

Problem

Currently, each reader/writer need to read the header, extract the information it needs from there, and then read the data - and in the future, validate it (see #14). That is a lot of repetition of the same code when each reader/writer should only worry about the actual read/write of the data. It also makes harder to implement common functionality across all readers/writers, should it be necessary, as the code would need to be updated in multiple places.

Solution

Move to class-based readers where all the common functionality can be packed together in a base class and where each reader/writer only needs to implement the specific reading action.

It could look something like the following, sticking the whole reading process in one go in the __init__ method, as if it were a function:

class BaseReader(ABC):

    def __init__(...):
        ...
        with filename.open(...) as f:
            self._read_header(f)
            self._read_data(f)

        self._validate_data()

    @abstractmethod
    def _read_data(...):
        # Read the actual tabular data. The main method to be overridden

    def _read_header(...):
        # Read the header and extract any common metadata

    def _validate_data(...):
        # Validate the data. Might be overridden for each format

    @property
    def data(...) -> Any:
        # Return the data. Subclasses might override this to return a specific type

    @property
    def metadata(...) -> Dict[str, Any]:
        # Return the metadata.

And similar for the writers. This has the advantage that the same file object is used for both the header and the data, as suggested in #28 . Actually adding validation is a task for #14 .

Addressing this PR will require multiple issues - and multiple PR - so convert the following in issues as needed:

Readers

  • Write base class for readers
  • Write specific subclass for reading into lists
  • Write specific subclass for reading into dicts
  • Write specific subclass for reading into numpy arrays
  • Write specific subclass for reading into pandas dataframes
  • Write specific subclass for reading into polars dataframes
  • [New] Write specific subclass for reading into iterators

Writers

This should be compatible with the existing Writer class, which will need updating.

  • Write base class for writers
  • Write specific subclass for writing lists
  • Write specific subclass for writing dicts
  • Write specific subclass for writing numpy arrays
  • Write specific subclass for writing pandas dataframes
  • Write specific subclass for writing polars dataframes
  • Write specific subclass for writing incrementally (replaces current class for that)

As the class-based readers are implemented, the existing functions should be replaced and include a deprecation warning. Eg:

def read_to_array(...):
    raise FutureWarning(...) # Mention version 2.0.0, for example
    reader = ArrayReader(...)
    return reader.data, reader.metadata

So we keep the same API, but internally we use the class-based approach.

@dalonsoa dalonsoa added enhancement New feature or request complex This issue is a complex one that might require some thought umbrella Issue made of other issues labels Dec 20, 2024
@dalonsoa dalonsoa changed the title Implement class-based readers Implement class-based readers/writers Dec 20, 2024
@dalonsoa dalonsoa added this to the v1.0.0 milestone Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
complex This issue is a complex one that might require some thought enhancement New feature or request umbrella Issue made of other issues
Projects
None yet
Development

No branches or pull requests

1 participant