Add table schema validator #125

Open: wants to merge 20 commits into base: main
4 changes: 2 additions & 2 deletions .github/workflows/ci_template.yml
@@ -9,7 +9,7 @@ on:
type: string
python-version:
description: 'Python version'
-default: '["3.9", "3.12"]'
+default: '["3.10", "3.13"]'
type: string

jobs:
@@ -38,6 +38,6 @@ jobs:
run: poetry run pytest

- name: Upload coverage to Codecov
-if: success() && (matrix.os == 'ubuntu-latest' && matrix.python-version == 3.9)
+if: success() && (matrix.os == 'ubuntu-latest' && matrix.python-version == 3.10)
uses: codecov/codecov-action@v5

4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
@@ -12,7 +12,7 @@ jobs:
uses: ./.github/workflows/ci_template.yml
with:
os: '["ubuntu-latest", "windows-latest", "macos-latest"]'
-python-version: '["3.9", "3.10", "3.11", "3.12"]'
+python-version: '["3.10", "3.11", "3.12", "3.13"]'

build-wheel:
needs: test
@@ -37,7 +37,7 @@ jobs:

- uses: actions/setup-python@v5
with:
-python-version: 3.9
+python-version: '3.10'

- name: Install Poetry
uses: abatilo/[email protected]
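Unquoted version numbers are a known pitfall in workflow files like these: YAML resolves an unquoted `3.10` to a float, and the float 3.10 is the same value as 3.1, so an action reading it sees Python 3.1. The short sketch below illustrates the collapse with the standard library, using Python's `float()` as a stand-in for the YAML loader's float resolution. It also assumes, since this diff does not show it, that the template turns the `python-version` input into a matrix by parsing it as JSON; quoted entries in that JSON list stay strings.

```python
import json

# Stand-in for YAML float resolution: an unquoted 3.10 becomes a number,
# and that number prints as 3.1, so the trailing zero is lost.
version_as_number = float("3.10")
print(version_as_number)  # 3.1

# Quoted entries in a JSON list stay strings, so "3.10" survives unchanged.
# Assumption: the workflow parses the matrix input as JSON.
matrix_versions = json.loads('["3.10", "3.11", "3.12", "3.13"]')
print(matrix_versions[0])  # 3.10
```

Quoting the value in YAML, as in `python-version: '3.10'`, keeps it a string.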
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
rev: "v1.13.0"
hooks:
- id: mypy
-additional_dependencies: [types-PyYAML]
+additional_dependencies: [types-PyYAML, pydantic]
- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.43.0
hooks:
65 changes: 65 additions & 0 deletions csvy/validators/__init__.py
@@ -0,0 +1,65 @@
"""Validators for the CSVY format."""

from collections.abc import Mapping
from typing import Any

from pydantic import BaseModel

from .csv_dialect import CSVDialectValidator # noqa: F401
from .registry import VALIDATORS_REGISTRY, register_validator # noqa: F401
from .table_schema import SchemaValidator # noqa: F401


def validate_header(header: dict[str, Any]) -> dict[str, Any]:
    """Run the validators on the header.

    This function runs the validators on the header. It uses the keys of the header to
    find the validators in the registry and runs them on the corresponding values. As
    a result, some values in the header may be replaced by the validated values in the
    form of Pydantic models.

    If the header has already been validated, the Pydantic models within, if any,
    are dumped to dictionaries and re-validated. This catches the case where
    attributes of the Pydantic models were changed to invalid values after validation.

    Args:
        header: The header of the CSVY file.

    Returns:
        The validated header.

    """
    validated_header: dict[str, Any] = {}
    for key, value in header.items():
        value_ = value.model_dump() if isinstance(value, BaseModel) else value
        if key in VALIDATORS_REGISTRY:
            if not isinstance(value_, Mapping):
                raise TypeError(
                    f"Value for '{key}' must be a mapping, not a '{type(value_)}'."
                )
            validator = VALIDATORS_REGISTRY[key]
            validated_header[key] = validator(**value_)
        else:
            validated_header[key] = value_
    return validated_header


def header_to_dict(header: dict[str, Any]) -> dict[str, Any]:
    """Transform the header into a serializable dictionary.

    Transforms a header containing validated Pydantic models into one containing
    plain dictionaries that can be saved as YAML.

    Args:
        header: Dictionary to be saved as the header of the CSVY file.

    Returns:
        The header, as a serializable dictionary.

    """
    validated_header = {}
    for key, value in header.items():
        validated_header[key] = (
            value.model_dump() if isinstance(value, BaseModel) else value
        )
    return validated_header
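The round trip that `validate_header` and `header_to_dict` implement can be sketched without Pydantic. In this stdlib-only analogue, which is an assumption and not the package's code, a dataclass with a `__post_init__` check stands in for a Pydantic model and `dataclasses.asdict` stands in for `model_dump()`; `DialectSketch`, the `"dialect"` key and `REGISTRY` are hypothetical names. It shows why dumping and re-validating matters: assigning an attribute after construction bypasses the check, and only the next `validate_header` call catches it.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class DialectSketch:
    """Hypothetical validator standing in for a Pydantic model."""

    delimiter: str = ","

    def __post_init__(self) -> None:
        # Runs on construction only, like Pydantic without validate_assignment.
        if not isinstance(self.delimiter, str) or len(self.delimiter) != 1:
            raise ValueError("delimiter must be a single character")


# Hypothetical registry entry; the real keys are defined by the csvy package.
REGISTRY: dict[str, type] = {"dialect": DialectSketch}


def _dump(value: Any) -> Any:
    # asdict() plays the role of model_dump(); skip dataclass *classes*.
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    return value


def validate_header(header: dict[str, Any]) -> dict[str, Any]:
    validated: dict[str, Any] = {}
    for key, value in header.items():
        value_ = _dump(value)  # already-validated models are dumped and re-checked
        if key in REGISTRY:
            if not isinstance(value_, Mapping):
                raise TypeError(
                    f"Value for '{key}' must be a mapping, not a '{type(value_)}'."
                )
            validated[key] = REGISTRY[key](**value_)
        else:
            validated[key] = value_
    return validated


def header_to_dict(header: dict[str, Any]) -> dict[str, Any]:
    # Plain dicts only, ready to hand to a YAML dumper.
    return {key: _dump(value) for key, value in header.items()}


header = validate_header({"dialect": {"delimiter": ";"}, "title": "demo"})
header["dialect"].delimiter = ";;"  # plain assignment skips __post_init__
try:
    validate_header(header)  # the dump-and-revalidate step catches it
except ValueError as err:
    print(err)  # delimiter must be a single character
```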
98 changes: 6 additions & 92 deletions csvy/validators.py → csvy/validators/csv_dialect.py
@@ -1,99 +1,13 @@
-"""Module that contains validators for the CSVY file format."""
+"""CSV Dialect-related validation."""

from __future__ import annotations

import csv
-from collections.abc import Mapping
-from typing import Any, Callable, Optional, TypeVar
+from typing import TypeVar

from pydantic import BaseModel, Field

-VALIDATORS_REGISTRY: dict[str, type[BaseModel]] = {}
-"""Registry of validators to run on the header."""
-
-
-def register_validator(
-    name: str, overwrite: bool = False
-) -> Callable[[type[BaseModel]], type[BaseModel]]:
-    """Register a validator in the registry.
-    This function is a decorator that registers a validator in the registry. The name
-    of the validator is used as the key in the registry.
-    Args:
-        name: The name of the validator.
-        overwrite: Whether to overwrite the validator if it already exists.
-    Returns:
-        The decorator function that registers the validator.
-    """
-
-    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
-        if not issubclass(cls, BaseModel):
-            raise TypeError("Validators must be subclasses of pydantic.BaseModel.")
-
-        if name in VALIDATORS_REGISTRY and not overwrite:
-            raise ValueError(f"Validator with name '{name}' already exists.")
-
-        VALIDATORS_REGISTRY[name] = cls
-        return cls
-
-    return decorator
-
-
-def validate_header(header: dict[str, Any]) -> dict[str, Any]:
-    """Run the validators on the header.
-    This function runs the validators on the header. It uses the keys of the header to
-    find the validators in the registry and runs them on the corresponding values. As
-    a result, some values in the header may be replaced by the validated values in the
-    form of Pydantic models.
-    If the header is an already validated header, the Pydantic models within, if any,
-    are dumped to dictionaries and re-validated, again. This accounts for the case where
-    attributes of the Pydantic models are changed to invalid values.
-    Args:
-        header: The header of the CSVY file.
-    Returns:
-        The validated header.
-    """
-    validated_header: dict[str, Any] = {}
-    for key, value in header.items():
-        value_ = value.model_dump() if isinstance(value, BaseModel) else value
-        if key in VALIDATORS_REGISTRY:
-            if not isinstance(value_, Mapping):
-                raise TypeError(
-                    f"Value for '{key}' must be a mapping, not a '{type(value_)}'."
-                )
-            validator = VALIDATORS_REGISTRY[key]
-            validated_header[key] = validator(**value_)
-        else:
-            validated_header[key] = value_
-    return validated_header
-
-
-def header_to_dict(header: dict[str, Any]) -> dict[str, Any]:
-    """Transform the header into a serializable dictionary.
-    Transforms the header with validators to a header with dictionaries that can be
-    saved as yaml.
-    Args:
-        header: Dictionary to be saved as the header of the CSVY file.
-    Returns:
-        The validated header, as a serializable dictionary.
-    """
-    validated_header = {}
-    for key, value in header.items():
-        validated_header[key] = (
-            value.model_dump() if isinstance(value, BaseModel) else value
-        )
-    return validated_header
-
+from .registry import register_validator

# Create a generic variable that can be 'Parent', or any subclass.
T = TypeVar("T", bound="CSVDialectValidator")
@@ -127,7 +41,7 @@ class CSVDialectValidator(BaseModel):

    delimiter: str = Field(default=",")
    doublequote: bool = Field(default=True)
-    escapechar: Optional[str] = Field(default=None)
+    escapechar: str | None = Field(default=None)
    lineterminator: str = Field(default="\r\n")
Contributor:

Might be clearer to explicitly put the default value here, e.g.:

Suggested change
-    escapechar: str | None = Field(default=None)
+    escapechar: str = Field(default="\\")

Collaborator Author:

Actually, according to the specification, the default is not set: https://specs.frictionlessdata.io/csv-dialect/#specification

Collaborator Author:

And now that I check the specification, I see I'm missing several fields... 😢

    quotechar: str = Field(default='"')
    skipinitialspace: bool = Field(default=False)
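The review thread above turns on what the defaults should be. As a point of comparison, and not something the PR itself states, the six defaults visible in this diff coincide with Python's built-in `csv.excel` dialect, which also leaves `escapechar` unset (`None`). The sketch checks that and feeds the same values to `csv.writer`; `DIALECT_DEFAULTS` is a hypothetical name, and the full model may define more fields than appear here.

```python
import csv
import io

# Defaults as they appear in the CSVDialectValidator hunk (visible fields only).
DIALECT_DEFAULTS = {
    "delimiter": ",",
    "doublequote": True,
    "escapechar": None,
    "lineterminator": "\r\n",
    "quotechar": '"',
    "skipinitialspace": False,
}

# Collect any default that differs from the attribute of the same name on csv.excel.
mismatches = {
    name: getattr(csv.excel, name)
    for name, value in DIALECT_DEFAULTS.items()
    if getattr(csv.excel, name) != value
}
print(mismatches)  # {}

# The same names are valid csv.writer format parameters.
buffer = io.StringIO()
writer = csv.writer(buffer, **DIALECT_DEFAULTS)
writer.writerow(["a", 'say "hi"', "x,y"])
print(repr(buffer.getvalue()))  # 'a,"say ""hi""","x,y"\r\n'
```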
38 changes: 38 additions & 0 deletions csvy/validators/registry.py
@@ -0,0 +1,38 @@
"""Registry of validators to run on the header."""

from collections.abc import Callable

from pydantic import BaseModel

VALIDATORS_REGISTRY: dict[str, type[BaseModel]] = {}
"""Registry of validators to run on the header."""


def register_validator(
    name: str, overwrite: bool = False
) -> Callable[[type[BaseModel]], type[BaseModel]]:
Contributor:

I like the `decorator` package for these situations. It also fixes up the type hints for decorated functions, which can otherwise be an issue.

I appreciate you may not want to add another dependency though!

Collaborator Author:

I've no problem with adding new dependencies, but I'm not convinced it adds much value in this case: it's just one very simple decorator that registers something and returns its input unchanged.

    """Register a validator in the registry.

    This function is a decorator that registers a validator in the registry. The name
    of the validator is used as the key in the registry.

    Args:
        name: The name of the validator.
        overwrite: Whether to overwrite the validator if it already exists.

    Returns:
        The decorator function that registers the validator.

    """

    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        if not issubclass(cls, BaseModel):
            raise TypeError("Validators must be subclasses of pydantic.BaseModel.")

        if name in VALIDATORS_REGISTRY and not overwrite:
            raise ValueError(f"Validator with name '{name}' already exists.")

        VALIDATORS_REGISTRY[name] = cls
        return cls

    return decorator
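On the reviewer's point about type hints, one dependency-free option is a `TypeVar` bound to the base class, so the decorator's signature says it returns exactly the class it was given rather than the base type. The sketch below is a stdlib-only analogue, not the PR's code: `ValidatorBase` is a hypothetical stand-in for `pydantic.BaseModel`, and `register`, `VALIDATORS` and `DialectValidatorSketch` are illustrative names. How much this helps depends on the type checker, since some checkers may ignore the return type of class decorators altogether.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import TypeVar


class ValidatorBase:
    """Hypothetical stand-in for pydantic.BaseModel."""


# Bound TypeVar: the decorator hands back the same subclass it receives.
V = TypeVar("V", bound=ValidatorBase)

VALIDATORS: dict[str, type[ValidatorBase]] = {}


def register(name: str, overwrite: bool = False) -> Callable[[type[V]], type[V]]:
    """Register a validator class under `name` and return it unchanged."""

    def decorator(cls: type[V]) -> type[V]:
        # Reject non-classes with the same message as classes outside the hierarchy.
        if not (isinstance(cls, type) and issubclass(cls, ValidatorBase)):
            raise TypeError("Validators must be subclasses of ValidatorBase.")
        if name in VALIDATORS and not overwrite:
            raise ValueError(f"Validator with name '{name}' already exists.")
        VALIDATORS[name] = cls
        return cls

    return decorator


@register("dialect")  # hypothetical key
class DialectValidatorSketch(ValidatorBase):
    delimiter: str = ","


try:
    register("dialect")(DialectValidatorSketch)  # duplicate name, no overwrite
except ValueError as err:
    print(err)  # Validator with name 'dialect' already exists.
```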