Add resolve files #310

Closed
16 changes: 16 additions & 0 deletions src/datachain/lib/dc.py
@@ -1689,3 +1689,19 @@
    def offset(self, offset: int) -> "Self":
        """Return the results starting with the offset row."""
        return super().offset(offset)

    def resolve_files(self, signal: str = "file") -> "Self":
        """Check if the file object is valid and update its is_valid field."""

        def check_file(file: File) -> File:
            if not isinstance(file, File):
                raise TypeError(f"Signal '{signal}' is not a File object")

            try:
                with file.open():
                    file.is_valid = True
            except OSError:
                file.is_valid = False
            return file

        return self.map(**{signal: check_file})
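The check-by-opening pattern in `resolve_files` can be illustrated outside datachain. This is a minimal stand-alone sketch, assuming a hypothetical `LocalFile` dataclass in place of datachain's `File`, the local filesystem in place of the storage backend, and a plain list comprehension in place of `self.map`. It tries to open each path and records whether that succeeded.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalFile:
    # Hypothetical stand-in for datachain's File; only the fields used here.
    path: str
    is_valid: Optional[bool] = None  # None means "not resolved yet"


def check_file(file: LocalFile) -> LocalFile:
    # Same logic as the PR: a file counts as valid if it can be opened.
    try:
        with open(file.path, "rb"):
            file.is_valid = True
    except OSError:
        # FileNotFoundError, PermissionError, IsADirectoryError, ... all derive from OSError.
        file.is_valid = False
    return file


def resolve_local_files(files: list[LocalFile]) -> list[LocalFile]:
    # Hypothetical stand-in for self.map(**{signal: check_file}): check every row.
    return [check_file(f) for f in files]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        existing = os.path.join(tmp, "a.txt")
        with open(existing, "wb") as fh:
            fh.write(b"hello")
        missing = os.path.join(tmp, "missing.txt")

        for f in resolve_local_files([LocalFile(existing), LocalFile(missing)]):
            print(os.path.basename(f.path), f.is_valid)
        # prints "a.txt True", then "missing.txt False"
```

Catching `OSError` rather than only `FileNotFoundError` means permission and other I/O failures also mark a file as invalid instead of aborting the whole run.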
4 changes: 4 additions & 0 deletions src/datachain/lib/file.py
@@ -119,6 +119,7 @@
    last_modified: datetime = Field(default=TIME_ZERO)
    location: Optional[Union[dict, list[dict]]] = Field(default=None)
    vtype: str = Field(default="")
    is_valid: Optional[bool] = Field(default=None)

    _datachain_column_types: ClassVar[dict[str, Any]] = {
        "source": String,
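Because `is_valid` defaults to `None`, the field is effectively tri-state: `None` for files that have not been resolved, and `True` or `False` once `resolve_files` has checked them. That is why the `open()` guard in the next hunk tests `self.is_valid is False` rather than `not self.is_valid`: a falsy check would also refuse every unresolved file. A minimal sketch of the difference, using two hypothetical helper functions:

```python
from typing import Optional


def should_block(is_valid: Optional[bool]) -> bool:
    # Identity check, as in the PR: only an explicit False blocks opening.
    return is_valid is False


def should_block_falsy(is_valid: Optional[bool]) -> bool:
    # Naive alternative: also blocks None, i.e. files that were never resolved.
    return not is_valid


for state in (None, True, False):
    print(state, should_block(state), should_block_falsy(state))
# None False True
# True False False
# False True True
```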
@@ -191,6 +192,9 @@
    @contextmanager
    def open(self, mode: Literal["rb", "r"] = "rb"):
        """Open the file and return a file object."""
        if self.is_valid is False:
            raise FileNotFoundError(f"File {self.path} is not valid")

        if self.location:
            with VFileRegistry.resolve(self, self.location) as f:  # type: ignore[arg-type]
                yield f
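The guard raises `FileNotFoundError`, which is a subclass of `OSError`, so the `except OSError` branch in `check_file` catches it. One consequence is that a file already marked invalid stays invalid if `resolve_files` runs again, because `open()` raises before touching storage. The sketch below models this with a hypothetical `GuardedFile` class whose `open()` has the same contextmanager shape and guard, but reads a local path instead of going through datachain's storage or `VFileRegistry`.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardedFile:
    # Hypothetical stand-in for File; open() mirrors the PR's early guard.
    path: str
    is_valid: Optional[bool] = None

    @contextmanager
    def open(self, mode: str = "rb"):
        if self.is_valid is False:
            raise FileNotFoundError(f"File {self.path} is not valid")
        # builtins.open makes it explicit that this is the built-in, not this method.
        with builtins.open(self.path, mode) as f:
            yield f


def check_file(file: GuardedFile) -> GuardedFile:
    # Same try/except shape as check_file in the dc.py hunk.
    try:
        with file.open():
            file.is_valid = True
    except OSError:  # also catches the guard's FileNotFoundError
        file.is_valid = False
    return file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as fh:
            fh.write(b"x")
        print(check_file(GuardedFile(path)).is_valid)  # True
        # Already marked invalid: stays False even though the file exists.
        print(check_file(GuardedFile(path, is_valid=False)).is_valid)  # False
```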