Allow running parallel pipelines in separate threads #813
Changes from all commits: 4eae3bd, 0e7703c, 1ce3098, 5e882e3, 148e93c, aba0db6, 1d21147, 795baaa, 5ab7a58, edf98c9, 9055af4, 7ebbe00, 37ba925, a7c9543, 910bfa0
```diff
@@ -1,6 +1,6 @@
 import abc
 from dataclasses import dataclass
-from typing import IO, TYPE_CHECKING, Any, Dict, List, Optional, Sequence, Type, Union
+from typing import IO, TYPE_CHECKING, Any, Dict, List, Optional, Sequence, Type, NamedTuple
 
 from dlt.common import json
 from dlt.common.configuration import configspec, known_sections, with_config
@@ -23,6 +23,12 @@ class TFileFormatSpec:
     supports_compression: bool = False
 
 
+class DataWriterMetrics(NamedTuple):
+    file_path: str
+    items_count: int
+    file_size: int
```
Maybe a column count? But that is not really important, tbh.

I plan to add elapsed time (start/stop). The column count is not known at this point.

Not during extract, but it is known during normalize. You can, however, get the column count from the relevant schema...

Yes, elapsed time would be cool too!
```diff
+
+
 class DataWriter(abc.ABC):
     def __init__(self, f: IO[Any], caps: DestinationCapabilitiesContext = None) -> None:
         self._f = f
```
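The review thread on DataWriterMetrics mentions adding elapsed time (start/stop). Below is a minimal sketch of how that could sit on top of the NamedTuple from the diff. It assumes hypothetical `created`/`last_modified` timestamp fields and a hypothetical `total_metrics` helper that sums metrics across several files; neither is part of the PR.

```python
from typing import NamedTuple, Sequence


class DataWriterMetrics(NamedTuple):
    # Fields from the diff
    file_path: str
    items_count: int
    file_size: int
    # Hypothetical timing fields (not in the PR): wall-clock timestamps in seconds
    created: float = 0.0
    last_modified: float = 0.0

    @property
    def elapsed(self) -> float:
        """Seconds between the first and the last write to the file."""
        return self.last_modified - self.created


def total_metrics(metrics: Sequence[DataWriterMetrics]) -> DataWriterMetrics:
    """Hypothetical helper: aggregate the metrics of several files into one record."""
    return DataWriterMetrics(
        file_path="",  # an aggregate has no single path
        items_count=sum(m.items_count for m in metrics),
        file_size=sum(m.file_size for m in metrics),
        created=min((m.created for m in metrics), default=0.0),
        last_modified=max((m.last_modified for m in metrics), default=0.0),
    )


a = DataWriterMetrics("a.jsonl", 10, 1024, created=100.0, last_modified=101.5)
b = DataWriterMetrics("b.jsonl", 5, 512, created=101.5, last_modified=103.0)
# NamedTuples are immutable: produce an updated copy instead of mutating counters
a = a._replace(items_count=a.items_count + 1, last_modified=102.0)
total = total_metrics([a, b])
print(total.items_count, total.file_size, total.elapsed)  # 16 1536 3.0
```

Keeping the metrics a NamedTuple (rather than a mutable object) means a writer hands out snapshots that can be safely passed between threads; defaulted fields keep existing three-argument constructions working.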
Can't you use Python's thread-local context to do all this? https://docs.python.org/3/library/threading.html#thread-local-data
Yeah, I know it, but when you look at the code, there are exceptions to that behavior. So yes, I could use local(), but because of those exceptions I'd need to keep more dictionaries anyway. Or can you force the thread id for local()?
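The thread weighs `threading.local` against per-thread dictionaries and asks whether a thread id can be forced. As far as the public `threading` API goes, a `threading.local` namespace is always resolved against the calling thread, so there is no way to pick another thread's values. A minimal sketch contrasting the two approaches (not the PR's code; `ThreadIdRegistry`, `set_local_value` and `get_local_value` are hypothetical names):

```python
import threading
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")

# Option 1: threading.local. Each thread sees its own attributes on _local;
# the namespace is always resolved against the calling thread.
_local = threading.local()


def set_local_value(value: str) -> None:
    _local.value = value


def get_local_value() -> Optional[str]:
    # Threads that never called set_local_value() get None
    return getattr(_local, "value", None)


# Option 2 (hypothetical): an explicit dict keyed by thread id, which allows
# exceptions such as reading or writing the entry of another thread.
class ThreadIdRegistry(Generic[T]):
    def __init__(self) -> None:
        self._items: Dict[int, T] = {}
        self._lock = threading.Lock()

    def _key(self, thread_id: Optional[int]) -> int:
        # Default to the calling thread, but let the caller force another id
        return threading.get_ident() if thread_id is None else thread_id

    def set(self, value: T, thread_id: Optional[int] = None) -> None:
        with self._lock:
            self._items[self._key(thread_id)] = value

    def get(self, thread_id: Optional[int] = None) -> Optional[T]:
        with self._lock:
            return self._items.get(self._key(thread_id))


registry: ThreadIdRegistry[str] = ThreadIdRegistry()
main_id = threading.get_ident()
registry.set("main pipeline")
results: Dict[str, Optional[str]] = {}


def worker() -> None:
    set_local_value("worker")
    registry.set("worker pipeline")
    results["own"] = registry.get()
    # The worker can still reach the main thread's entry by forcing its id
    results["main"] = registry.get(main_id)
    results["local"] = get_local_value()


t = threading.Thread(target=worker)
t.start()
t.join()
print(results)  # {'own': 'worker pipeline', 'main': 'main pipeline', 'local': 'worker'}
print(get_local_value())  # None: the worker's thread-local value is invisible here
```

One caveat of the dictionary approach: ids returned by `threading.get_ident()` may be recycled after a thread exits, so entries should be removed when a thread finishes; `threading.local` cleans up per-thread data on its own.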