Added info about "Rows count" and "Data loading times" #929
Merged
11 commits
- e163497 Added info about "Rows count" and "Data loading times" (dat-a-man)
- ef8eb96 Added info about "Rows count" and "Data loading times" (dat-a-man)
- 57b18d9 Added info about "Rows count" and "Data loading times" (dat-a-man)
- 3dafdd9 Added info about "Rows count" and "Data loading times" (dat-a-man)
- a751d75 Update (dat-a-man)
- 01de058 Update (dat-a-man)
- a23e4ad Update (dat-a-man)
- 55f5f9f Updated table (dat-a-man)
- e4ac3ef Updated table (dat-a-man)
- dc95e6b Updated (dat-a-man)
- b98e542 Merge branch 'master' into docs/run_monitoring_update (adrianbr)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,3 +60,60 @@ charts and time-series charts that provide a baseline or a pattern that a person | |
|
||
For example, to monitor data loading, consider plotting "count of records by `loaded_at` date/hour", | ||
"created at", "modified at", or other recency markers. | ||
|
||
### Rows count | ||
To find the number of rows loaded per table, use the following command: | ||
|
||
```shell | ||
dlt pipeline <pipeline_name> trace | ||
``` | ||
|
||
This command displays the names of the tables that were loaded and the number of rows in each table. | ||
For example, the output for the normalize step may look like this: | ||
|
||
```shell | ||
Step normalize COMPLETED in 2.37 seconds. | ||
Normalized data for the following tables: | ||
- _dlt_pipeline_state: 1 row(s) | ||
- payments: 1329 row(s) | ||
- tickets: 1492 row(s) | ||
- orders: 2940 row(s) | ||
- shipment: 2382 row(s) | ||
- retailers: 1342 row(s) | ||
``` | ||
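
If the row counts are needed in a script rather than on screen, one option is to parse the text printed by the command above. The following is a minimal sketch that assumes the output keeps the `- <table>: <n> row(s)` line format shown in the example; the helper name `parse_row_counts` and the sample text are illustrative and not part of `dlt`.

```python
import re

# Matches trace lines such as "- payments: 1329 row(s)"
ROW_LINE = re.compile(r"^\s*-\s*(?P<table>[\w.]+):\s*(?P<rows>\d+)\s+row\(s\)")


def parse_row_counts(trace_text: str) -> dict[str, int]:
    """Return {table_name: row_count} for every "- <table>: <n> row(s)" line."""
    counts = {}
    for line in trace_text.splitlines():
        match = ROW_LINE.match(line)
        if match:
            counts[match.group("table")] = int(match.group("rows"))
    return counts


# Sample text copied from the example output above (shortened)
sample = """\
Step normalize COMPLETED in 2.37 seconds.
Normalized data for the following tables:
- _dlt_pipeline_state: 1 row(s)
- payments: 1329 row(s)
- tickets: 1492 row(s)
"""

row_counts = parse_row_counts(sample)
print(row_counts)
```

Lines that do not match the pattern, such as the step summary, are skipped, so the helper returns an empty dictionary when no table lines are present.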
|
||
To load this information back to the destination, you can use the following: | ||
```python | ||
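import dlt | ||
|
||
# Create a pipeline with the specified name, destination, and dataset | ||
# (the names below are placeholders; substitute your own) | ||
pipeline = dlt.pipeline(pipeline_name="my_pipeline", destination="duckdb", dataset_name="my_dataset") | ||
|
||
# Run the pipeline (placeholder data; substitute your own source) | ||
pipeline.run([{"id": 1}], table_name="example") | ||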
|
||
# Get the trace of the last run of the pipeline | ||
# The trace contains timing information on extract, normalize, and load steps | ||
trace = pipeline.last_trace | ||
|
||
# Load the trace information into a table named "_trace" in the destination | ||
pipeline.run([trace], table_name="_trace") | ||
``` | ||
This process loads several additional tables to the destination, which provide insights into | ||
the extract, normalize, and load steps. Information on the number of rows loaded for each table, | ||
along with the `load_id`, can be found in the `_trace__steps__extract_info__table_metrics` table. | ||
The `load_id` is an epoch timestamp that indicates when the loading was completed. Here is a graphical | ||
representation of the rows loaded with `load_id` for different tables: | ||
|
||
 | ||
|
||
### Data load time | ||
Data loading time for each table can be obtained by using the following command: | ||
|
||
```shell | ||
dlt pipeline <pipeline_name> load-package | ||
``` | ||

Review comment: Is it possible to get this info with Python? Maybe from the loading info?
Reply: Done.
|
||
The above information can also be obtained from the script as follows: | ||
|
||
```python | ||
info = pipeline.run(source, table_name="table_name", write_disposition='append') | ||
|
||
print(info.load_packages[0]) | ||
``` | ||
> `info.load_packages[0]` is the first load package in the list of load packages; printing it shows the information for that package. |
Review comment: I think this info would fit better in the `## Data monitoring` section.
Reply: Okay.