Commit 9404f1a

Added info about "Rows count" and "Data loading times" (#929)
1 parent 40e6a89 commit 9404f1a

File tree

1 file changed

+57
-0
lines changed


docs/website/docs/running-in-production/monitoring.md

@@ -60,3 +60,60 @@
charts and time-series charts that provide a baseline or a pattern that a person
For example, to monitor data loading, consider plotting "count of records by `loaded_at` date/hour",
"created at", "modified at", or other recency markers.

### Rows count
To find the number of rows loaded per table, use the following command:

```shell
dlt pipeline <pipeline_name> trace
```

This command displays the names of the tables that were loaded and the number of rows in each. For example:

```shell
Step normalize COMPLETED in 2.37 seconds.
Normalized data for the following tables:
- _dlt_pipeline_state: 1 row(s)
- payments: 1329 row(s)
- tickets: 1492 row(s)
- orders: 2940 row(s)
- shipment: 2382 row(s)
- retailers: 1342 row(s)
```

To load this information back to the destination, you can use the following:

```python
import dlt

# Create a pipeline with the specified name, destination, and dataset
# (the names and the duckdb destination below are illustrative placeholders)
pipeline = dlt.pipeline(
    pipeline_name="pipeline_name",
    destination="duckdb",
    dataset_name="dataset_name",
)

# Run the pipeline; `source` is the dlt source being loaded
pipeline.run(source)

# Get the trace of the last run of the pipeline
# The trace contains timing information on extract, normalize, and load steps
trace = pipeline.last_trace

# Load the trace information into a table named "_trace" in the destination
pipeline.run([trace], table_name="_trace")
```

This process loads several additional tables to the destination, which provide insights into
the extract, normalize, and load steps. Information on the number of rows loaded for each table,
along with the `load_id`, can be found in the `_trace__steps__extract_info__table_metrics` table.
The `load_id` is an epoch timestamp that indicates when the loading was completed. Here's a graphical
representation of the rows loaded with `load_id` for different tables:

![image](https://storage.googleapis.com/dlt-blog-images/docs_monitoring_count_of_rows_vs_load_id.jpg)

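Since the `load_id` is an epoch timestamp, it can be converted to a human-readable completion time with Python's standard library (the `load_id` value below is a made-up example):

```python
from datetime import datetime, timezone

# A load_id is an epoch timestamp stored as a string
# (the value below is a made-up example)
load_id = "1692188651.466199"

# Convert it to a timezone-aware UTC datetime
completed_at = datetime.fromtimestamp(float(load_id), tz=timezone.utc)
print(completed_at.isoformat())
```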
### Data load time
The data loading time for each table can be obtained with the following command:

```shell
dlt pipeline <pipeline_name> load-package
```

The above information can also be obtained in a script as follows:

```python
# `pipeline` and `source` are the pipeline and source defined earlier
info = pipeline.run(source, table_name="table_name", write_disposition="append")

print(info.load_packages[0])
```
> `load_packages[0]` will print the information of the first load package in the list of load packages.
