Commit 4126543

Add basic description of the on-disk report structure
1 parent fb2da5f commit 4126543

1 file changed

+143
-0
lines changed

@@ -0,0 +1,143 @@
1+
# PODNAME: Devel::StatProfiler::ReportStructure - developer documentation for aggregation classes
2+
3+
=head1 DESCRIPTION
4+
5+
B<Developer documentation for aggregation classes>.
6+
7+
=head1 ON-DISK LAYOUT
8+
9+
Multiple aggregated reports for a single code release are stored under
10+
a single directory. Unless otherwise specified, all files are Sereal blobs.
11+
12+
The HTML report generator assumes that it can fetch the source code
13+
for files. There is support for reading files directly from disk or
14+
for fetching them from a local git clone (it uses C<git cat-file>, so
15+
it can be a bare clone). The file contents need to match the source
16+
code that was running while collecting profiling data.
17+
18+
The aggregate structure for a single code release report is:
19+
20+
<release id>/
21+
# eval source code
22+
__source__/
23+
# common state
24+
__state__/
25+
        genealogy.<shard id>
26+
last_sample.<shard id>
27+
metadata.<shard id>
28+
shard.<shard id>
29+
sourcemap.<shard id>
30+
source.<shard id>
31+
processed.<process id>.<shard id>
32+
# first aggregation id
33+
aggregate1/
34+
metadata.<shard id>
35+
report.<timebox1>.<shard id>
36+
report.<timebox2>.<shard id>
37+
# second aggregation id
38+
aggregate2/
39+
...
40+
41+
=over 4
42+
43+
=item C<< release id >>
44+
45+
an arbitrary user-provided identifier, for example a Git commit/tag.
46+
47+
=item C<< shard id >>
48+
49+
an arbitrary identifier, for example a host name. Files should be
50+
written from a single aggregation host, and will be merged together to
51+
generate the HTML report.
52+
53+
=item C<< timebox >>
54+
55+
a number of seconds since the epoch; old timeboxed data can be deleted
56+
at the user's discretion.
57+
58+
=back
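
As an illustration of the naming scheme above, here is a minimal sketch that composes the path of a timeboxed report file and picks out report files whose timebox is older than a cutoff. The helper names (C<report_path>, C<report_timebox>, C<expired_reports>) are hypothetical and are not part of the Devel::StatProfiler API; the sketch only assumes the C<< report.<timebox>.<shard id> >> file name pattern shown in the tree.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: path of a report file for one timebox and shard.
sub report_path {
    my ($root, $release, $aggregate, $timebox, $shard) = @_;
    return File::Spec->catfile($root, $release, $aggregate,
                               "report.$timebox.$shard");
}

# Hypothetical helper: timebox encoded in a report file name,
# or undef when the name is not a report file.
sub report_timebox {
    my ($name) = @_;
    return $name =~ /^report\.(\d+)\./ ? $1 : undef;
}

# Report file names whose timebox is strictly older than the cutoff.
sub expired_reports {
    my ($cutoff, @names) = @_;
    return grep {
        my $timebox = report_timebox($_);
        defined $timebox && $timebox < $cutoff;
    } @names;
}

my @names = ('report.1400000000.host1', 'report.1500000000.host1',
             'metadata.host1');
print "$_\n" for expired_reports(1450000000, @names);
```

Only files matching the report pattern are ever considered for deletion, so metadata and state files are left alone.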
59+
60+
=head2 Aggregate directory
61+
62+
Many of the files below contain references to source file/line numbers.
63+
64+
All line numbers are logical line numbers (the ones reported by
65+
C<warn()>/C<die()>); those generally match physical line numbers,
66+
except in the presence of C<#line> directives.
67+
68+
Source files of the form C<eval:HASH> refer to the eval source code
69+
having MD5 hash C<HASH>. There should never be eval references of the
70+
form C<(eval 123)>.
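
A minimal sketch of how such a name could be derived from the eval source text. The helper name C<eval_file_name> is hypothetical, and the use of the lowercase hexadecimal digest is an assumption; the text above only states that the hash is MD5.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: logical file name for a piece of eval source code.
# Assumes the hash is written as 32 lowercase hex digits.
sub eval_file_name {
    my ($source) = @_;
    return 'eval:' . md5_hex($source);
}

my $first  = eval_file_name('sub { $_[0] + 1 }');
my $second = eval_file_name('sub { $_[0] + 1 }');

# Identical source text always yields the same logical file name,
# regardless of which "(eval N)" counter the interpreter assigned.
print "same name\n" if $first eq $second;
```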
71+
72+
All other source file references are logical source files (the ones
73+
reported by C<warn()>/C<die()>); those generally match physical source
74+
files, except in the presence of C<#line> directives.
75+
76+
Generated reports contain an entry for each physical file, so there is
77+
code in the report generator to piece together multiple logical
78+
reports into a merged report for a single physical file.
79+
80+
=head3 Report file(s)
81+
82+
The aggregated profiling data, composed mainly of a map from logical
83+
file names to the per-line count of exclusive/inclusive samples and a
84+
map from subroutines to call sites and callees.
85+
86+
This is the main data used to generate the HTML report.
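
To make the merging of per-shard data concrete, here is a sketch that sums per-line sample counts from several reports. The data shape used here (C<< $report->{$file}{$line}{exclusive|inclusive} >>) is an assumption made for illustration only and is not the actual Sereal layout; C<merge_reports> is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed shape, for illustration only (not the real on-disk layout):
#   $report->{$file}{$line} = { exclusive => N, inclusive => N }
sub merge_reports {
    my (@reports) = @_;
    my %merged;
    for my $report (@reports) {
        for my $file (keys %$report) {
            for my $line (keys %{ $report->{$file} }) {
                my $counts = $report->{$file}{$line};
                # Sum both counters, treating a missing counter as zero.
                $merged{$file}{$line}{$_} += $counts->{$_} // 0
                    for qw(exclusive inclusive);
            }
        }
    }
    return \%merged;
}

my $shard1 = { 'lib/Foo.pm' => { 10 => { exclusive => 3, inclusive => 5 } } };
my $shard2 = { 'lib/Foo.pm' => { 10 => { exclusive => 1, inclusive => 4 },
                                 12 => { exclusive => 2, inclusive => 2 } } };
my $merged = merge_reports($shard1, $shard2);
printf "line 10: %d exclusive, %d inclusive\n",
    @{ $merged->{'lib/Foo.pm'}{10} }{qw(exclusive inclusive)};
```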
87+
88+
=head3 Metadata file(s)
89+
90+
Currently only contains the number of samples aggregated into the
91+
corresponding report file.
92+
93+
=head2 State directory
94+
95+
=head3 Shard file(s)
96+
97+
Empty flag files, a quick way of enumerating the shard ids.
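
A sketch of how the shard ids could be enumerated from the flag files. The function name C<shard_ids> is hypothetical; the only assumption taken from the layout above is that each shard has a file named C<< shard.<shard id> >> in the state directory. The demo creates a temporary directory so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: list shard ids from shard.<shard id> flag files.
sub shard_ids {
    my ($state_dir) = @_;
    opendir my $dh, $state_dir or die "Unable to open '$state_dir': $!";
    my @ids;
    while (defined(my $name = readdir $dh)) {
        push @ids, $1 if $name =~ /^shard\.(.+)$/;
    }
    closedir $dh;
    @ids = sort @ids;
    return @ids;
}

# Demo: build a throwaway state directory with two shards.
my $state_dir = tempdir(CLEANUP => 1);
for my $name (qw(shard.host1 shard.host2 metadata.host1)) {
    my $path = File::Spec->catfile($state_dir, $name);
    open my $fh, '>', $path or die "Unable to create '$path': $!";
    close $fh;
}
print join(', ', shard_ids($state_dir)), "\n";
```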
98+
99+
=head3 Metadata file(s)
100+
101+
User-provided metadata keys, added to the reports using
102+
C<set_global_metadata> and C<write_custom_metadata>.
103+
104+
=head3 Processed file(s)
105+
106+
State of C<Devel::StatProfiler::SectionChangeReader>, saved when the
107+
profile data has been split into multiple files and not all files have
108+
been processed yet.
109+
110+
=head3 Last sample file(s)
111+
112+
Tracks the time at which the last file for a given process id was
113+
processed. Used to clean up the processing state for
114+
C<Devel::StatProfiler::SectionChangeReader>.
115+
116+
=head3 Genealogy file(s)
117+
118+
Tracks the parent-child relationship between process ids, used to map
119+
the eval id (e.g. C<(eval 123)>) to the corresponding source code.
120+
121+
=head3 Source map file(s)
122+
123+
Information about C<#line> directives contained in eval source code,
124+
used to map lines as reported in the profile to the source code lines
125+
used during rendering.
126+
127+
For non-eval source code, the corresponding information is parsed from
128+
the source code files on disk.
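
The kind of mapping described here can be sketched as follows: scan the source text, track C<#line> directives, and record which physical line each logical position corresponds to. This is an illustration only; the function name C<physical_line_map>, the returned structure, and the simplified directive regex are assumptions and do not describe the actual source map format.

```perl
use strict;
use warnings;

# Hypothetical helper: map "logical file:logical line" to a physical line.
# Uses a simplified pattern for "# line N" and "# line N "file"" directives.
sub physical_line_map {
    my ($file, $source) = @_;
    my ($logical_file, $logical_line, $physical_line) = ($file, 1, 0);
    my %map;
    for my $text (split /\n/, $source) {
        ++$physical_line;
        # Keep the first physical line seen for a given logical position.
        $map{"$logical_file:$logical_line"} //= $physical_line;
        if ($text =~ /^#\s*line\s+(\d+)(?:\s+"?([^"\s]+)"?)?\s*$/) {
            # A directive sets the position of the line that follows it.
            $logical_line = $1;
            $logical_file = $2 if defined $2;
        } else {
            ++$logical_line;
        }
    }
    return \%map;
}

my $source = join "\n",
    'my $x = 1;',
    '# line 100 "template.tt"',
    'my $y = 2;',
    'my $z = 3;';
my $map = physical_line_map('eval:abc', $source);
print "template.tt:101 is physical line $map->{'template.tt:101'}\n";
```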
129+
130+
=head3 Source file(s)
131+
132+
Maps process ids to the list of evals that were seen by that process,
133+
and each eval to the hash of its source code. The source code hash
134+
can be used to fetch the actual eval source code, and more
135+
importantly to merge profiling data from multiple independent evals.
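
A rough sketch of how an eval id might be resolved to a source hash by combining this data with the genealogy. Everything here is an assumption made for illustration: the two hash shapes, the name C<eval_hash>, and the idea that a lookup falls back to the parent process (on the guess that a forked child can run evals compiled by its parent). The real files may be organized differently.

```perl
use strict;
use warnings;

# Assumed shapes, for illustration only:
#   $sources->{$process_id}   = { '(eval 123)' => $source_hash, ... }
#   $genealogy->{$process_id} = $parent_process_id
sub eval_hash {
    my ($sources, $genealogy, $process_id, $eval_name) = @_;
    my %seen;
    # Walk up the process tree; %seen guards against cycles.
    while (defined $process_id && !$seen{$process_id}++) {
        my $evals = $sources->{$process_id} || {};
        return $evals->{$eval_name} if defined $evals->{$eval_name};
        $process_id = $genealogy->{$process_id};
    }
    return;
}

my $sources   = { parent => { '(eval 1)' => 'hash-of-eval-1' },
                  child  => { '(eval 2)' => 'hash-of-eval-2' } };
my $genealogy = { child => 'parent' };

# '(eval 1)' was recorded by the parent, so the lookup goes up one level.
print eval_hash($sources, $genealogy, 'child', '(eval 1)'), "\n";
```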
136+
137+
=head2 Source directory
138+
139+
Contains a file for each C<eval STRING>; the file is named after the
140+
MD5 hash of the source code and stored in a 2-level deep directory
141+
structure. Files are just source code (not Sereal blobs).
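
One way such a layout could be computed is sketched below. The exact split is an assumption: this sketch uses the first two pairs of hex digits as the two directory levels and the full hash as the file name, which may differ from what the module actually does. C<eval_source_path> is a hypothetical name.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: path of the file holding a given eval source.
# Assumes directory levels are hex digits 1-2 and 3-4 of the MD5 hash.
sub eval_source_path {
    my ($root, $source) = @_;
    my $hash = md5_hex($source);
    return File::Spec->catfile(
        $root, '__source__',
        substr($hash, 0, 2), substr($hash, 2, 2), $hash,
    );
}

print eval_source_path('release-1', 'sub { 42 }'), "\n";
```

Spreading files over two directory levels keeps any single directory from accumulating a very large number of entries.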
142+
143+
=cut
