| 1 | +# PODNAME: Devel::StatProfiler::ReportStructure - developer documentation for aggregation classes |
| 2 | + |
| 3 | +=head1 DESCRIPTION |
| 4 | + |
| 5 | +B<Developer documentation for aggregation classes>. |
| 6 | + |
| 7 | +=head1 ON-DISK LAYOUT |
| 8 | + |
| 9 | +Multiple aggregated reports for a single code release are stored under |
| 10 | +a single directory. Unless specified, all files are Sereal blobs. |
| 11 | + |
| 12 | +The HTML report generator needs to be able to fetch the source code |
| 13 | +for files. There is support for reading files directly from disk or |
| 14 | +for fetching them from a local git clone (it uses C<git cat-file>, so |
| 15 | +it can be a bare clone). The file contents need to match the source |
| 16 | +code that was running while collecting profiling data. |
| 17 | + |
| 18 | +The aggregate structure for a single code release report is: |
| 19 | + |
| 20 | +    <release id>/ |
| 21 | +        # eval source code |
| 22 | +        __source__/ |
| 23 | +        # common state |
| 24 | +        __state__/ |
| 25 | +            genealogy.<shard id> |
| 26 | +            last_sample.<shard id> |
| 27 | +            metadata.<shard id> |
| 28 | +            shard.<shard id> |
| 29 | +            sourcemap.<shard id> |
| 30 | +            source.<shard id> |
| 31 | +            processed.<process id>.<shard id> |
| 32 | +        # first aggregation id |
| 33 | +        aggregate1/ |
| 34 | +            metadata.<shard id> |
| 35 | +            report.<timebox1>.<shard id> |
| 36 | +            report.<timebox2>.<shard id> |
| 37 | +        # second aggregation id |
| 38 | +        aggregate2/ |
| 39 | +            ... |
| 40 | + |
| 41 | +=over 4 |
| 42 | + |
| 43 | +=item C<< release id >> |
| 44 | + |
| 45 | +an arbitrary user-provided identifier, for example a Git commit/tag. |
| 46 | + |
| 47 | +=item C<< shard id >> |
| 48 | + |
| 49 | +an arbitrary identifier, for example a host name. Files should be |
| 50 | +written from a single aggregation host, and will be merged together to |
| 51 | +generate the HTML report. |
| 52 | + |
| 53 | +=item C<< timebox >> |
| 54 | + |
| 55 | +a number of seconds since the epoch. Old timeboxed data can be deleted |
| 56 | +at the user's discretion. |
| 57 | + |
| 58 | +=back |
| 59 | + |
| 60 | +=head2 Aggregate directory |
| 61 | + |
| 62 | +Many of the files below contain references to source file/line numbers. |
| 63 | + |
| 64 | +All line numbers are logical line numbers (the ones reported by |
| 65 | +C<warn()>/C<die()>); those generally match physical line numbers, |
| 66 | +except in the presence of C<#line> directives. |
| 67 | + |
| 68 | +Source files of the form C<eval:HASH> refer to the eval source code |
| 69 | +having MD5 hash C<HASH>. There should never be eval references of the |
| 70 | +form C<(eval 123)>. |
| 71 | + |
| 72 | +All other source file references are logical source files (the ones |
| 73 | +reported by C<warn()>/C<die()>); those generally match physical source |
| 74 | +files, except in the presence of C<#line> directives. |
| 75 | + |
| 76 | +Generated reports contain an entry for each physical file, so there is |
| 77 | +code in the report generator to piece together multiple logical |
| 78 | +reports into a merged report for a single physical file. |
| 79 | + |
| 80 | +=head3 Report file(s) |
| 81 | + |
| 82 | +The aggregated profiling data, composed mainly of a map from logical |
| 83 | +file names to the per-line count of exclusive/inclusive samples and a |
| 84 | +map from subroutines to call sites and callees. |
| 85 | + |
| 86 | +This is the main data used to generate the HTML report. |
| 87 | + |
| 88 | +=head3 Metadata file(s) |
| 89 | + |
| 90 | +Currently only contains the number of samples aggregated into the |
| 91 | +corresponding report file. |
| 92 | + |
| 93 | +=head2 State directory |
| 94 | + |
| 95 | +=head3 Shard file(s) |
| 96 | + |
| 97 | +Empty flag files, a quick way of enumerating the shard ids. |
| 98 | + |
| 99 | +=head3 Metadata file(s) |
| 100 | + |
| 101 | +User-provided metadata keys, added to the reports using |
| 102 | +C<set_global_metadata> and C<write_custom_metadata>. |
| 103 | + |
| 104 | +=head3 Processed file(s) |
| 105 | + |
| 106 | +State of C<Devel::StatProfiler::SectionChangeReader>, saved when the |
| 107 | +profile data has been split to multiple files and not all files have |
| 108 | +been processed yet. |
| 109 | + |
| 110 | +=head3 Last sample file(s) |
| 111 | + |
| 112 | +Tracks the time at which the last file for a given process id was |
| 113 | +processed. Used to clean up the processing state for |
| 114 | +C<Devel::StatProfiler::SectionChangeReader>. |
| 115 | + |
| 116 | +=head3 Genealogy file(s) |
| 117 | + |
| 118 | +Tracks the parent-child relationship between process ids, used to map |
| 119 | +the eval id (e.g. C<(eval 123)>) to the corresponding source code. |
| 120 | + |
| 121 | +=head3 Source map file(s) |
| 122 | + |
| 123 | +Information about C<#line> directives contained in eval source code, |
| 124 | +used to map lines as reported in the profile to source code lines |
| 125 | +used during rendering. |
| 126 | + |
| 127 | +For non-eval source code, the corresponding information is parsed from |
| 128 | +the source code files on disk. |
| 129 | + |
| 130 | +=head3 Source file(s) |
| 131 | + |
| 132 | +Maps process ids to the list of evals that were seen by that process, |
| 133 | +and each eval to the hash of its source code. The source code hash |
| 134 | +can be used to fetch the actual eval source code, and more |
| 135 | +importantly to merge profiling data from multiple independent evals. |
| 136 | + |
| 137 | +=head2 Source directory |
| 138 | + |
| 139 | +Contains a file for each C<eval STRING>; the file is named after the |
| 140 | +MD5 hash of the source code and stored in a 2-level deep directory |
| 141 | +structure. Files are just source code (not Sereal blobs). |
| 142 | + |
| 143 | +=cut |
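
The layout documented in this POD can be explored with a few lines of Perl. The sketch below is illustrative only and is not taken from the Devel::StatProfiler sources: `eval_source_path` and `shard_ids` are hypothetical helper names, and the way the two directory levels are derived from the hash (first two hex characters, then the next two) is an assumption, since the documentation only says the structure is two levels deep.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: guess where the source of an eval STRING would live.
# ASSUMPTION: the two directory levels are the first and second pair of
# hex characters of the MD5 hash; the POD does not specify the split.
# Note: md5_hex() expects bytes, so encode wide-character strings first.
sub eval_source_path {
    my ($release_dir, $source) = @_;
    my $hash = md5_hex($source);
    return File::Spec->catfile(
        $release_dir, '__source__',
        substr($hash, 0, 2), substr($hash, 2, 2),
        $hash,
    );
}

# Hypothetical helper: enumerate shard ids from the empty flag files
# named shard.<shard id> inside the __state__ directory.
sub shard_ids {
    my ($release_dir) = @_;
    my $state_dir = File::Spec->catdir($release_dir, '__state__');
    opendir my $dh, $state_dir or die "Cannot open $state_dir: $!";
    my @ids = sort map { /^shard\.(.+)\z/ ? $1 : () } readdir $dh;
    closedir $dh;
    return @ids;
}
```

Calling `shard_ids($release_dir)` returns the sorted shard ids found under `__state__`, and `eval_source_path($release_dir, $code)` returns a candidate path under `__source__`; the real module may split the hash differently, so check the path against an actual report directory before relying on it.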