Bug 1990742 - Ingest perfherder_data from JSON artifacts instead of parsing logs #8997
base: master
Conversation
treeherder/log_parser/tasks.py (Outdated)

    return artifact_list

    def post_perfherder_artifacts(job_log):
@junngo I think it would be better for us to put this into a separate area. This folder seems to be specifically for parsing logs, but we're parsing JSONs instead. What do you think about having this task defined here in the perf directory? https://github.com/mozilla/treeherder/blob/505ad6b4047f77fc3ecdea63e57881116340d0fb/treeherder/perf/tasks.py
@gmierz Splitting the code is a great idea. Creating a separate file under the perf directory [0] looks good to me. It feels more cohesive to put it there, since the log parsing [1] also lives in its own folder.
Let me know if you would prefer a different directory location.
    with make_request(self.url, stream=True) as response:
I added the new file based on your feedback. It seems more suitable since the JSON artifact isn’t part of the log parsing process :)
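The fetch-and-parse step that such a file could perform might be sketched as follows. Only the perfherder-data.json schema (top-level `framework` and `suites` keys) follows the real PERFHERDER_DATA payload; the function name and return shape are assumptions, not code from this PR:

```python
import json

def extract_perfherder_data(artifact_text):
    """Parse a perfherder-data.json body into (framework_name, suites).

    Hypothetical helper: the "framework" and "suites" keys mirror the
    PERFHERDER_DATA: payload schema; everything else is an assumption.
    """
    data = json.loads(artifact_text)
    framework = data.get("framework", {}).get("name")
    suites = data.get("suites", [])
    return framework, suites
```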
treeherder/etl/perf.py
Outdated
    existing_replicates = set(
        PerformanceDatumReplicate.objects.filter(
            performance_datum=subtest_datum
        ).values_list("value", flat=True)
    )
I'm guessing this is happening because of duplicate ingestion tasks (log and JSON). I think we should find a way to default to using the JSON if it exists, and ignore the data we find in the logs. Maybe we could start with a list of tests for trying this out? I'm thinking these tasks, since the data they produce is not useful, so any failures won't be problematic: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=regress&revision=6bd2ea6b9711dc7739d8ee7754b9330b11d0719d&selectedTaskRun=K87CGE6IT1GHl6wD4Skbyw.0
Exactly, log parsing and the JSON artifact ingestion are both active right now, so I handled the duplication.
I'll revert that, add an allowlist, and only call _load_perf_datum for allowlisted tests when needed.
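The duplicate handling being reverted here (the set of existing replicates in the diff above) amounts to a set-membership filter. A minimal sketch, modeling the ORM query as a plain iterable of existing values (the function name is hypothetical):

```python
def filter_new_replicates(existing_values, incoming_values):
    """Keep only replicate values not already stored for the datum.

    Hypothetical sketch of the dedup shown in the diff above; the real
    code queried PerformanceDatumReplicate through the Django ORM.
    """
    existing = set(existing_values)
    return [value for value in incoming_values if value not in existing]
```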
Force-pushed from 34855c7 to 26bc32d (Compare)
        "awsy": ["ALL"],
        "build_metrics": ["decision", "compiler warnings"],
        "browsertime": ["constant-regression"],
    }
The job is processed if at least one suite name matches the allowlist (e.g. compiler warnings).
This list is just a sample; we'll gradually expand it to broaden JSON artifact usage.
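A minimal sketch of that matching rule, using the sample dict from the diff above and assuming "ALL" acts as a wildcard for every suite in the framework (the function name is hypothetical):

```python
# Sample allowlist from the diff in this thread; "ALL" is assumed to
# match every suite in the framework.
PERFHERDER_JSON_ALLOWLIST = {
    "awsy": ["ALL"],
    "build_metrics": ["decision", "compiler warnings"],
    "browsertime": ["constant-regression"],
}

def should_ingest_from_json(framework, suite_names):
    """Return True if at least one suite name matches the allowlist."""
    allowed = PERFHERDER_JSON_ALLOWLIST.get(framework)
    if not allowed:
        return False
    if "ALL" in allowed:
        return True
    return any(name in allowed for name in suite_names)
```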
[0] https://firefoxci.taskcluster-artifacts.net/KZ6krBACTcyC1_q_tUejTA/0/public/build/perfherder-data-building.json
I have a list of frameworks generated locally by Django code. [0]
note:

    # treeherder/etl/jobs.py
    parse_logs.apply_async(queue=queue, args=[job.id, [job_log.id], priority])
I considered splitting the queues, but decided to keep using the existing ones to avoid code duplication and increased complexity.
Currently, Treeherder ingests performance data (PERFHERDER_DATA:) by parsing raw logs. This patch supports reading the data from the perfherder-data.json artifact instead. For now, both the existing log parsing and the new JSON ingestion run in parallel to maintain compatibility.

Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1990742
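The two parallel ingestion paths can be contrasted in a small sketch. PERFHERDER_DATA: is the real log-line marker; the function names here are hypothetical, not code from this PR:

```python
import json

LOG_PREFIX = "PERFHERDER_DATA: "

def perf_data_from_log_line(line):
    """Old path: strip the marker from a raw log line, then parse JSON."""
    if not line.startswith(LOG_PREFIX):
        return None
    return json.loads(line[len(LOG_PREFIX):])

def perf_data_from_artifact(artifact_body):
    """New path: the perfherder-data.json artifact is already pure JSON."""
    return json.loads(artifact_body)
```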