
Conversation

junngo (Contributor) commented Sep 26, 2025

Currently, Treeherder ingests performance data (PERFHERDER_DATA:) by parsing raw logs.
This patch adds support for reading the data from the perfherder-data.json artifact instead.
For now, both the existing log parsing and the new JSON ingestion run in parallel to maintain compatibility.

Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1990742
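For reviewers unfamiliar with the two paths, here is a minimal, self-contained Python sketch contrasting them. It is not the PR's code: the function names are invented and the PERFHERDER_DATA: handling is simplified.

# Sketch only: contrasts log scraping with reading the JSON artifact.
import json
import re

# Existing path: performance blobs are embedded in raw log lines as
# "PERFHERDER_DATA: {...}" and have to be scraped out with a regex.
PERFHERDER_LINE = re.compile(r"PERFHERDER_DATA:\s*(\{.*\})")

def perf_data_from_log(log_text: str) -> list[dict]:
    return [json.loads(m.group(1)) for m in PERFHERDER_LINE.finditer(log_text)]

# New path: the perfherder-data.json artifact already holds the same payload
# as structured JSON, so no log scraping is needed.
def perf_data_from_artifact(artifact_bytes: bytes) -> dict:
    return json.loads(artifact_bytes)

# Both paths run in parallel for now, so their results stay comparable.
line = 'PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": []}'
assert perf_data_from_log(line)[0]["framework"]["name"] == "build_metrics"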

@junngo junngo marked this pull request as draft September 26, 2025 14:18
@gmierz gmierz self-requested a review September 29, 2025 12:27
    return artifact_list


def post_perfherder_artifacts(job_log):
gmierz (Collaborator):

@junngo I think it would be better for us to put this into a separate area. This folder seems to be specifically for parsing logs, but we're parsing JSONs instead. What do you think about having this task defined here in the perf directory? https://github.com/mozilla/treeherder/blob/505ad6b4047f77fc3ecdea63e57881116340d0fb/treeherder/perf/tasks.py

junngo (Author):

@gmierz Splitting the code out is a great idea. Creating a separate file under the log_parser directory [0] also looks good to me; it feels cohesive to put it there, since the log parsing code [1] already lives in that folder.
Please consider my suggestion, and feel free to tell me your preference for the directory location.

[0] https://github.com/mozilla/treeherder/tree/505ad6b4047f77fc3ecdea63e57881116340d0fb/treeherder/log_parser
[1]

with make_request(self.url, stream=True) as response:

junngo (Author):

I added the new file based on your feedback. It seems more suitable since the JSON artifact isn’t part of the log parsing process :)
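For reference, here is a rough sketch of what the fetch step in that new file could look like, built around the make_request(..., stream=True) call shown in the diff context above. The import path, function name, and error handling are assumptions, not the PR's exact code.

# Rough sketch only; the import path and fallback behaviour are assumptions.
import json

from treeherder.utils.http import make_request

def fetch_perfherder_artifact(url: str) -> dict | None:
    # Stream the perfherder-data.json artifact and decode it. Return None if
    # the payload is missing or malformed so callers can fall back to the
    # log-based ingestion path.
    with make_request(url, stream=True) as response:
        try:
            return json.loads(response.content)
        except ValueError:
            return None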

existing_replicates = set(
    PerformanceDatumReplicate.objects.filter(
        performance_datum=subtest_datum
    ).values_list("value", flat=True)
)
gmierz (Collaborator):

I'm guessing this is happening because of the duplicate ingestion tasks (log and JSON). I think we should find a way to default to using the JSON when it exists and ignore the data we find in the logs. Maybe we could have a list of tests that we start with for testing this out? I'm thinking we could start with these tasks, since the data they produce is not useful, so any failures won't be problematic: https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=regress&revision=6bd2ea6b9711dc7739d8ee7754b9330b11d0719d&selectedTaskRun=K87CGE6IT1GHl6wD4Skbyw.0

junngo (Author):

Exactly, log parsing and the JSON ingestion are both active right now, which is why I handled the duplication here.
I'll revert that, add an allowlist, and only call _load_perf_datum for allowlisted tests when needed.

@junngo junngo force-pushed the ingest-perfherder-data branch from 34855c7 to 26bc32d on September 30, 2025 14:44
@junngo junngo marked this pull request as ready for review September 30, 2025 14:44
"awsy": ["ALL"],
"build_metrics": ["decision", "compiler warnings"],
"browsertime": ["constant-regression"],
}
junngo (Author) commented Sep 30, 2025:

The job is processed if at least one suite name matches the allowlist (e.g. compiler warnings).
This list is just a sample. We’ll gradually update it to expand JSON artifact usage.
[0] https://firefoxci.taskcluster-artifacts.net/KZ6krBACTcyC1_q_tUejTA/0/public/build/perfherder-data-building.json
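For illustration, here is a self-contained sketch of that gating rule. The constant name, the "ALL" handling, and the payload shape follow the description above, but they are assumptions rather than the PR's exact implementation.

# Sketch of the allowlist gate; names are illustrative.
SUPPORTED_PERF_ARTIFACT_SUITES = {
    "awsy": ["ALL"],
    "build_metrics": ["decision", "compiler warnings"],
    "browsertime": ["constant-regression"],
}

def should_ingest_artifact(perf_data: dict) -> bool:
    # Process the JSON artifact only if at least one suite in the payload is
    # allowlisted for its framework; "ALL" allowlists every suite.
    framework = perf_data.get("framework", {}).get("name")
    allowed = SUPPORTED_PERF_ARTIFACT_SUITES.get(framework)
    if not allowed:
        return False
    if "ALL" in allowed:
        return True
    return any(suite.get("name") in allowed for suite in perf_data.get("suites", []))

# Example: a build_metrics payload with a "compiler warnings" suite passes.
payload = {
    "framework": {"name": "build_metrics"},
    "suites": [{"name": "compiler warnings", "subtests": []}],
}
assert should_ingest_artifact(payload)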

junngo (Author) commented Oct 1, 2025

ID  Framework            Enabled  Suites
1   talos                true
2   build_metrics        true     compiler warnings, compiler_metrics, decision ...
4   awsy                 true
5   awfy                 false
6   platform_microbench  true
10  raptor               true
11  js-bench             true
12  devtools             true
13  browsertime          true     constant-regression ...
14  vcs                  false
15  mozperftest          true
16  fxrecord             true
17  telemetry            true

Above is a list of frameworks generated locally with Django code.
It would be good to gradually add the framework-suite mappings to the allowlist one by one, starting with the less important ones.

[0]
compiler warnings: https://firefoxci.taskcluster-artifacts.net/NE-naCeqSyenKogxu0nD4Q/0/public/build/perfherder-data-building.json
compiler_metrics: https://firefoxci.taskcluster-artifacts.net/P1T_HaXURD-r59ymlz5GWA/0/public/build/perfherder-data-compiler-metrics.json
decision: https://firefoxci.taskcluster-artifacts.net/OKsoq3lARpCjUhwVjqDddA/0/public/perfherder-data-decision.json

junngo (Author) left a comment:

note:

# treeherder/etl/jobs.py
parse_logs.apply_async(queue=queue, args=[job.id, [job_log.id], priority])

I considered splitting the queues, but decided to keep using the existing ones to avoid code duplication and increased complexity.

https://github.com/mozilla/treeherder/pull/8997/files#diff-937b3e21ad52eec5277a7f52f51572348a072addafb88a049f9fe302ae437e76R369
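As a minimal sketch of that choice, both the existing log-parse task and the new JSON-ingestion task can be dispatched on the same queue. The task names, signatures, and in-memory broker below are placeholders for illustration, not Treeherder's actual Celery setup.

# Illustration only: reuse the existing queue rather than adding a new one.
from celery import Celery

app = Celery("sketch", broker="memory://")

@app.task(name="log-parser")
def parse_logs(job_id, job_log_ids, priority):
    pass  # stands in for the existing log-parsing task

@app.task(name="store-perfherder-artifact")
def store_perfherder_artifact(job_log_id):
    pass  # stands in for the new JSON-ingestion task

def schedule_ingestion(job_id, job_log_id, queue="log_parser", priority="normal"):
    # Both tasks ride the same queue, so the JSON path needs no new routing
    # or worker configuration.
    parse_logs.apply_async(queue=queue, args=[job_id, [job_log_id], priority])
    store_perfherder_artifact.apply_async(queue=queue, args=[job_log_id])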
