bug-1921849: support elasticsearch 8 #6741
base: main
Force-pushed from d96b425 to 9ac223f
Force-pushed from 32912f9 to b360970
Force-pushed from 19b3ea4 to c0e2cfd
Co-authored-by: krzepka <[email protected]>
Force-pushed from c0e2cfd to 2704c10
We're going to do this PR in two passes. This is a code read pass.
While you're fixing things I brought up, I'll spend some time going through some manual testing for things I'm wondering about.
Then after you make changes, I'll read through those and add anything that came up in manual testing.
"storage_mapping": {
"analyzer": "semicolon_keywords",
"type": "text",
"fielddata": True,
For my notes, fielddata docs are here:
https://www.elastic.co/guide/en/elasticsearch/reference/8.15/text.html#fielddata-mapping-param
This is a text field, and in order to aggregate/sort on it, we need to set fielddata=True.
We don't want to do this:
"storage_mapping": {
"analyzer": "semicolon_keywords",
"fields": {"full": {"type": "keyword"}},
"type": "text",
}
because we want to treat each token separately for aggregation.
@relud Is that right?
that matches my understanding
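As an illustration of why per-token aggregation matters here, the following pure-Python sketch mimics the buckets a terms aggregation would produce. It assumes the semicolon_keywords analyzer splits values on semicolons and trims surrounding whitespace (check the analyzer definition in the index settings to confirm). tokenize_semicolon_keywords and terms_aggregation are hypothetical helpers, not Socorro or Elasticsearch APIs.

```python
from collections import Counter


def tokenize_semicolon_keywords(value):
    """Approximate a semicolon-splitting analyzer (assumed behavior)."""
    return [token.strip() for token in value.split(";") if token.strip()]


def terms_aggregation(values, per_token=True):
    """Count buckets roughly the way a terms aggregation would.

    per_token=True mimics aggregating a text field with fielddata=True
    (one bucket per analyzed token); per_token=False mimics aggregating a
    keyword subfield (one bucket per whole raw value).
    """
    counts = Counter()
    for value in values:
        if per_token:
            # doc_count counts documents, so each document adds at most
            # one to any given token's bucket.
            counts.update(set(tokenize_semicolon_keywords(value)))
        else:
            counts[value] += 1
    return counts


modules = ["xul.dll;ntdll.dll", "ntdll.dll", "xul.dll;nss3.dll;ntdll.dll"]
print(terms_aggregation(modules).most_common())
print(terms_aggregation(modules, per_token=False).most_common())
```

With per-token buckets, ntdll.dll is counted in all three documents; a keyword subfield such as "full" would instead give three separate buckets, one per distinct raw string.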
# out which *key* had the bad input.
for key, value in kwargs.items():
    if value == bad_input:
        raise BadArgumentError(key) from exc
Can you tell me more about the changes in this section and the below section? It seems like the original code handled two different kinds of errors and the new code only handles one of those here and the other one in a different block. Is that right?
That is correct. This block used to handle both malformed queries and invalid regexes. Now it only handles malformed queries: ES now reports a bad regex as a shard failure, so that case is handled in the block below.
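A rough sketch of the shard-failure path, reusing the key-matching idea from the snippet above. It assumes the search response is a plain dict with the usual _shards.failures[*].reason.reason layout and that the failure message echoes the offending value. check_shard_failures is hypothetical and BadArgumentError is a local stand-in for Socorro's exception, so this is not the code in the PR.

```python
class BadArgumentError(Exception):
    """Local stand-in for Socorro's BadArgumentError."""


def check_shard_failures(response, kwargs):
    """Map a shard failure back to the search parameter that caused it.

    Assumes response is a dict shaped like an Elasticsearch search response,
    with failures listed under response["_shards"]["failures"].
    """
    failures = response.get("_shards", {}).get("failures", [])
    for failure in failures:
        reason = failure.get("reason", {}).get("reason", "")
        # Figure out which *key* had the bad input by looking for its value
        # in the failure message (assumes the message echoes the input).
        for key, value in kwargs.items():
            if isinstance(value, str) and value and value in reason:
                raise BadArgumentError(key)
    if failures:
        # Some shards failed for a reason we can't tie to a parameter.
        raise RuntimeError(f"{len(failures)} shard failure(s) in search response")
    return response
```

Treating an unattributable failure as a hard error (rather than returning partial results) is a design choice of this sketch; the real code may handle it differently.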
Force-pushed from 991c724 to b460f21
Force-pushed from b460f21 to b3a3c73
-except elasticsearch.exceptions.TransportError as e:
-    # If this is a TransportError, we try to figure out what the error
+except elasticsearch.BadRequestError as e:
+    # If this is a BadRequestError, we try to figure out what the error
     # is and fix the document and try again
This section removes fields that cause a document_parsing_exception and retries the document. That seems like an odd choice, given that it happens after value fixing, which should already prevent the three types of failure we catch here. The only way I can think of to reach this code block in production is if we are writing to a field that isn't in our mapping.
It used to be the case that it wrote all the data into Elasticsearch even if it wasn't in the mapping. That way, when they added new fields to the crash report, those fields would get indexed even if Socorro didn't explicitly support them. While the intentions were good, that was terrible, so I changed it so that it only indexes what's defined in super search fields and in the mapping.
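As a rough illustration of "only index what's in the mapping", here is a minimal sketch that drops any field not declared in the mapping before the document is sent, so unmapped fields never reach the index at all. prune_to_mapping is a hypothetical helper, not Socorro's implementation, and it assumes the mapping is available as a plain dict in Elasticsearch's "properties" shape.

```python
def prune_to_mapping(document, properties):
    """Return a copy of document containing only fields declared in properties.

    properties follows the shape of an Elasticsearch mapping's "properties"
    object: each field maps to a dict, and object fields carry their own
    nested "properties".
    """
    pruned = {}
    for key, value in document.items():
        field_mapping = properties.get(key)
        if field_mapping is None:
            # Not in the mapping: drop it rather than indexing it.
            continue
        nested = field_mapping.get("properties")
        if nested is not None and isinstance(value, dict):
            # Object field: recurse using the nested mapping.
            pruned[key] = prune_to_mapping(value, nested)
        else:
            pruned[key] = value
    return pruned


mapping_properties = {
    "signature": {"type": "keyword"},
    "processed_crash": {"properties": {"uptime": {"type": "long"}}},
}
doc = {
    "signature": "OOM | small",
    "new_field": "x",
    "processed_crash": {"uptime": 12, "extra": 1},
}
print(prune_to_mapping(doc, mapping_properties))
```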
Because the processor composes crashstorage metrics keys from each crash storage destination's name, switching to PREFER_NEW emits a new metrics key that isn't documented in socorro/statsd_metrics.yml.
Running the local dev environment and processing crashes kicks this up:
socorro-processor-1 | Traceback (most recent call last):
socorro-processor-1 | File "/app/socorro/lib/threaded_task_manager.py", line 250, in run
socorro-processor-1 | function(*args, **kwargs) # execute the task
socorro-processor-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-processor-1 | File "/app/socorro/processor/processor_app.py", line 144, in transform
socorro-processor-1 | self.process_crash(
socorro-processor-1 | File "/app/socorro/processor/processor_app.py", line 215, in process_crash
socorro-processor-1 | with METRICS.timer(
socorro-processor-1 | File "/usr/local/lib/python3.11/contextlib.py", line 144, in __exit__
socorro-processor-1 | next(self.gen)
socorro-processor-1 | File "/usr/local/lib/python3.11/site-packages/markus/main.py", line 509, in timer
socorro-processor-1 | self.timing(stat, value=delta * 1000.0, tags=tags)
socorro-processor-1 | File "/usr/local/lib/python3.11/site-packages/markus/main.py", line 420, in timing
socorro-processor-1 | self._publish(
socorro-processor-1 | File "/usr/local/lib/python3.11/site-packages/markus/main.py", line 280, in _publish
socorro-processor-1 | record = metrics_filter.filter(record)
socorro-processor-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-processor-1 | File "/usr/local/lib/python3.11/site-packages/markus/filters.py", line 122, in filter
socorro-processor-1 | raise MetricsUnknownKey(f"metrics key {record.key!r} is unknown")
socorro-processor-1 | markus.filters.MetricsUnknownKey: metrics key 'socorro.processor.legacy_es.save_processed_crash' is unknown
Can you add that to statsd_metrics.yml?
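A minimal sketch of why a new key appears: the timer name is built from each destination's crash_destination_name, so adding a destination adds a key. This assumes every emitted key gets the "socorro." prefix shown in the error message, and uses hypothetical destination names "es" and "legacy_es"; compose_save_key and undocumented_keys are illustrative helpers, not markus APIs. A check like this in a test could flag an undocumented key before the metrics filter raises at runtime.

```python
# Assumption: every emitted key gets a "socorro." prefix, as the
# MetricsUnknownKey message above shows for the legacy_es destination.
METRICS_PREFIX = "socorro"


def compose_save_key(crash_destination_name):
    """Build the timer key like processor_app.py's f-string, plus the prefix."""
    return f"{METRICS_PREFIX}.processor.{crash_destination_name}.save_processed_crash"


def undocumented_keys(destination_names, documented_keys):
    """Return composed keys that have no entry in the documented set."""
    composed = [compose_save_key(name) for name in destination_names]
    return [key for key in composed if key not in documented_keys]


# Hypothetical: pretend statsd_metrics.yml only documents the "es" destination.
documented = {"socorro.processor.es.save_processed_crash"}
print(undocumented_keys(["es", "legacy_es"], documented))
```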
These steps kick up an error, I think because ES 8 hasn't finished starting up yet:
- set ELASTICSEARCH_MODE=PREFER_NEW in .env (I rebased against main to pick up the just and .env changes)
- run docker compose stop so that nothing is running
- run just build
- run just setup
Error:
Traceback (most recent call last):
File "/app/socorro-cmd", line 202, in <module>
cmd_main()
File "/app/socorro-cmd", line 198, in cmd_main
import_and_run(runner)
File "/app/socorro-cmd", line 129, in import_and_run
sys.exit(app(sys.argv[1:]))
^^^^^^^^^^^^^^^^^
File "/app/bin/es_cli.py", line 156, in main
es_group(argv)
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/bin/es_cli.py", line 143, in cmd_delete
indices_to_delete = crashstorage.get_indices()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/socorro/external/es/crashstorage.py", line 409, in get_indices
indices = self.client.get_indices()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/socorro/external/es/connection_context.py", line 86, in get_indices
return self.indices_client().get_alias().keys()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elasticsearch/_sync/client/utils.py", line 446, in wrapped
return api(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elasticsearch/_sync/client/indices.py", line 1901, in get_alias
return self.perform_request( # type: ignore[return-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py", line 423, in perform_request
return self._client.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py", line 271, in perform_request
response = self._perform_request(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py", line 316, in _perform_request
meta, resp_body = self.transport.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elastic_transport/_transport.py", line 342, in perform_request
resp = node.perform_request(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/elastic_transport/_node/_http_urllib3.py", line 202, in perform_request
raise err from None
elastic_transport.ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x745b18446cd0>: Failed to establish a new connection: [Errno 111] Connection refused))
If I pause and then run just setup a second time, it works fine.
Can you add a depends_on or a waitfor or whatever it is that's needed, please?
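A later review confirms that things now wait for both the es and legacy_es containers; the mechanism isn't shown in this thread. As a generic illustration only, here is a minimal polling loop that retries a readiness check until it passes or a timeout expires. wait_for and elasticsearch_is_up are hypothetical helpers, and the URL assumes plain HTTP with no auth on the default port.

```python
import time
import urllib.request


def wait_for(check, timeout=60.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout seconds have elapsed.

    Any exception from check() (for example, connection refused while the
    service is still starting) is treated as "not ready yet".
    Returns True if the check passed, False if we ran out of time.
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


def elasticsearch_is_up(url="http://localhost:9200/"):
    """Hypothetical readiness check: plain HTTP GET against the cluster root."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return response.status == 200
```

For example, wait_for(elasticsearch_is_up, timeout=120) before the es_cli commands run would avoid the race. In Docker Compose, a common alternative is a healthcheck on the Elasticsearch service combined with depends_on using condition: service_healthy.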
I went through and tested these things with both the default and PREFER_NEW settings:
- processing crash reports, super search, signature report: work fine
- top crashers report: kicks up an error:
socorro-webapp-1 | Traceback (most recent call last):
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
socorro-webapp-1 | response = get_response(request)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
socorro-webapp-1 | response = wrapped_callback(request, *callback_args, **callback_kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/django/views.py", line 90, in sentry_wrapped_callback
socorro-webapp-1 | return callback(request, *args, **kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/crashstats/decorators.py", line 149, in inner
socorro-webapp-1 | response = view(request, *args, **kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/crashstats/decorators.py", line 101, in inner
socorro-webapp-1 | return view(request, *args, **kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/crashstats/decorators.py", line 68, in inner
socorro-webapp-1 | return view(request, *args, **kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/topcrashers/views.py", line 298, in topcrashers
socorro-webapp-1 | return render(request, "topcrashers/topcrashers.html", context)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/sentry_sdk/utils.py", line 1788, in runner
socorro-webapp-1 | return sentry_patched_function(*args, **kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/django/templates.py", line 105, in render
socorro-webapp-1 | return real_render(request, template_name, context, *args, **kwargs)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django/shortcuts.py", line 24, in render
socorro-webapp-1 | content = loader.render_to_string(template_name, context, request, using=using)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django/template/loader.py", line 62, in render_to_string
socorro-webapp-1 | return template.render(context, request)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django_jinja/backend.py", line 59, in render
socorro-webapp-1 | return mark_safe(self._process_template(self.template.render, context, request))
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django_jinja/backend.py", line 105, in _process_template
socorro-webapp-1 | return handler(context)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/jinja2/environment.py", line 1304, in render
socorro-webapp-1 | self.environment.handle_exception()
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/jinja2/environment.py", line 939, in handle_exception
socorro-webapp-1 | raise rewrite_traceback_stack(source=source)
socorro-webapp-1 | File "/app/webapp/crashstats/topcrashers/jinja2/topcrashers/topcrashers.html", line 7, in top-level template code
socorro-webapp-1 | {% extends "crashstats_base.html" %}
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/crashstats/jinja2/crashstats_base.html", line 150, in top-level template code
socorro-webapp-1 | {% block content %}{% endblock %}
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/topcrashers/jinja2/topcrashers/topcrashers.html", line 193, in block 'content'
socorro-webapp-1 | {% if topcrashers_stats_item.is_startup_window_crash %}
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/jinja2/environment.py", line 487, in getattr
socorro-webapp-1 | return getattr(obj, attribute)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/usr/local/lib/python3.11/site-packages/django/utils/functional.py", line 57, in __get__
socorro-webapp-1 | res = instance.__dict__[self.name] = self.func(instance)
socorro-webapp-1 | ^^^^^^^^^^^^^^^^^^^
socorro-webapp-1 | File "/app/webapp/crashstats/crashstats/utils.py", line 177, in is_startup_window_crash
socorro-webapp-1 | if row["term"] < 60:
socorro-webapp-1 | ^^^^^^^^^^^^^^^^
socorro-webapp-1 | TypeError: '<' not supported between instances of 'str' and 'int'
- custom queries (only available to the obs team): work fine
- I did aggregations with missing_symbols, modules_in_stack, topmost_filename, and useragent_locale, and they all work fine: aggregation is done on tokens and not on the whole value
- abort_message works with both the matches and is-exact operators
- startup_crash has T and F values
- _return_query=1 with the local dev environment looks fine; it's different from what we see in prod, but that's expected due to ES API changes
- I went through all the supersearchfacet examples in the crashstats-tools docs and they all worked fine
This is looking good. That TopCrashers issue needs to be looked at.
Force-pushed from 9988801 to 0e51573
startup_crash_msg = 'title="Startup Crash"'
potential_startup_crash_msg = 'title="Potential Startup Crash"'
potential_startup_window_crash_msg = 'title="Potential Startup Crash, more than '
I modified this test to also check is_startup_window_crash to cover the failure case you manually observed, and to confirm that my solution for identifying boolean aggregation terms is working as expected.
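For context on boolean aggregation terms: in Elasticsearch, a terms aggregation on a boolean field returns buckets whose key is 1 or 0 and whose key_as_string is "true" or "false". The following is a minimal sketch of normalizing bucket terms, assuming the caller knows which fields are boolean; normalize_term and has_uptime_under are hypothetical helpers, not the fix in this PR.

```python
def normalize_term(bucket, is_boolean_field):
    """Return a Python value for a terms-aggregation bucket.

    Boolean fields come back as key 1/0 with key_as_string "true"/"false";
    everything else keeps its raw key, so numeric fields stay numeric and
    comparisons like `term < 60` keep working.
    """
    if is_boolean_field:
        return bucket["key_as_string"] == "true"
    return bucket["key"]


def has_uptime_under(uptime_buckets, window_seconds=60):
    """True if any uptime bucket falls inside the startup window (hypothetical)."""
    return any(
        normalize_term(bucket, is_boolean_field=False) < window_seconds
        for bucket in uptime_buckets
    )
```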
Nicely done--thank you!
Force-pushed from 0e51573 to 0b51672
Force-pushed from 0b51672 to 8cedc0c
I went through and tested the issues I raised in the previous review.
- bin/process_crashes.sh works now.
- Things correctly wait for both es and legacy_es containers to start up.
- TopCrashers works now.
I wrote up bug 1933824 about a curiosity I hit when uploading some crash report dump files to the gcs emulator. That's not related to these changes.
My only issue is that I'm not sure why you added four new metrics to statsd_metrics.yml rather than just the one I was hitting issues with.
Everything else looks fine as far as I can tell.
r+wc
socorro.processor.legacy_es.save_processed_crash:
  type: "timing"
  description: |
    Timer for how long it takes to save the processed crash to Elasticsearch.
I'm pretty sure we only needed to add socorro.processor.legacy_es.save_processed_crash because that's the only one that's composed and emitted by the processor:
socorro/socorro/processor/processor_app.py
Lines 215 to 218 in 8d6449d
with METRICS.timer(
    f"processor.{dest.crash_destination_name}.save_processed_crash"
):
    dest.save_processed_crash(raw_crash, processed_crash)
When you tested this, did you hit errors that caused you to add the other three metrics?
Use ELASTICSEARCH_MODE=PREFER_NEW to make the webapp use ES 8 and the processor write to both ES 1.4 and ES 8.
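A minimal sketch of how that setting splits reads and writes. It assumes any value other than PREFER_NEW keeps the legacy ES 1.4-only behavior (the actual default mode name isn't given here), and the backend labels "es" and "legacy_es" are hypothetical; es_backends_for is not a Socorro function.

```python
import os


def es_backends_for(mode):
    """Map an ELASTICSEARCH_MODE value to (webapp read backend, processor write backends)."""
    if mode == "PREFER_NEW":
        # Webapp reads from ES 8; processor writes to both clusters.
        return "es", ("legacy_es", "es")
    # Assumption: every other value keeps the legacy ES 1.4-only behavior.
    return "legacy_es", ("legacy_es",)


read_backend, write_backends = es_backends_for(os.environ.get("ELASTICSEARCH_MODE", ""))
print(read_backend, write_backends)
```

Writing to both clusters while reading from one lets the new cluster fill up and be validated before anything stops writing to the old one.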