Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Automatic Import] Safely access non-identifier fields in Painless if context #205220

Merged
merged 9 commits into from
Jan 3, 2025

Conversation

ilyannn
Copy link
Contributor

@ilyannn ilyannn commented Dec 27, 2024

Release Note

Fixes how Automatic Import generates accesses for the field names that are not valid Painless identifiers.

Summary

Closes #205024

We add utility functions to access nested fields in Painless in a safe way and modify the existing ECS generation logic to use them.

This access happens using the object?.get("field") syntax for complex cases, while falling back to the familiar ctx.field for the cases where field is a valid Painless identifier and ctx is known to be non-nullable.

In the future this should be taken care of by the new $('a.b.c', defaultValue) accessor function (elastic/elasticsearch#101274). For now, it's not available:

{
  "error": {
...
    "type": "script_exception",
    "reason": "compile error",
    "processor_type": "set",
    "script": "$('cloud_provider', '') == 'azure'",
    "lang": "painless",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "invalid shortcut [$] for [field]; ensure [field] exists in this context"
    }
  },
  "status": 400
}

This takes care of the compile-time correctness of field accesses. Note that it is still possible for generated pipelines to fail in runtime on unexpected input, e.g. accessing a nested field a.b fails for the document of the form {"a": "string"}.

Testing

The two utility files we add are fully covered with unit tests:
image.

Here's the generated package for logs containing only the @timestamp field:

  - date:
      if: ctx.ai_202501022326__timestamp?.logs?.get("@timestamp") != null
      tag: date_processor_ai_202501022326__timestamp.logs.@timestamp
      field: ai_202501022326__timestamp.logs.@timestamp
      target_field: '@timestamp'
      formats:
        - yyyy-MM-dd'T'HH:mm:ss.SSSXXX
        - ISO8601

Checklist

  • Unit or functional tests were updated or added to match the most common scenarios
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

@ilyannn ilyannn added bug Fixes for quality problems that affect the customer experience Team:Security-Scalability Team label for Security Integrations Scalability Team Feature:AutomaticImport labels Dec 27, 2024
@ilyannn ilyannn self-assigned this Dec 27, 2024
@ilyannn ilyannn marked this pull request as ready for review January 2, 2025 22:33
@ilyannn ilyannn requested a review from a team as a code owner January 2, 2025 22:33
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-scalability (Team:Security-Scalability)

@ilyannn ilyannn enabled auto-merge (squash) January 2, 2025 22:41
@ilyannn ilyannn added the backport:prev-major Backport to (8.x, 8.17, 8.16) the previous major branch and other branches in development label Jan 2, 2025
* @returns `true` if the string is a valid Painless identifier and not a reserved word, `false` otherwise.
*/
export function isPainlessIdentifier(s: string): boolean {
return PAINLESS_IDENTIFIER_REGEXP.test(s) && !PAINLESS_RESERVED_WORDS.has(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the check for painless reserved keywords necessary? I get the regex check, but for example

Map test = new HashMap();
test.if = 'Reserved keyword';

return test.if;

do not throw any error for me, using the Painless Lab

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point! I've taken this logic from the documentation that says:

Use an identifier as a named token to specify a ... field ...

and that a keyword cannot be an identifier, but perhaps the documentation is incomplete! I'll test whether this works specifically for the ingest pipelines and if yes I'll remove the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes when accessing the field all identifiers are fine:

ctx.if?.host // ok

though this is still understood as if statement:

if.something?.host // "unexpected token ['.'] was expecting one of ['(']."

@ilyannn ilyannn disabled auto-merge January 3, 2025 13:45
@ilyannn ilyannn changed the title Safely access non-identifier fields in Painless if context [Automatic Import] Safely access non-identifier fields in Painless if context Jan 3, 2025
@ilyannn ilyannn enabled auto-merge (squash) January 3, 2025 14:05
@ilyannn ilyannn merged commit 1fb16c5 into elastic:main Jan 3, 2025
8 checks passed
@kibanamachine
Copy link
Contributor

Starting backport for target branches: 8.16, 8.17, 8.x

https://github.com/elastic/kibana/actions/runs/12600012995

@elasticmachine
Copy link
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #110 / Spaces app space solution tour solution tour does not show the solution tour after deleting spaces and leave only the default

Metrics [docs]

✅ unchanged

cc @ilyannn

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 3, 2025
… context (elastic#205220)

Closes elastic#205024

We add utility functions to access nested fields in Painless in a safe
way and modify the existing ECS generation logic to use them.

This access happens using the `object?.get("field")` syntax for complex
cases, while falling back to the familiar `ctx.field` for the cases
where `field` is a valid Painless identifier and `ctx` is known to be
non-nullable.

This takes care of the compile-time correctness of field accesses. Note
that it is still possible for generated pipelines to fail in runtime on
unexpected input, e.g. accessing a nested field `a.b` fails for the
document of the form `{"a": "string"}`.

See the PR for more details, release note and test results.

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 1fb16c5)
@kibanamachine
Copy link
Contributor

💔 Some backports could not be created

Status Branch Result
8.16 Backport failed because of merge conflicts

You might need to backport the following PRs to 8.16:
- [Infra][Hosts] Reset SearchBar refresh state to fully disable auto-refresh (#205416)
- Upgrade cypress to 13.17.0 (#205437)
8.17 Backport failed because of merge conflicts

You might need to backport the following PRs to 8.17:
- [Infra][Hosts] Reset SearchBar refresh state to fully disable auto-refresh (#205416)
- Upgrade cypress to 13.17.0 (#205437)
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Manual backport

To create the backport manually run:

node scripts/backport --pr 205220

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Jan 3, 2025
…ess if context (#205220) (#205510)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Automatic Import] Safely access non-identifier fields in Painless if
context (#205220)](#205220)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Ilya
Nikokoshev","email":"[email protected]"},"sourceCommit":{"committedDate":"2025-01-03T15:31:46Z","message":"[Automatic
Import] Safely access non-identifier fields in Painless if context
(#205220)\n\nCloses
https://github.com/elastic/kibana/issues/205024\n\nWe add utility
functions to access nested fields in Painless in a safe\nway and modify
the existing ECS generation logic to use them.\n\nThis access happens
using the `object?.get(\"field\")` syntax for complex\ncases, while
falling back to the familiar `ctx.field` for the cases\nwhere `field` is
a valid Painless identifier and `ctx` is known to
be\nnon-nullable.\n\nThis takes care of the compile-time correctness of
field accesses. Note\nthat it is still possible for generated pipelines
to fail in runtime on\nunexpected input, e.g. accessing a nested field
`a.b` fails for the\ndocument of the form `{\"a\": \"string\"}`.\n\nSee
the PR for more details, release note and test
results.\n\n---------\n\nCo-authored-by: kibanamachine
<[email protected]>","sha":"1fb16c59521a9bbbbc71d151f4de5fa57323c7e2","branchLabelMapping":{"^v9.0.0$":"main","^v8.18.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["bug","release_note:fix","v9.0.0","backport:prev-major","Team:Security-Scalability","Feature:AutomaticImport"],"title":"[Automatic
Import] Safely access non-identifier fields in Painless if
context","number":205220,"url":"https://github.com/elastic/kibana/pull/205220","mergeCommit":{"message":"[Automatic
Import] Safely access non-identifier fields in Painless if context
(#205220)\n\nCloses
https://github.com/elastic/kibana/issues/205024\n\nWe add utility
functions to access nested fields in Painless in a safe\nway and modify
the existing ECS generation logic to use them.\n\nThis access happens
using the `object?.get(\"field\")` syntax for complex\ncases, while
falling back to the familiar `ctx.field` for the cases\nwhere `field` is
a valid Painless identifier and `ctx` is known to
be\nnon-nullable.\n\nThis takes care of the compile-time correctness of
field accesses. Note\nthat it is still possible for generated pipelines
to fail in runtime on\nunexpected input, e.g. accessing a nested field
`a.b` fails for the\ndocument of the form `{\"a\": \"string\"}`.\n\nSee
the PR for more details, release note and test
results.\n\n---------\n\nCo-authored-by: kibanamachine
<[email protected]>","sha":"1fb16c59521a9bbbbc71d151f4de5fa57323c7e2"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/205220","number":205220,"mergeCommit":{"message":"[Automatic
Import] Safely access non-identifier fields in Painless if context
(#205220)\n\nCloses
https://github.com/elastic/kibana/issues/205024\n\nWe add utility
functions to access nested fields in Painless in a safe\nway and modify
the existing ECS generation logic to use them.\n\nThis access happens
using the `object?.get(\"field\")` syntax for complex\ncases, while
falling back to the familiar `ctx.field` for the cases\nwhere `field` is
a valid Painless identifier and `ctx` is known to
be\nnon-nullable.\n\nThis takes care of the compile-time correctness of
field accesses. Note\nthat it is still possible for generated pipelines
to fail in runtime on\nunexpected input, e.g. accessing a nested field
`a.b` fails for the\ndocument of the form `{\"a\": \"string\"}`.\n\nSee
the PR for more details, release note and test
results.\n\n---------\n\nCo-authored-by: kibanamachine
<[email protected]>","sha":"1fb16c59521a9bbbbc71d151f4de5fa57323c7e2"}}]}]
BACKPORT-->

Co-authored-by: Ilya Nikokoshev <[email protected]>
kowalczyk-krzysztof pushed a commit to kowalczyk-krzysztof/kibana that referenced this pull request Jan 7, 2025
… context (elastic#205220)

Closes elastic#205024

We add utility functions to access nested fields in Painless in a safe
way and modify the existing ECS generation logic to use them.

This access happens using the `object?.get("field")` syntax for complex
cases, while falling back to the familiar `ctx.field` for the cases
where `field` is a valid Painless identifier and `ctx` is known to be
non-nullable.

This takes care of the compile-time correctness of field accesses. Note
that it is still possible for generated pipelines to fail in runtime on
unexpected input, e.g. accessing a nested field `a.b` fails for the
document of the form `{"a": "string"}`.

See the PR for more details, release note and test results.

---------

Co-authored-by: kibanamachine <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Jan 13, 2025
… context (elastic#205220)

Closes elastic#205024

We add utility functions to access nested fields in Painless in a safe
way and modify the existing ECS generation logic to use them.

This access happens using the `object?.get("field")` syntax for complex
cases, while falling back to the familiar `ctx.field` for the cases
where `field` is a valid Painless identifier and `ctx` is known to be
non-nullable.

This takes care of the compile-time correctness of field accesses. Note
that it is still possible for generated pipelines to fail in runtime on
unexpected input, e.g. accessing a nested field `a.b` fails for the
document of the form `{"a": "string"}`.

See the PR for more details, release note and test results.

---------

Co-authored-by: kibanamachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-major Backport to (8.x, 8.17, 8.16) the previous major branch and other branches in development bug Fixes for quality problems that affect the customer experience Feature:AutomaticImport release_note:fix Team:Security-Scalability Team label for Security Integrations Scalability Team v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Automatic Import] Quote fields in Painless scripts
4 participants