
Getting "Can not find message descriptor by type_url" error when calling client.logging_api.write_entries() #945

Open
minherz opened this issue Oct 10, 2024 · 9 comments
Labels
api: logging Issues related to the googleapis/python-logging API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@minherz
Contributor

minherz commented Oct 10, 2024

Environment details

CloudRun v2 running a job as described in Import logs from storage to logging solution architecture.

  • OS type and version: N/A
  • Python version: Current version defined in the buildpack
  • pip version: Current version defined in the buildpack
  • google-cloud-logging version: 3.5.*

Steps to reproduce

  1. Deploy architecture
  2. Import previously exported files. Because these files are the property of the customer, they cannot be shared. I've sent a request to identify the specific object(s).

Code example

Stack trace

{
  "textPayload": "Task #1, failed: Failed to parse serviceData field: Can not find message descriptor by type_url: type.googleapis.com/google.cloud.bigquery.logging.v1.AuditData at LogEntry.protoPayload.serviceData.",
  "insertId": "6707b2f0000123f332576c49",
  "resource": {
    "type": "cloud_run_job",
    "labels": {
      "job_name": "log-reingest",
      "project_id": "oxydincproject",
      "location": "europe-west1"
    }
  },
  "timestamp": "2024-10-10T10:56:48.074739Z",
  "labels": {
    "run.googleapis.com/task_index": "0",
    "instanceId": "007989f2a1edc07a9422045e0489e77a3082305d4721358c83378156601ca21ae45476480ada98bdc376be7f04cbf17182042fba56df74748bf7585f9f7d78e5a359b046d6",
    "run.googleapis.com/task_attempt": "0",
    "run.googleapis.com/execution_name": "log-reingest-8frwp"
  },
  "logName": "projects/oxydincproject/logs/run.googleapis.com%2Fstderr",
  "receiveTimestamp": "2024-10-10T10:56:48.077400529Z"
}

Code of the solution can be found at python-docs-samples/logging/import-logs.

See below for the minimal code sample that reproduces the problem.

@minherz minherz added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. triage me I really want to be triaged. labels Oct 10, 2024
@product-auto-label product-auto-label bot added the api: logging Issues related to the googleapis/python-logging API. label Oct 10, 2024
@minherz
Contributor Author

minherz commented Oct 14, 2024

After a few tests I have managed to create a minimal code sample that demonstrates this behavior. Try to run the following code to see this error:

import sys
from google.cloud import logging_v2

TEST_ENTRY = {
    "logName": "placeholder",
    "protoPayload": {
        "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
        "authenticationInfo": {
            "principalEmail": "service-org-12345@gcp-sa-scc-notification.iam.gserviceaccount.com"
        },
        "authorizationInfo": [
            {
                "granted": True,
                "permission": "bigquery.tables.update",
                "resource": "projects/someproject/datasets/sccFindings/tables/findings",
                "resourceAttributes": {}
            }
        ],
        "serviceData": {
            '@type': 'type.googleapis.com/google.cloud.bigquery.logging.v1.AuditData',
            'tableUpdateRequest': {
                'resource': {
                    'info': {},
                    'schemaJson': '{}',
                    'tableName': {
                        'datasetId': 'sccFindings',
                        'projectId': 'someproject',
                        'tableId': 'findings'
                    },
                    'updateTime': '2024-08-20T15:01:48.399Z',
                    'view': {}
                }
            }
        },
        "methodName": "google.cloud.bigquery.v2.TableService.PatchTable",
        "requestMetadata": {
            "callerIp": "private",
            "destinationAttributes": {},
            "requestAttributes": {}
        },
        "resourceName": "projects/someproject/datasets/sccFindings/tables/findings",
        "serviceName": "bigquery.googleapis.com",
        "status": {}
    },
    "resource": {
        "labels": {
            "dataset_id": "sccFindings",
            "project_id": "someproject"
        },
        "type": "bigquery_dataset"
    },
    "severity": "NOTICE",
}

def main():

    client = logging_v2.Client()
    TEST_ENTRY['logName'] = f"projects/{client.project}/logs/test_writing_logs"
    logs = [TEST_ENTRY]
    client.logging_api.write_entries(logs)

# Start script
if __name__ == "__main__":
    try:
        main()
    except Exception as err:
        print(f"Task failed: {err}")
        sys.exit(1)

@minherz
Contributor Author

minherz commented Oct 14, 2024

If the protobuf payload uses the metadata field instead of the serviceData field, the code successfully writes the log entry.

Looks like the assumption mentioned in the code comment does not hold for the type.googleapis.com/google.cloud.bigquery.logging.v1.AuditData protobuf datatype.

@minherz
Contributor Author

minherz commented Oct 15, 2024

Changing the deprecated field serviceData to metadata resolves the problem. I am unsure how this patch should be applied.

@minherz
Contributor Author

minherz commented Oct 21, 2024

See b/374328640 for additional information.

@gkevinzheng
Contributor

@minherz I've been looking into the failure message, and it looks like it's being raised in google.protobuf.json_format.ParseDict, in this code:

def _CreateMessageFromTypeUrl(type_url, descriptor_pool):
  """Creates a message from a type URL."""
  db = symbol_database.Default()
  pool = db.pool if descriptor_pool is None else descriptor_pool
  type_name = type_url.split('/')[-1]
  try:
    message_descriptor = pool.FindMessageTypeByName(type_name)
  except KeyError as e:
    raise TypeError(
        'Can not find message descriptor by type_url: {0}'.format(type_url)
    ) from e
  message_class = message_factory.GetMessageClass(message_descriptor)
  return message_class()

From what our code does, it looks like you can add the following to your code to resolve the issue:

from google.protobuf import symbol_database

symbol_database.Default().RegisterMessage(<message>)

The only question is what the correct value of message should be here: some class that extends google.protobuf.message.Message and carries that type URL.
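For reference, the registration pattern looks like this. Struct is only a stand-in here for a generated message class carrying the missing type URL (well-known types are already registered, so re-registering is a no-op); the sketch just illustrates the API shape:

```python
from google.protobuf import symbol_database
from google.protobuf.struct_pb2 import Struct  # stand-in for the missing class

db = symbol_database.Default()
db.RegisterMessage(Struct)

# Once a message class is registered, the lookup that
# _CreateMessageFromTypeUrl performs against the pool succeeds.
descriptor = db.pool.FindMessageTypeByName("google.protobuf.Struct")
print(descriptor.full_name)
```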

@minherz
Contributor Author

minherz commented Oct 24, 2024

Thank you, @gkevinzheng. The matter is that this particular type is part of the official Google Cloud collection of types.
The application cannot guess in advance how many types will be required. Moreover, as I explained in the workaround, this type is parsed successfully when the non-deprecated field is used. Please have a look at the bug in my previous comment. As far as I understand, it proposes a more comprehensive solution that does not require registering each Google protobuf type in each application.

I am unsure whether the proposed solution has to be implemented in this client library or in a package that the library already uses.

For what it's worth, given that the protoPayload type of payload is used only for logs generated by Google Cloud services, customers of this library should not be expected to register every Google Cloud protobuf type they plan to use with it, just in case.

@gkevinzheng
Contributor

@minherz Sorry for the late reply. I've looked at the issue and I think it would be more appropriate to file an issue in the Protobuf repository. Although this issue popped up for the customer while they interacted with the logging library, the root cause seems to be within google.protobuf.json_format.ParseDict.

@minherz
Contributor Author

minherz commented Jan 4, 2025

@gkevinzheng I respectfully disagree. If you read the description of the issue carefully and review the minimal code that I provided, you will see that it happens because of the protobuf types created specifically for the log entry structure.

If you would rather re-address this issue in the protobuf repository, please create a generic code sample that uses only the protobuf package and reproduces the described behavior, so it can be reported there.

There is a workaround that fixes the problem with the "retired" field in the log entry. For some reason the PR with the workaround has not been reviewed yet:

    if "protoPayload" in log:
        payload = log.get("protoPayload")
        if "serviceData" in payload:
            # the following line changes the place of metadata in the dictionary
            payload["metadata"] = payload.pop("serviceData")

Of course the final decision is up to you as a maintainer of this library. This is definitely a corner case, when a user tries to write a log entry with a specific protobuf payload. Please consider, though, that if left unattended, this behavior is hard to debug and will end up crashing users' applications.
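The workaround quoted above can be packaged as a small self-contained helper; a sketch (the function name is mine, not from the PR):

```python
def normalize_proto_payload(entry: dict) -> dict:
    """Return a copy of a log entry dict with the deprecated serviceData
    field moved to metadata; entries without serviceData pass through."""
    payload = entry.get("protoPayload")
    if isinstance(payload, dict) and "serviceData" in payload:
        payload = dict(payload)  # shallow copy; don't mutate the caller's dict
        payload["metadata"] = payload.pop("serviceData")
        entry = {**entry, "protoPayload": payload}
    return entry


# Example: the deprecated field is renamed, other fields are untouched.
entry = {"protoPayload": {"serviceData": {"x": 1}, "methodName": "m"}}
fixed = normalize_proto_payload(entry)
```

Applying this to each entry before calling client.logging_api.write_entries sidesteps the Any-type lookup on the deprecated field.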

@gkevinzheng
Contributor

gkevinzheng commented Jan 6, 2025

@minherz The code you provided fixes the issue, right? Is there a proto object I could put in a unit test that replicates this without having to use client.logging_api.write_entries? Is the customer calling this method directly?

I would like to add this workaround into ProtobufEntry.to_api_repr instead of to write_entries directly.
