MCE/MAE Consumers start without waiting for matching versions #12138

Closed
mihai103 opened this issue Dec 16, 2024 · 2 comments

@mihai103
Describe the bug
The MCE and MAE consumer services in DataHub start processing messages without waiting for the version check. This can potentially lead to issues where newer versions of consumers run with older versions of GMS.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy GMS, MCE and MAE as separate pods in a K8s cluster
  2. Upgrade images to new versions
  3. At this point the new GMS pod waits for the datahub-upgrade job to finish running with the corresponding version, while the old GMS pod continues running. MCE/MAE start with the new version regardless of datahub-upgrade.

Expected behavior
The expectation is that both MCE and MAE consumers will perform a version check during startup to match the version number published to the Kafka topic by datahub-upgrade.
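
The expected behavior can be illustrated with a minimal sketch in plain Java. `SystemVersionGate` and its methods are hypothetical names used only for illustration; this is not DataHub's implementation, and the Kafka consumer that would feed it is left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical gate: holds startup until the expected system version is observed. */
final class SystemVersionGate {
    private final String expectedVersion;
    private final CountDownLatch matched = new CountDownLatch(1);

    SystemVersionGate(String expectedVersion) {
        this.expectedVersion = expectedVersion;
    }

    /** Called for every version message read from the upgrade topic (assumed wiring). */
    void onVersionMessage(String publishedVersion) {
        if (expectedVersion.equals(publishedVersion)) {
            matched.countDown(); // release any thread blocked in awaitMatch
        }
    }

    /** Returns true once a matching version arrived, false if the timeout elapsed first. */
    boolean awaitMatch(long timeoutMillis) {
        try {
            return matched.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

In this sketch a consumer would call `awaitMatch` before it starts processing messages, so a mismatched version only delays startup instead of being logged and ignored.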

Additional context
The bootstrapManager is not started in ApplicationStartupListener when a consumer runs in standalone mode, because the context id is "application" (the default for Spring Boot configured contexts) rather than WebApplicationContext.class.getName(). An alternative way to detect the root context would be to check whether the current context has no parent, which should work in both standalone and GMS-only deployments.
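
A small model of the two checks, written without Spring so it runs on its own. `Ctx` is a hypothetical stand-in for the two `ApplicationContext` properties involved (`getId()` and `getParent()`), and `WEB_ROOT_ID` is an assumed value; the actual constant in ApplicationStartupListener may be formatted differently.

```java
import java.util.Objects;

final class RootContextCheck {
    /** Hypothetical stand-in for an application context: just an id and a parent. */
    static final class Ctx {
        final String id;
        final Ctx parent;

        Ctx(String id, Ctx parent) {
            this.id = id;
            this.parent = parent;
        }
    }

    // Assumed id for a web root context; not copied from the DataHub source.
    static final String WEB_ROOT_ID = "org.springframework.web.context.WebApplicationContext";

    /** Id-based check, as described in the report. */
    static boolean isRootById(Ctx ctx) {
        return Objects.equals(WEB_ROOT_ID, ctx.id);
    }

    /** Proposed alternative: a root context is one without a parent. */
    static boolean isRootByParent(Ctx ctx) {
        return ctx.parent == null;
    }

    public static void main(String[] args) {
        Ctx standalone = new Ctx("application", null); // standalone consumer
        Ctx gmsRoot = new Ctx(WEB_ROOT_ID, null);      // root context inside GMS
        Ctx child = new Ctx("child", gmsRoot);         // any child context

        System.out.println(isRootById(standalone));     // false: bootstrap is skipped
        System.out.println(isRootByParent(standalone)); // true
        System.out.println(isRootByParent(gmsRoot));    // true
        System.out.println(isRootByParent(child));      // false
    }
}
```

In Spring itself the parent-based check would presumably read `event.getApplicationContext().getParent() == null`; whether that is the right fix for every DataHub deployment mode is not established in this thread.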

@mihai103 mihai103 added the bug Bug report label Dec 16, 2024
@david-leifker david-leifker self-assigned this Jan 10, 2025
@david-leifker
Collaborator

david-leifker commented Jan 14, 2025

First, thank you for calling this out. However, I am not able to reproduce this situation; it appears to be working for me. I've outlined below the steps I took while trying to reproduce it.

At first I thought I had reproduced this, but it turned out that I was running the older image with a newer build of the jar locally. After fixing this and using the official v0.14.1 image with a current build v0.15.1-SNAPSHOT ... I am seeing both the mae-consumer and the mce-consumer waiting for the expected message.

There are multiple implementations of bootstrapManager, and the consumers have a single bootstrap step. For the mce-consumer this is the factory

Essentially it just runs this step

Attached are screenshots of the contents of the Kafka topic and the expected log messages from the consumer.

[Images: three attached screenshots]

@mihai103
Author

mihai103 commented Jan 15, 2025

Hi @david-leifker ,

I tried again and I got the same problem.

I do also get the warning: 2025-01-15 10:43:55,009 [ThreadPoolTaskExecutor-1] WARN c.l.m.b.k.DataHubUpgradeKafkaListener - System version is not up to date: v0.14.0.2-1. Waiting for datahub-upgrade to complete...

However, the mae service then just starts and replaces the other container.

That log does come from the com.linkedin.metadata.boot.kafka.DataHubUpgradeKafkaListener#checkSystemVersion listener. However, what I think should be blocking the startup is the bootstrap step com.linkedin.metadata.boot.steps.WaitForSystemUpdateStep, which doesn't run for me.

Also, if I add a breakpoint in com.linkedin.metadata.kafka.boot.ApplicationStartupListener#onApplicationEvent, execution never enters the if block, meaning the bootstrap process is not even started (I'm also not seeing the "Starting Bootstrap Process" log).
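
The symptom described above can be modeled in a few lines of plain Java. This is only a sketch of the suspected control flow: `GuardedBootstrap`, the string context id, and the "startup continues" marker are all hypothetical, and the real listener receives a Spring event rather than a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of a startup listener whose bootstrap is guarded by a context check. */
final class GuardedBootstrap {
    final List<String> events = new ArrayList<>();
    private final Predicate<String> isRootContext;
    private final Runnable blockingStep;

    GuardedBootstrap(Predicate<String> isRootContext, Runnable blockingStep) {
        this.isRootContext = isRootContext;
        this.blockingStep = blockingStep;
    }

    void onApplicationEvent(String contextId) {
        if (isRootContext.test(contextId)) {
            events.add("Starting Bootstrap Process");
            blockingStep.run(); // would hold startup until the version matches
        }
        // Reached either way: when the guard is false nothing delays startup.
        events.add("startup continues");
    }
}
```

With a guard that only accepts a web root id, calling `onApplicationEvent("application")` records just "startup continues" and never runs the blocking step, which matches a consumer that logs the version warning from a separate listener but starts anyway.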

Thank you!
