Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNS/SQS Pubsub - Same subscriber consuming from 2 pubsub components - Event handling fails / is inconsistent #3646

Open
rakeshwalisheter opened this issue Jan 11, 2025 · 4 comments
Labels
kind/bug Something isn't working

Comments

@rakeshwalisheter
Copy link

rakeshwalisheter commented Jan 11, 2025

Expected Behavior

When 2 pubsub.aws.snssqs components are defined in a cluster and the same service subscribes to these 2 pubsub components, subscriber is expected to consistently receive and process events coming from both components.

I have validated that this behavior works fine when using RMQ-pubsub component.

Actual Behavior

Events are not received / processed; some events are processed multiple times; errors seens in daprd logs. See steps to reproduce below for details of how i setup the reproducer and what the outcome is.

Steps to Reproduce the Problem

(Please use the attached reproducer project which can easily run these tests in local (rmq) / EKS (sqssns) using tilt with simple commands)

  1. Configure 2 pubsub.aws.snssqs components - dapr-pubsub and dapr-highprio-pubsub
  2. Publisher (pub) randomly publishes events to topics testtopic1 on dapr-pubsub and testtopic2 on dapr-highprio-pubsub
  3. Declare subscriptions for these topics on the service called sub which is a simple FastAPI DaprApp as so -
@dapr_app.subscribe(pubsub='dapr-pubsub', topic='testtopic1')
def testtopic1_subscriber(event: CloudEvent):
    _process_event(event)

@dapr_app.subscribe(pubsub='dapr-highprio-pubsub', topic='testtopic2')
def testtopic2_subscriber(event: CloudEvent):
    _process_event(event)

def _process_event(event: CloudEvent):
    if event.topic == 'testtopic1':
        print(f" Processing event from testtopic1 {event.data['eventid']} \n")
    elif event.topic == 'testtopic2':
        print(f" >>>>>>>>>>>>>>>>>>>>>>>>>> Processing event from testtopic2  {event.data['eventid']} \n")
    time.sleep(1)
    return {'success': True}
  1. Deploy these services to EKS cluster using Tilt

  2. Trigger the publisher to send events - use the provisioned ingress component's endpoint - the url to hit to trigger this event would be - http:<ENDPOINT>/trigger.

  3. Notice logs from the subscriber pod. you will see errors like this -

[daprd] time="2025-01-11T02:41:04.361593043Z" level=error msg="error while handling received message. error is: handler for (sanitized) topic: testtopic2 was not found" app_id=sub component="dapr-pubsub (pubsub.aws.snssqs/v1)" instance=sub-d7cfc96c-zk2pn scope=dapr.contrib type=log ver=1.14.4

Further the number of msges expected to be received is not consistent. For instance in the log file that is attached to this bug, i expect 10-events from testtopic1 and 10 from testtopic2. But there are much lower number of actual events received while also there being duplicate events processed. See attached log file for a full run.

  1. Notes to run reproducer app with RabbitMQ in local-cluster

    • I use Tilt to manage K8s deployments. It can be installed from brew.
    • my Tiltfile uses a local cluster called rancher-desktop to run this test. Update LOCAL_CLUSTER_CONTEXT in Tiltfile to the cluster context name that you intend to use.
    • Ensure that the cluster-context is actually active - kubectl config use-context <your-context>
    • tilt up to deploy the test-app to your cluster in namespace rw-dapr-test. tilt down to bring it down.
  2. NOTES to run reproducer app in EKS

    • I use Tilt to manage K8s deployments. It can be installed from brew.
    • I am assuming that the EKS-cluster has Dapr system installed and the cluster's default ServiceAccount has all required permissions to SQS/SNS etc for this to work.
    • my Tiltfile uses a cluster called sandbox to run this test. Update SANDBOX_CLUSTER_CONTEXT in Tiltfile to the cluster context name that you intend to use.
    • Ensure that the cluster-context is actually active - kubectl config use-context <your-context>
    • tilt up to deploy the test-app to your cluster in namespace rw-dapr-test. tilt down to bring it down.

Log file showing errors and inconsistent event handling -
dapr-mulitpubsub-fail.log

Reproducer project -
dapr-multi-pubsub.zip

@rakeshwalisheter rakeshwalisheter added the kind/bug Something isn't working label Jan 11, 2025
@rakeshwalisheter
Copy link
Author

Dapr version that is used in this test

- name: dapr
  repository: https://dapr.github.io/helm-charts/
  version: 1.14.4

@yaron2
Copy link
Member

yaron2 commented Jan 23, 2025

Thanks, this helps

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 22, 2025
@rakeshwalisheter
Copy link
Author

bump to prevent going stale.

@github-actions github-actions bot removed the stale label Feb 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants