rewrite taskcluster-events with a proper read-only RabbitMQ / websocket proxy #104

Closed
wants to merge 1 commit into from

jonasfj

@jonasfj jonasfj commented Mar 9, 2018

At Mozilla we have a RabbitMQ server called pulse.mozilla.org; please read about this setup here. In short, the system works as follows...

Publishers:

  • People who want to publish messages
  • Registers a <username> at pulseguardian.mozilla.org
  • Creates a topic exchange name: exchanges/<username>/<exchangeName>
  • Publishes durable JSON messages to the exchange at least once for each event
  • Events are published with a routing key of dot separated words, such as: sleeping.abc.monkey

Consumers:

  • People who want to listen for messages
  • Registers a <username> at pulseguardian.mozilla.org
  • Creates a queue named queues/<username>/<queueName>
  • Binds the exchange of interest to the queue created, with a routing-key pattern:
    • # binds to anything
    • sleeping.# binds to anything where the first key is sleeping
    • sleeping.*.monkey binds to anything where the first key is sleeping and 3rd key is monkey
    • (see RabbitMQ topic exchanges for more details)
  • The client now reads messages from the queue and acknowledges them when received
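The binding rules above can be illustrated with a small matcher. This is only a sketch of topic-exchange semantics (in practice RabbitMQ does the matching on the broker); `topicMatches` is a hypothetical name, and edge cases such as empty words are not considered.

```typescript
// Returns true if `routingKey` matches the topic-exchange binding `pattern`.
// Words are separated by "."; "*" matches exactly one word,
// "#" matches zero or more words.
function topicMatches(pattern: string, routingKey: string): boolean {
  const p = pattern.split(".");
  const k = routingKey.split(".");

  // Memoize (i, j) -> does p[i..] match k[j..]? Keeps "#" handling polynomial.
  const memo = new Map<string, boolean>();
  const match = (i: number, j: number): boolean => {
    const key = `${i},${j}`;
    const cached = memo.get(key);
    if (cached !== undefined) return cached;

    let result: boolean;
    if (i === p.length) {
      // Pattern exhausted: match only if the routing key is exhausted too.
      result = j === k.length;
    } else if (p[i] === "#") {
      // "#" either consumes no more words, or consumes one word and stays.
      result = match(i + 1, j) || (j < k.length && match(i, j + 1));
    } else if (j === k.length) {
      result = false;
    } else if (p[i] === "*" || p[i] === k[j]) {
      result = match(i + 1, j + 1);
    } else {
      result = false;
    }
    memo.set(key, result);
    return result;
  };
  return match(0, 0);
}
```

For example, `topicMatches("sleeping.*.monkey", "sleeping.abc.monkey")` is true, while `topicMatches("sleeping.*.monkey", "sleeping.monkey")` is false because `*` must consume exactly one word.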

Note:
Often the <queueName> is a random UUID, and the queue is set to be exclusive, meaning other RabbitMQ connections can't consume messages from it; the queue is also set to be auto-deleted when the connection is closed. This ensures that queues are cleaned up, but it also prevents reconnecting without risk of losing messages.
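A minimal sketch of how a proxy might name and describe such a throwaway queue, assuming the `queues/<username>/<queueName>` convention described above. `ephemeralQueue` and `QueueDeclaration` are hypothetical; the option names mirror options accepted by amqplib's `assertQueue`, but no broker call is made here.

```typescript
// Hypothetical description of a per-websocket queue declaration.
interface QueueDeclaration {
  name: string;
  options: {
    exclusive: boolean;  // only the declaring connection may use it; deleted when that connection closes
    autoDelete: boolean; // deleted once its last consumer is cancelled
    durable: boolean;    // false: the queue does not survive a broker restart
  };
}

// `queueId` is expected to be a fresh random UUID, e.g. from crypto.randomUUID().
function ephemeralQueue(username: string, queueId: string): QueueDeclaration {
  // Reject usernames that would break the "queues/<username>/<queueName>" layout.
  if (username.length === 0 || username.includes("/")) {
    throw new Error(`invalid username: ${JSON.stringify(username)}`);
  }
  return {
    name: `queues/${username}/${queueId}`,
    options: { exclusive: true, autoDelete: true, durable: false },
  };
}
```

Because the queue disappears with the connection, a reconnecting client necessarily gets a new queue, and anything published in between is lost; this is the trade-off the note describes.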

taskcluster-events is a service that listens for incoming websockets and accepts JSON websocket messages that say which exchanges and routing keys to bind to. It then creates a RabbitMQ channel and queue for each websocket and forwards messages from the queue to the websocket. This way, web clients like the pulse inspector (https://tools.taskcluster.net/pulse-inspector/) can listen for pulse events without RabbitMQ credentials. Web clients also don't need to open a RabbitMQ connection, which runs over TCP and is therefore hard to use from a web browser.
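The request side of such a protocol could be as small as four message types. The sketch below assumes a hypothetical JSON format (`method`, `exchange`, `routingKeyPattern`), not the one the current service uses, and validates each request before anything would be sent to RabbitMQ.

```typescript
// Hypothetical client -> proxy requests.
type ClientRequest =
  | { method: "bind"; exchange: string; routingKeyPattern: string }
  | { method: "unbind"; exchange: string; routingKeyPattern: string }
  | { method: "pause" }
  | { method: "resume" };

// Exchanges must follow the exchanges/<username>/<exchangeName> convention.
const EXCHANGE_RE = /^exchanges\/[^/]+\/.+$/;
// Dot-separated words, where each word is "*", "#", or a literal without . * #.
const PATTERN_RE = /^(\*|#|[^.*#]+)(\.(\*|#|[^.*#]+))*$/;

function parseRequest(raw: string): ClientRequest {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    throw new Error("request is not valid JSON");
  }
  if (typeof msg !== "object" || msg === null) {
    throw new Error("request must be a JSON object");
  }
  const m = msg as Record<string, unknown>;
  switch (m.method) {
    case "pause":
      return { method: "pause" };
    case "resume":
      return { method: "resume" };
    case "bind":
    case "unbind": {
      const { exchange, routingKeyPattern } = m;
      if (typeof exchange !== "string" || !EXCHANGE_RE.test(exchange)) {
        throw new Error("exchange must look like exchanges/<username>/<exchangeName>");
      }
      if (typeof routingKeyPattern !== "string" || !PATTERN_RE.test(routingKeyPattern)) {
        throw new Error("routingKeyPattern must be dot-separated words, '*' or '#'");
      }
      return m.method === "bind"
        ? { method: "bind", exchange, routingKeyPattern }
        : { method: "unbind", exchange, routingKeyPattern };
    }
    default:
      // Anything else (e.g. a publish attempt) is refused.
      throw new Error(`unknown method: ${String(m.method)}`);
  }
}
```

Since clients can only name an exchange and a pattern, never a queue, and there is no publish method, a parser along these lines is one way to keep the proxy read-only.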

However, taskcluster-events is old, poorly designed, lacks a protocol specification, and is fairly buggy. This RFC proposes a full rewrite of taskcluster-events, including a full specification of the websocket protocol.
The project should aim to:

  • Provide a fairly generic read-only websocket proxy for listening to RabbitMQ exchanges,
  • Provide decent abuse protection against build-up of large queues and too many connections
    (preferring to refuse connections/clients rather than risk overloading the RabbitMQ cluster),
  • Contain a full specification of the websocket protocol,
  • Provide a client implementation in pure JavaScript using WebSockets,
  • Prevent web clients from consuming from queues they didn't create, publishing messages, or otherwise interfering with other users of the RabbitMQ cluster.
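For the abuse-protection goal, the simplest mechanism is to count and refuse. A sketch, assuming hypothetical limits on total connections and on bindings per connection; the class name and fields are illustrative, and real limits would need tuning against the cluster.

```typescript
// Hypothetical limits; the values a deployment uses would need tuning.
interface Limits {
  maxConnections: number;           // websocket connections across the whole proxy
  maxBindingsPerConnection: number; // bindings a single websocket may hold
}

class AdmissionControl {
  private open = 0;
  private readonly bindings = new Map<string, number>(); // connection id -> binding count
  private readonly limits: Limits;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Called when a websocket arrives; false means "refuse the connection".
  tryOpen(connId: string): boolean {
    if (this.open >= this.limits.maxConnections || this.bindings.has(connId)) return false;
    this.open++;
    this.bindings.set(connId, 0);
    return true;
  }

  // Called before creating a binding; false means "reject this bind request".
  tryBind(connId: string): boolean {
    const n = this.bindings.get(connId);
    if (n === undefined || n >= this.limits.maxBindingsPerConnection) return false;
    this.bindings.set(connId, n + 1);
    return true;
  }

  // Called after a successful unbind.
  unbind(connId: string): void {
    const n = this.bindings.get(connId);
    if (n !== undefined && n > 0) this.bindings.set(connId, n - 1);
  }

  // Called when the websocket closes; frees its connection slot.
  close(connId: string): void {
    if (this.bindings.delete(connId)) this.open--;
  }
}
```

Refusing up front keeps the failure visible to the client, rather than letting an unbounded number of queues and bindings accumulate on the RabbitMQ cluster.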

This probably means inventing a fairly high-level websocket protocol, rather than trying to proxy the AMQP protocol over a websocket. That said, another approach could be to proxy the AMQP protocol while enforcing various limitations, such that clients can't do bad things.

This could be a GSoC or Outreachy project, or just a fun thing to work on.
@jonasfj and @eliperelman would happily mentor this project as GSoC, Outreachy, or fun side-project.


Compared to taskcluster-events this will probably support:

  • one RabbitMQ channel/queue per websocket connection,
  • bind/unbind to exchanges
  • resume/pause
  • automatic reconnection
  • better abuse protection
  • reconnection without risk of message drops will probably not be supported (this would be complicated, with a risk of leaking resources in the face of crashing nodes)
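Pause/resume interacts with queue build-up: while a client is paused, messages keep arriving. One possible approach, sketched below with hypothetical names, buffers a bounded number of messages per session and closes the session once the bound is hit, in line with preferring to refuse clients over overloading the cluster.

```typescript
// Hypothetical per-websocket session: messages from the session's RabbitMQ
// queue are forwarded immediately, or buffered while the client is paused.
class Session<M> {
  private paused = false;
  private closed = false;
  private readonly buffer: M[] = [];
  private readonly send: (msg: M) => void;
  private readonly maxBuffered: number;

  constructor(send: (msg: M) => void, maxBuffered: number) {
    this.send = send;
    this.maxBuffered = maxBuffered;
  }

  // Called for every message delivered from the queue. Returns false once the
  // session is closed; the caller should then tear down the socket and queue.
  deliver(msg: M): boolean {
    if (this.closed) return false;
    if (!this.paused) {
      this.send(msg);
      return true;
    }
    if (this.buffer.length >= this.maxBuffered) {
      // Refuse to grow further instead of letting memory or queues pile up.
      this.closed = true;
      this.buffer.length = 0;
      return false;
    }
    this.buffer.push(msg);
    return true;
  }

  pause(): void {
    this.paused = true;
  }

  // Flush buffered messages in arrival order, then forward directly again.
  resume(): void {
    this.paused = false;
    for (const msg of this.buffer.splice(0)) this.send(msg);
  }

  get isClosed(): boolean {
    return this.closed;
  }
}
```

An alternative would be to stop consuming while paused and let messages wait on the broker, capped with RabbitMQ's queue length limit (`x-max-length`); which trade-off fits better is a design question for the protocol specification.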

@muddlebee

muddlebee commented Feb 3, 2018

Hey @jonasfj @eliperelman, where do I start? This seems like a big fix. I want to apply to GSoC, and I'd be glad if you could help.

@jonasfj
Author

jonasfj commented Feb 4, 2018

If you want to apply for GSoC, I would suggest finding some minor bugs to work on.

@eliperelman, do you have any good getting started bugs? I have so far proposed:

But maybe we can come up with some simpler ones? Maybe some minor tweaks to the tools site?


Also ping us in #tc-contributors on irc.mozilla.org, I'm always connected to IRC, even when I'm sleeping :)
(just don't expect me to answer while I'm sleeping, I haven't perfected those skills yet)

@muddlebee

Maybe some minor tweaks to the tools site?

By this, are you referring to https://github.com/taskcluster/taskcluster-docs ? @jonasfj
That would be a good start.

@jonasfj
Author

jonasfj commented Feb 4, 2018

Please move these questions to IRC; I don't want to pollute this RFC with unrelated information.
I'm jonasfj on IRC, and we're in #tc-contributors and #taskcluster on irc.mozilla.org.

@eliperelman

@jonasfj nothing relevant to this, other than taskcluster/taskcluster-tools#353

@djmitche
Contributor

djmitche commented Mar 9, 2018

@jonasfj we need to agree on this soon, before GSoC starts :)

@jonasfj
Author

jonasfj commented Mar 9, 2018

@djmitche, any contentious parts here?

I guess you want me to push buttons and make a PR...

@djmitche
Contributor

djmitche commented Mar 9, 2018

It seems like everything is contentious these days, so I'm guessing, yes, there are contentious parts here too. Yeah, please write up an RFC using the template and make a PR.

Contributor

@djmitche djmitche left a comment


Let's not jump over the "proposal" stage :)

Can you give a little more detail on what "abuse prevention" means?

## Motivation

* taskcluster-tools
* pulse-inspector

Aren't these two things the same? Or is the difference that "taskcluster-tools" includes things like the task inspector, which dynamically updates pages, while "pulse-inspector" is specifically the pulse inspector and its ability to listen to any exchange?


# Summary

Rewrite taskcluster-events to support:


I'm curious: which of these are parts of a new PulseListener/PulseConnection, and which are parts of the proposed events service?

@jonasfj
Author

jonasfj commented May 29, 2018

This has been accepted as a GSoC project, and we've changed a few things, such as probably not supporting reconnection, to keep things simple.

@jonasfj jonasfj closed this May 29, 2018