Timeout service #5

Open
1 task
ojow opened this issue May 28, 2018 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@ojow
Contributor

ojow commented May 28, 2018

Timeout service

introduction

Xray 1.0

As described by Xray 1.0 doc:

The timeout service is used as a scheduler by other services. Other services, such as the Rule Engine and the Retry service, depend on it.

The idea is that it's a standalone, distributed, fault-tolerant service that communicates with the external world through RabbitMQ. The goal of the service is to publish so-called timer.Expired events at the specified point in time.
The API is (quoting the 1.0 doc):

Rwait(x, unit): wait x units of time. The default unit is seconds; minutes and hours are also available.
ScheduleTimerAt(timestamp): wait until timestamp is reached. timestamp must be in seconds.
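
A minimal sketch of how the two sites could map onto a single absolute expiry time, assuming timestamps are epoch seconds. The function names `rwait_deadline` and `schedule_timer_at_deadline` and the `now` parameter are hypothetical and do not come from the 1.0 doc; only the unit names do.

```python
import time

# Seconds per unit; the unit names mirror the ones listed in the 1.0 doc.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600}


def rwait_deadline(x, unit="seconds", now=None):
    """Return the absolute expiry time (epoch seconds) for Rwait(x, unit).

    `now` is a hypothetical parameter added to make the sketch testable.
    """
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    if now is None:
        now = time.time()
    return now + x * SECONDS_PER_UNIT[unit]


def schedule_timer_at_deadline(timestamp):
    """ScheduleTimerAt(timestamp) already carries an absolute time in seconds."""
    return timestamp
```

With both sites reduced to one expiry time, the rest of the service only has to deal with a single kind of request.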

The 1.0 implementation uses Aerospike for fault tolerance: all timeout requests are stored there and are loaded into memory at startup by scanning Aerospike.
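
The recovery step can be pictured roughly as follows. This is only an illustration, not the 1.0 code: a plain iterable of `(timer_id, due_at)` pairs stands in for the Aerospike scan, and the helper names `load_pending_timers` and `pop_due` are made up for the example.

```python
import heapq


def load_pending_timers(durable_records):
    """Rebuild the in-memory schedule from (timer_id, due_at) records.

    `durable_records` stands in for the result of scanning the durable store.
    """
    heap = [(due_at, timer_id) for timer_id, due_at in durable_records]
    heapq.heapify(heap)  # earliest expiry ends up at heap[0]
    return heap


def pop_due(heap, now):
    """Remove and return the ids of all timers with due_at <= now, earliest first."""
    due = []
    while heap and heap[0][0] <= now:
        _, timer_id = heapq.heappop(heap)
        due.append(timer_id)
    return due
```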

Xray OAM

The proposed implementation is based on the prototype developed using Scala+Kafka Streams.

The required Kafka Streams components are:

  • the request KStream - basically it's a stream of requests from a Kafka topic
  • a local persistent key-value store (backed by RocksDB by default, easily replaceable through an open API) to store and sort the timer requests
  • Punctuator - the Kafka Streams abstraction for internal scheduling

The implementation is very simple: every request is stored in the local store. Every, let's say, 100 ms the Punctuator fetches the requests that are due (those scheduled for any time up to now) from the store, publishes the corresponding timer.Expired events to the output topic, and deletes them from the store. Cancel requests are handled trivially by deleting the corresponding records from the local store.
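
The store-and-punctuate logic described above can be sketched in memory as follows. This is not the Scala prototype and uses no Kafka Streams API: a sorted Python list stands in for the RocksDB-backed store, a callback stands in for the output topic, and the class name `TimeoutProcessor`, its method names, and the event fields are all assumptions made for illustration.

```python
import bisect


class TimeoutProcessor:
    """In-memory stand-in for the request store plus the Punctuator callback."""

    def __init__(self, publish):
        self._publish = publish   # callback standing in for the output topic
        self._by_time = []        # sorted list of (due_at, timer_id)
        self._due_at = {}         # timer_id -> due_at, used for cancellation

    def schedule(self, timer_id, due_at):
        self.cancel(timer_id)     # re-scheduling replaces any earlier entry
        bisect.insort(self._by_time, (due_at, timer_id))
        self._due_at[timer_id] = due_at

    def cancel(self, timer_id):
        due_at = self._due_at.pop(timer_id, None)
        if due_at is not None:
            self._by_time.remove((due_at, timer_id))

    def punctuate(self, now):
        """Publish and delete every request with due_at <= now; return the count."""
        cut = 0
        while cut < len(self._by_time) and self._by_time[cut][0] <= now:
            cut += 1
        due, self._by_time = self._by_time[:cut], self._by_time[cut:]
        for due_at, timer_id in due:
            del self._due_at[timer_id]
            self._publish({"type": "timer.Expired", "id": timer_id, "due_at": due_at})
        return len(due)
```

In a real deployment `punctuate` would be driven by the scheduler at a fixed interval (the 100 ms mentioned above), so that interval bounds how late an event can be published.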

Kafka Streams not only allows for an easy and transparent implementation but also adds easily configurable delivery guarantees (including an exactly-once option based on Kafka transactions). The solution scales automatically because Kafka topics are partitioned by key hash: timeout requests for the same entity are handled by the same node, and the resulting events are published to the same partition.
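
The key-based routing behind this can be illustrated with a small sketch. Kafka's default partitioner hashes the serialized key with murmur2; the example below uses `zlib.crc32` purely as a stand-in so that it runs without any Kafka dependency, and the function name `partition_for` is hypothetical.

```python
import zlib


def partition_for(entity_id, num_partitions):
    """Map a record key to a partition: same key, same partition.

    crc32 is a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


# Group a few requests by the partition their key maps to.
requests = [("user-1", "schedule"), ("user-2", "schedule"), ("user-1", "cancel")]
by_partition = {}
for entity_id, action in requests:
    by_partition.setdefault(partition_for(entity_id, 8), []).append((entity_id, action))
```

Because the schedule and cancel requests for `user-1` share a key, they land in the same partition and are therefore seen, in order, by the same node's local store.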

The plan is to implement the same logic in Clojure, adding the Timeout Service as a component of duct-tape/integrant. This is flexible enough to run it either as part of the Rule Engine process or as a separate deployment.

scope and result

This is mostly just a port of the existing service, with some improvements coming "for free" with Kafka Streams. In particular, we don't aim to improve the granularity or the maximum latency.

Expected deliveries:

  • Rwait(x, unit) and ScheduleTimerAt(timestamp) sites should be available for Orc programs
  • A Wiki page describing the service and the sites
  • The Timeout service providing the sites should be implemented as part of the architecture of the Clojure-based Rule Engine
  • Benchmarking and cost analytics results for the service should be included in the related report
@ojow ojow self-assigned this May 28, 2018
@ojow ojow added the enhancement New feature or request label May 28, 2018
@ojow ojow added this to the Xray 1.9 milestone May 28, 2018
ojow added a commit that referenced this issue May 29, 2018
ojow added a commit that referenced this issue May 30, 2018
ojow added a commit that referenced this issue May 30, 2018
ojow added a commit that referenced this issue May 30, 2018
ojow added a commit that referenced this issue May 30, 2018
ojow added a commit that referenced this issue May 30, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue May 31, 2018
ojow added a commit that referenced this issue Jun 1, 2018
ojow added a commit that referenced this issue Jun 1, 2018
ojow added a commit that referenced this issue Jun 1, 2018
ojow added a commit that referenced this issue Jun 4, 2018
ojow added a commit that referenced this issue Jun 4, 2018
prepor pushed a commit that referenced this issue Jun 4, 2018
ojow added a commit that referenced this issue Jun 4, 2018
@ojow
Contributor Author

ojow commented Jun 5, 2018

Added wiki page: https://github.com/xray-tech/xray/wiki/Timeout-Service
More details will be added later, once Andrew and I decide how to add the sites and how to benchmark.

@JanC JanC unassigned ojow Jul 17, 2018