Make sure all operations can be run concurrently multiple times #23

Open
mitar opened this issue Dec 24, 2013 · 2 comments

@mitar
Member

mitar commented Dec 24, 2013

Make sure all operations can be run concurrently multiple times. There are two main issues.

Ensuring that concurrent runs of downsampling do the expected thing (not overwriting or duplicating work). We could probably lock streams as downsampling of them starts, so that other runs skip them. We should make sure that streams do not stay locked indefinitely. The same applies to backprocessing of dependent streams.

Ensuring that datapoints can be appended concurrently. This mostly already works, even for processing of dependent streams. The only known issue is with the derive operator, which expects the reset stream to be processed before the data stream, so that it can know whether a reset happened or not. Maybe we should just document this and require the user to ensure the ordering? Or should we make it work regardless of the order? The issue with the latter approach is that it seems we would have to store datapoints not just when a reset happened, but also when it did not.
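A minimal illustration of why the ordering matters, using hypothetical names rather than the actual datastream API; the point is only that the reset datapoint has to already be visible by the time the data stream is processed:

```python
from collections import namedtuple

# Illustrative datapoint structure; not the actual datastream representation.
Datapoint = namedtuple("Datapoint", ["timestamp", "value"])


def derive(previous, current, reset_points):
    """Compute the increment between two datapoints of a counter-like stream.

    `reset_points` are the datapoints of the associated reset stream. If the
    reset stream is processed *after* the data stream, a reset between
    `previous` and `current` may not yet be visible here, and the increment
    would silently be computed as if no reset happened.
    """
    reset_happened = any(
        previous.timestamp < r.timestamp <= current.timestamp
        for r in reset_points
    )
    if reset_happened:
        # How a reset is actually handled is beside the point here; what
        # matters is only that it can be detected at all.
        return None
    return current.value - previous.value
```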

@kostko
Member

kostko commented Dec 28, 2013

We have now implemented the following:

  • Multiple downsample operations can be run concurrently and will use per-stream locking (7a11b4c). Other downsamplers will not wait for the lock to be released, but will simply skip to the next stream. This introduced two new fields in stream metadata: _lock_mt, which holds the timestamp when the lock will expire, and downsample_count, which holds a monotonically incrementing counter of performed downsample operations. During downsampling, if the lock is near expiry, we extend it. (See the locking sketch after this list.)
  • Interleaving of append and downsample operations is handled properly (99d6fd3, 7124324). Before inserting a datapoint we update stream metadata to reflect the timestamp of the last inserted datapoint. To properly handle cases where multiple appends to the same stream interleave with downsample operations, we use a safety margin of 10 seconds: we maintain a per-stream list of timestamps of datapoints inserted (or in the middle of being inserted) in the last 10 seconds, and before downsampling we take the minimum of them all. That timestamp is then used as the reference point for downsampling the stream. This guarantees that, as long as an append takes less than 10 seconds to complete (between updating stream metadata and the actual datapoint insertion), downsampling will be consistent and will not skip datapoints that are pending insertion. (See the second sketch after this list.)
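A rough sketch of the per-stream locking described above, assuming stream metadata lives in a pymongo-style collection; the function names, lock duration, and query shape below are illustrative, not the actual datastream implementation:

```python
import datetime

LOCK_DURATION = datetime.timedelta(seconds=60)  # assumed value, not taken from the implementation


def try_lock_stream(metadata, stream_id, now=None):
    """Try to acquire the downsample lock for a single stream.

    `metadata` is assumed to be a pymongo-style collection. The update only
    matches when the stream is unlocked or its lock has already expired, so
    concurrent downsamplers race on one atomic update and the losers simply
    skip to the next stream instead of waiting.
    """
    now = now or datetime.datetime.utcnow()
    result = metadata.update_one(
        {
            "_id": stream_id,
            "$or": [
                {"_lock_mt": {"$exists": False}},
                {"_lock_mt": {"$lte": now}},
            ],
        },
        {"$set": {"_lock_mt": now + LOCK_DURATION}},
    )
    return result.modified_count == 1


def extend_lock(metadata, stream_id, now=None):
    """Lengthen the lock when it is near expiry while downsampling continues."""
    now = now or datetime.datetime.utcnow()
    metadata.update_one(
        {"_id": stream_id},
        {"$set": {"_lock_mt": now + LOCK_DURATION}},
    )
```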
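And a minimal sketch of how the 10-second safety margin could determine the reference point for downsampling; again, the names and the fallback when nothing is pending are assumptions:

```python
import datetime

SAFETY_MARGIN = datetime.timedelta(seconds=10)


def downsample_reference(pending_timestamps, now=None):
    """Pick the reference timestamp up to which a stream may be downsampled.

    `pending_timestamps` is the per-stream list of timestamps of datapoints
    inserted (or still in the middle of being inserted) within the last 10
    seconds. Downsampling must not go past the earliest of them, otherwise a
    datapoint whose insertion has not yet completed could be skipped.
    """
    now = now or datetime.datetime.utcnow()
    horizon = now - SAFETY_MARGIN
    recent = [ts for ts in pending_timestamps if ts >= horizon]
    if recent:
        return min(recent)
    # Assumption: with nothing pending, stay conservatively behind the margin.
    return horizon
```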

Handling concurrent backprocessing and derived streams is still pending.

@mitar
Member Author

mitar commented Dec 28, 2013

Just to add to the comment above: currently this means that you can downsample only up to 10 seconds before the last datapoint (10 seconds being the above-mentioned safety margin).
