Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle gran changes #2

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

handle gran changes #2

wants to merge 3 commits into from

Conversation

pjain1
Copy link
Owner

@pjain1 pjain1 commented Nov 9, 2015

Second take on handling tranquility restarts with segment granularity changes.
Now the beams expose method to get the interval which can be used instead of segment start time to decide which beam an event belongs to. Also a reverse sorted list of active beams is maintained which can be used to lookup the beams when events are grouped by beams. It is reverse sorted as most of the times we expect to get the most latest beam.

@pjain1
Copy link
Owner Author

pjain1 commented Nov 10, 2015

A group of events will be sent to propogateEvents which is method in Beam class. A Beam is a structure that exposes Interval and has other task information. Changes to trait Beam done to include getInterval method.

propogateEvents()
Input - List[Events]
Implementation -
    Groups <- groupEvents(List[Events], ReverseSortedList[Beam])
    For each group in Groups
        call beam(group.Pair[Interval, Option[Beam]]) to get Beam to propogate the events
        propogate the events using the Beam

groupEvents()
Input - List[events], ReverseSortedList[Beam]
Output - Map of Pair(Interval, BeamIndex[ReverseSortedList]) -> List[events]
Implementation -
        For each event in events
            Search the list and find the Beam having Interval that can ingest this event
                If we find a beam then
                    Create or get the Pair(Interval, BeamIndex[ReverseSortedList]) with the found Beam and associate event with this Pair
                Else {
                    Create or get the Pair(segmentBucket(eventTimestamp), null) and associate event with this Pair
                    // Use current segment granularity to find out the segmentBucket
                }
        return the Map

beam()
Input - Pair(Interval, Option[Beam])
Output - Beam to be used to propogate data to Druid Tasks
Implementation -
    Check the Interval to see if it is not too old or not to new if it is then drop the events and return a NoopBeam
    If 
        there is a valid input beam in the Pair return that beam
    else
        Check whether the required interval overlaps with any current beam if yes truncate the interval accordingly
        Create a new Beam using the Interval
        Submit a new Indexing task for this Beam
        Update the ReverseSortedList[Beam] data structure
        Find out beams that are expired and need to go away
        Update the ZKMetadata to reflect these changes
        Return the newly created Beam

case e: Throwable =>
// Log Throwables to avoid invisible errors caused by https://github.com/twitter/util/issues/100.
log.error(e, "Failed to sync with zookeeper: %s", identifier)
throw e
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are logging and throwing. Should pick one. Given that this is happening in an initializer, throwing is likely going to mean that the whole process dies, which might be good?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like twitter/util#100 has been fixed, makes sense to let the process die

@cheddar
Copy link

cheddar commented Nov 16, 2015

In general, this lgtm. I can't speak about idiomatic Scala because I'm not that familiar with what idiomatic scala is, but it should be ok to take to the community (maybe after we settle on what to do with the loop part?)

@pjain1
Copy link
Owner Author

pjain1 commented Nov 16, 2015

so the loop part is resolved, right ?

@pjain1 pjain1 mentioned this pull request Nov 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants