ChunkRunner spawning thousands of chunk-checkpoint-timer threads #120

Zerrossetto · 2019-05-27T20:46:33Z

Hi,
Today I found that when time-limit is set to a large number of seconds, my JVM struggles to keep track of thousands of threads named chunk-checkpoint-timer on certain jobs.

Looking into the source code, I found that when the time-limit attribute on the chunk element is greater than 0, the org.jberet.runtime.runner.ChunkRunner class schedules a timer thread just to update a boolean flag:

            if (timeLimit > 0) {
                final Timer timer = new Timer("chunk-checkpoint-timer", true);
                timer.schedule(new TimerTask() {
                    @Override
                    public void run() {
                        processingInfo.timerExpired = true;
                    }
                }, timeLimit * 1000);
            }
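The accumulation can be reproduced outside JBeret. The sketch below is not JBeret code: `TimerThreadLeakDemo`, `simulate`, and `countTimerThreads` are hypothetical names, and it assumes each chunk creates a fresh `Timer` exactly as in the snippet above. Because every `java.util.Timer` owns one thread that stays alive until its task has run or the timer is cancelled, many quick chunks combined with a long delay leave one idle thread per chunk.

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerThreadLeakDemo {

    // Counts live threads that carry the timer name used in the snippet above.
    static long countTimerThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().equals("chunk-checkpoint-timer"))
                .count();
    }

    // Simulates `chunks` quick chunks, each scheduling its own Timer with a long delay.
    // Returns how many timer threads are alive once all "chunks" have finished.
    static long simulate(int chunks, long timeLimitSeconds) {
        Timer[] timers = new Timer[chunks];
        for (int i = 0; i < chunks; i++) {
            Timer timer = new Timer("chunk-checkpoint-timer", true);
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // would set the expiry flag; irrelevant for this demo
                }
            }, timeLimitSeconds * 1000);
            timers[i] = timer;
        }
        long alive = countTimerThreads();
        // Clean up so the demo itself does not leak; the original snippet never cancels.
        for (Timer timer : timers) {
            timer.cancel();
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("live timer threads: " + simulate(200, 3600));
    }
}
```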

I don't know whether this is intended behavior, but java.util.Timer has been, not officially but in practice, superseded by java.util.concurrent.ScheduledThreadPoolExecutor. In any case, in this PR I tried to bypass threads entirely and check whether the chunk's time limit has expired with a simple comparison against System.currentTimeMillis().
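A minimal sketch of that thread-free idea follows. It is not the PR's actual diff: the class `CheckpointDeadline`, its methods, and the injectable `LongSupplier` clock are hypothetical names introduced only so the logic can be tested with a fake clock.

```java
import java.util.function.LongSupplier;

public class CheckpointDeadline {
    private final long timeLimitMillis; // 0 means "no time limit"
    private final LongSupplier clock;   // hypothetical injection point, e.g. System::currentTimeMillis
    private long chunkStartMillis;

    public CheckpointDeadline(int timeLimitSeconds, LongSupplier clock) {
        // 1000L forces long arithmetic for the seconds-to-millis conversion
        this.timeLimitMillis = timeLimitSeconds * 1000L;
        this.clock = clock;
        this.chunkStartMillis = clock.getAsLong();
    }

    // Call at the start of every chunk; replaces creating a new Timer.
    public void beginChunk() {
        chunkStartMillis = clock.getAsLong();
    }

    // Replaces reading the flag that the TimerTask used to set.
    public boolean timerExpired() {
        return timeLimitMillis > 0
                && clock.getAsLong() - chunkStartMillis >= timeLimitMillis;
    }
}
```

In production code the clock would be passed as `new CheckpointDeadline(30, System::currentTimeMillis)`. The trade-off is one clock read per check instead of one thread per chunk.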


chengfang

Thanks for the PR. IIRC the timer was used to avoid querying the system time for every read, which can add up and be costly. But using a timer to update a flag has its own overhead. We haven't measured exactly which one performs better under which circumstances. Can you elaborate on the symptom, e.g., why are there thousands of timer threads, and are they all active at the same time?

We may want to use System.nanoTime() instead of System.currentTimeMillis(), as the latter is subject to adjustments of the system clock.
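When measuring elapsed time with `System.nanoTime()`, the Javadoc recommends comparing differences (`t1 - t0`) rather than absolute values, because the counter may overflow. The sketch below illustrates that point only; `NanoElapsed` and its methods are hypothetical names, not part of JBeret.

```java
public class NanoElapsed {

    // Overflow-safe: the subtraction wraps consistently even if the counter overflowed.
    static boolean hasElapsed(long startNanos, long nowNanos, long limitNanos) {
        return nowNanos - startNanos >= limitNanos;
    }

    // Naive form: computing an absolute deadline can overflow and give a wrong answer.
    static boolean naiveHasElapsed(long startNanos, long nowNanos, long limitNanos) {
        return nowNanos >= startNanos + limitNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(5);
        // 1 ms limit, measured against the monotonic clock
        System.out.println(hasElapsed(start, System.nanoTime(), 1_000_000L));
    }
}
```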

This is an area we need to improve. Can you please create a JIRA issue?

Zerrossetto · 2019-05-28T08:07:20Z

I created the issue JBERET-485, let's move the discussion there and decide how to proceed.

Zerrossetto · 2019-05-28T20:00:19Z

Hi, I'm done with the changes we discussed on Jira, feel free to approve it or add new comments.

Fixed case that caused ChunkRunner to spawn a huge amount of threads named "chunk-checkpoint-timer" when time-limit attribute was defined big enough in very quick chunk executions.

72a63e6

chengfang reviewed May 27, 2019


Lorenzo Formenti added 2 commits May 28, 2019 21:54

Moved from currentTimeMillis() to nanoTime()

421b808

Forced long literal

8ebd7bd
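The diff for the "Forced long literal" commit is not shown here, so the following is only an assumption about why such a change matters: converting an `int` number of seconds to nanoseconds with an `int` multiplier overflows for any limit above 2 seconds, while a `long` literal keeps the arithmetic in 64 bits. `LongLiteralDemo` and its method names are hypothetical.

```java
public class LongLiteralDemo {

    // int * int is evaluated in 32 bits and only then widened to long.
    static long nanosWithIntArithmetic(int timeLimitSeconds) {
        return timeLimitSeconds * 1_000_000_000;
    }

    // The L suffix promotes the whole multiplication to 64 bits.
    static long nanosWithLongLiteral(int timeLimitSeconds) {
        return timeLimitSeconds * 1_000_000_000L;
    }

    public static void main(String[] args) {
        System.out.println(nanosWithIntArithmetic(3)); // prints -1294967296
        System.out.println(nanosWithLongLiteral(3));   // prints 3000000000
    }
}
```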
