Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChunkRunner spawning a thousands of chunk-checkpoint-timer threads #120

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Zerrossetto
Copy link
Contributor

@Zerrossetto Zerrossetto commented May 27, 2019

Hi,
today I found that if time-limit was set to a big number of seconds my JVM struggled keeping track of thousands of threads named chunk-checkpoint-timer on certain job.

Investigating on the source code I found that org.jberet.runtime.runner.ChunkRunner class, when time-limit attribute on chunk tag was set bigger than 0, scheduled a timed thread just to update a boolean flag

            if (timeLimit > 0) {
                final Timer timer = new Timer("chunk-checkpoint-timer", true);
                timer.schedule(new TimerTask() {
                    @Override
                    public void run() {
                        processingInfo.timerExpired = true;
                    }
                }, timeLimit * 1000);
            }

I don't know if can be an intended behavior, but java.util.Timer class was not officially but practically deprecated in favor of java.util.concurrent.ScheduledThreadPoolExecutor. In any case here I tried to totally bypass the usage of threads and checking the expiration of the chunk only with a simpler check on the current millis with System.currentTimeMillis().

named "chunk-checkpoint-timer" when time-limit attribute was defined big
enough in very quick chunk executions.
Copy link
Contributor

@chengfang chengfang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. IIRC the use of timer was to avoid querying system time for every read, which can add up and be costly. But use of timer to update a flag has its own overhead. We haven't measured exactly which one performs better under which circumstances. Can you elaborate more on the symptom, e.g., why thousands of timer threads, are they all active at the same time?

We may want to use getNano() instead of currentTimeMillis, as the latter is subject to changes in user time adjustment.

This is an area we need to improve. Can you please create a JIRA issue ?

@Zerrossetto
Copy link
Contributor Author

I created the issue JBERET-485, let's move the discussion there and decide how to proceed.

@Zerrossetto
Copy link
Contributor Author

Hi, I'm done with the changes we discussed on Jira, feel free to approve it or add new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants