CrawlTopologyTest.testAsync now fails for me #151

Open
Schmed opened this issue May 9, 2018 · 5 comments
Labels: in progress (actively being worked on), task

Comments

@Schmed
Member

Schmed commented May 9, 2018

CrawlTopologyTest.testAsync now fails on my MacBook Air on both Flink 1.4 and 1.5:

Tests in error:
CrawlTopologyTest.testAsync:136 » Runtime Job 'flink-crawler' did not terminat...

I see the following exceptions in the log output soon after the seed URLs source gets closed down:

18/05/09 15:35:27 INFO taskmanager.Task:926 - DomainDBFunction (2/3) (61fa93b752a29f330d09885bf0e78d17) switched from RUNNING to FAILED.
java.lang.IllegalStateException: Trailing data in checkpoint barrier handler.
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:227)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:103)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:307)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)

See issue_151.log.zip

@Schmed
Member Author

Schmed commented May 9, 2018

This test passes, though, if I add the following line just after checkpointing gets enabled:

env.getCheckpointConfig().setMinPauseBetweenCheckpoints(20_000L);
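
For context, a minimal sketch of where that line would sit relative to the checkpointing setup. This is not the actual CrawlTopologyTest code: the class name, the 10-second checkpoint interval, and the tiny placeholder pipeline are all assumptions made for illustration, and it needs the Flink streaming dependency on the classpath.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical stand-alone class, not part of flink-crawler.
public class CheckpointPauseSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Enable checkpointing; the 10 s interval is an assumed placeholder,
        // not the interval the test uses.
        env.enableCheckpointing(10_000L);

        // The workaround from this comment: require at least 20 s between the
        // end of one checkpoint and the start of the next.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(20_000L);

        // Placeholder pipeline so the sketch runs on its own.
        env.fromElements(1, 2, 3).print();
        env.execute("checkpoint-pause-sketch");
    }
}
```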

kkrugler added the "task" and "in progress" (actively being worked on) labels on Jun 27, 2018
@kkrugler
Member

Please confirm once #157 has been merged.

@Schmed
Member Author

Schmed commented Jun 27, 2018

mvn clean package now succeeds for me, but I'll try again after you get things fixed for @vmagotra.

@kkrugler
Member

@Schmed - I'd give this another try, now that I've merged in changes to use timers with a ProcessFunction.

@vmagotra
Contributor

vmagotra commented Jul 18, 2018

@kkrugler - it still fails for me (branch=master)...

CrawlTopologyTest.testFocused:145 URL 'http://domain1.com/page11' not logged by class com.scaleunlimited.flinkcrawler.functions.FetchUrlsFunction


3 participants