Panic on too many open files error, should retry instead #43

Open
willmhowes opened this issue Aug 26, 2024 · 2 comments


willmhowes commented Aug 26, 2024

I hit the following panic more than once:

panic: open jobs/warcs/TCPK-20240826191443122-00001-crawl918.us.archive.org.warc.gz.open: too many open files

goroutine 119 [running]:
github.com/CorentinB/warc.isFileSizeExceeded({0xc8816c7f40?, 0xc00041e3f0?}, 0x408f400000000000)
        /var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/utils.go:240 +0xf5
github.com/CorentinB/warc.recordWriter(0xc00049e0f0, 0xc00014c070, 0xc0001b48c0, 0xc00041e3c0)
        /var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/warc.go:137 +0x45a
created by github.com/CorentinB/warc.(*RotatorSettings).NewWARCRotator in goroutine 1
        /var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/warc.go:70 +0x8a

TODO: as stated in the issue title, the library should retry instead of panicking (according to @CorentinB).

@willmhowes
Author

Here is the output of ulimit -a on the machine receiving the error:

➜  ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       430368
-n: file descriptors                65536
-l: locked-in-memory size (kbytes)  13778120
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 430368
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited
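The `-n` value above is the shell's view of the limit. As a rough sketch, a Go process on a Unix-like system can read its own RLIMIT_NOFILE values as shown below. `fdLimits` is a hypothetical helper name, and the code assumes a platform where `syscall.Getrlimit` is available (it is not on Windows). Recent Go versions may also raise the soft limit toward the hard limit at startup, so the in-process value can differ from what the shell reports.

```go
package main

import (
	"fmt"
	"syscall"
)

// fdLimits returns the soft and hard limits on open file descriptors
// (RLIMIT_NOFILE) for the current process. Hypothetical helper for
// illustration; Unix-like systems only.
func fdLimits() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	// The field types differ between platforms, so convert explicitly.
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := fdLimits()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("file descriptors: soft=%d hard=%d\n", soft, hard)
}
```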

@willmhowes
Author

Note: I was running 3 Zeno crawls on the same HDD-based machine (so I/O is slow), with each Zeno instance configured with 125 workers. The solution may just be to run with fewer workers (maybe 8?).
