panic: open jobs/warcs/TCPK-20240826191443122-00001-crawl918.us.archive.org.warc.gz.open: too many open files
goroutine 119 [running]:
github.com/CorentinB/warc.isFileSizeExceeded({0xc8816c7f40?, 0xc00041e3f0?}, 0x408f400000000000)
/var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/utils.go:240 +0xf5
github.com/CorentinB/warc.recordWriter(0xc00049e0f0, 0xc00014c070, 0xc0001b48c0, 0xc00041e3c0)
/var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/warc.go:137 +0x45a
created by github.com/CorentinB/warc.(*RotatorSettings).NewWARCRotator in goroutine 1
/var/www/go/pkg/mod/github.com/!corentin!b/[email protected]/warc.go:70 +0x8a
TODO: as stated in the issue title, the library should retry instead of panicking (according to @CorentinB).
Note: I was running 3 Zeno crawls on the same HDD-based machine (so I/O is slow), with each Zeno instance configured with 125 workers. The solution may simply be to run with fewer workers (maybe 8?).
I received this error more than once.