[Feature]: Create job "channels" with separate and different numbers of harvester instances #2024

Open
@tuehlarsen

Description

What change would you like to see?

We need different job "channels", each backed by its own pool of harvester instances, to attach our harvest jobs to. This would prevent some daily deep jobs (100 GB) from hanging for a long time while waiting for capacity. That is what happens if you schedule (via the API) many jobs (> 10,000 jobs) with only 4 harvesters.

Context

In DK, besides our single on-premises Browsertrix production server with 3 crawler instances, we have 174 Heritrix crawler instances (installed on-premises on 23 different physical and virtual servers in 2 locations). They are divided into different pools/channels for parallel harvesting: 128 allocated to broad crawls (low-priority channel), 42 (high-priority channel) to small and big selective crawls, and 4 to 24/7 RSS feed harvesting and other focused harvesting channels.
Each harvester instance is attached to one channel in the harvester settings config, and this can only be changed by editing the config and restarting the harvester instance. Here are some screenshots from our existing NetarchiveSuite GUI concerning channels, just to show how it is implemented in NetarchiveSuite.

[3 screenshots of the channel settings in the NetarchiveSuite GUI]

see above
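
To make the request more concrete, here is a minimal sketch of the channel idea in Python. It is not Browsertrix or NetarchiveSuite code: the `Channel` class, the `submit` and `dispatch` methods, and the channel names `broad`, `selective` and `rss` are all hypothetical. Only the pool sizes (128, 42 and 4) come from the setup described above. The point is that each channel has its own queue and its own fixed pool of harvesters, so a large backlog on one channel cannot block jobs on another.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical channel: one job queue served by a fixed harvester pool."""
    name: str
    harvesters: int  # number of harvester instances bound to this channel
    queue: deque = field(default_factory=deque)
    running: list = field(default_factory=list)

    def submit(self, job: str) -> None:
        """Queue a job on this channel only."""
        self.queue.append(job)

    def dispatch(self) -> list:
        """Start queued jobs while this channel still has idle harvesters."""
        started = []
        while self.queue and len(self.running) < self.harvesters:
            job = self.queue.popleft()
            self.running.append(job)
            started.append(job)
        return started


# Pool sizes from the existing setup; the channel names are assumptions.
channels = {
    "broad": Channel("broad", harvesters=128),
    "selective": Channel("selective", harvesters=42),
    "rss": Channel("rss", harvesters=4),
}

# 10,000 broad-crawl jobs only fill the queue of the broad channel.
for i in range(10_000):
    channels["broad"].submit(f"broad-{i}")

# A daily deep job goes to another channel and is not stuck behind them.
channels["selective"].submit("daily-deep-100GB")

started = {name: ch.dispatch() for name, ch in channels.items()}
print(started["selective"])   # ['daily-deep-100GB']
print(len(started["broad"]))  # 128
```

With a single shared queue and 4 harvesters, the deep job would instead wait behind every job submitted before it.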

Metadata

Labels: back end (Requires back end dev work), enhancement (Requests a change to a feature)
Status: Triage