Configuration
There are two aspects to configuring the crawler: infrastructure settings and runtime configuration. The infrastructure settings identify the storage, queuing, and Redis services, and define crawler instance identifiers. The runtime configuration covers topics such as concurrency, timeouts, and org filters. Most (though not all) aspects of the runtime configuration can be changed dynamically and take effect on the currently running crawlers. Some runtime configuration values and all infrastructure settings require a restart of the crawler processes to take effect.
The runtime configuration is expressed as JSON and is broken into a discrete object for each of the crawler subsystems, each of which is detailed below. Because many instances of the crawler can work together at once, the configuration is shared in Redis and changed centrally. Each crawler subscribes to changes, so changing a configuration value for one crawler changes it for all crawlers with the same name.
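The name-scoped sharing described above can be pictured with a small in-memory sketch. This is not the crawler's implementation: `ConfigHub` is a hypothetical stand-in for the shared Redis store, and the crawler names, the `crawler` subsystem, and the `count` option are made up purely for illustration.

```python
from collections import defaultdict


class ConfigHub:
    """Hypothetical in-memory stand-in for the shared Redis configuration."""

    def __init__(self):
        self._options = defaultdict(dict)      # crawler name -> {subsystem: options}
        self._subscribers = defaultdict(list)  # crawler name -> callbacks

    def subscribe(self, crawler_name, callback):
        self._subscribers[crawler_name].append(callback)

    def update(self, crawler_name, subsystem, changes):
        # Apply the change centrally, then notify every subscriber of that name.
        current = self._options[crawler_name].setdefault(subsystem, {})
        current.update(changes)
        for callback in self._subscribers[crawler_name]:
            callback(subsystem, dict(current))


class Crawler:
    """A crawler instance that mirrors the shared options for its name."""

    def __init__(self, name, hub):
        self.name = name
        self.options = {}
        hub.subscribe(name, self._on_change)

    def _on_change(self, subsystem, options):
        self.options[subsystem] = options


hub = ConfigHub()
first = Crawler("crawler-dev", hub)    # hypothetical crawler names
second = Crawler("crawler-dev", hub)
other = Crawler("crawler-test", hub)

# One central change reaches both "crawler-dev" instances, but not "crawler-test".
hub.update("crawler-dev", "crawler", {"count": 4})
```

After the update, `first.options` and `second.options` both hold the new value, while `other.options` is unchanged because it subscribed under a different name.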
{
  "NODE_ENV": "localhost",
  "CRAWLER_MODE": "Standard",
  "CRAWLER_OPTIONS_PROVIDER": ["defaults" | "memory" | "redis"],
  "CRAWLER_INSIGHTS_KEY": "[SECRET]",
  "CRAWLER_ORGS_FILE": "../orgs",
  "CRAWLER_GITHUB_TOKENS": "[SECRET]",
  "CRAWLER_REDIS_URL": "peoplesvc-dev.redis.cache.windows.net",
  "CRAWLER_REDIS_ACCESS_KEY": "[SECRET]",
  "CRAWLER_REDIS_PORT": 6380,
  "CRAWLER_QUEUE_PROVIDER": "amqp10",
  "CRAWLER_AMQP10_URL": "amqps://RootManageSharedAccessKey:[SECRET]@ghcrawlerdev.servicebus.windows.net",
  "CRAWLER_QUEUE_PREFIX": "ghcrawlerdev",
  "CRAWLER_STORE_PROVIDER": "azure",
  "CRAWLER_STORAGE_NAME": "ghcrawlerdev",
  "CRAWLER_STORAGE_ACCOUNT": "ghcrawlerdev",
  "CRAWLER_STORAGE_KEY": "[SECRET]",
  "CRAWLER_DOCLOG_STORAGE_ACCOUNT": "ghcrawlerdev",
  "CRAWLER_DOCLOG_STORAGE_KEY": "[SECRET]"
}
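Because these settings only take effect at startup, it can help to check a settings object before launching the crawler. The following is a minimal sketch, not part of the crawler itself: `check_settings` and `redact` are hypothetical helpers, the rule that the Redis settings are required when `CRAWLER_OPTIONS_PROVIDER` is `redis` is an assumption, and the list of secret-bearing keys is taken from the entries marked `[SECRET]` in the sample above.

```python
import json

# The three values listed for CRAWLER_OPTIONS_PROVIDER in the sample above.
OPTIONS_PROVIDERS = {"defaults", "memory", "redis"}

# Assumption: these are needed whenever options are kept in Redis.
REDIS_KEYS = ("CRAWLER_REDIS_URL", "CRAWLER_REDIS_ACCESS_KEY", "CRAWLER_REDIS_PORT")

# Keys whose sample values contain [SECRET].
SECRET_KEYS = {
    "CRAWLER_INSIGHTS_KEY",
    "CRAWLER_GITHUB_TOKENS",
    "CRAWLER_REDIS_ACCESS_KEY",
    "CRAWLER_AMQP10_URL",
    "CRAWLER_STORAGE_KEY",
    "CRAWLER_DOCLOG_STORAGE_KEY",
}


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    provider = settings.get("CRAWLER_OPTIONS_PROVIDER")
    if not isinstance(provider, str) or provider not in OPTIONS_PROVIDERS:
        problems.append(
            f"CRAWLER_OPTIONS_PROVIDER must be one of {sorted(OPTIONS_PROVIDERS)}, got {provider!r}"
        )
    if provider == "redis":
        for key in REDIS_KEYS:
            if key not in settings:
                problems.append(f"{key} is required when options are stored in Redis")
    return problems


def redact(settings):
    """Copy the settings with secret-bearing values masked, e.g. for logging."""
    return {k: ("[SECRET]" if k in SECRET_KEYS else v) for k, v in settings.items()}


# A trimmed, hypothetical settings object with one provider value chosen.
sample = json.loads("""
{
  "CRAWLER_OPTIONS_PROVIDER": "redis",
  "CRAWLER_REDIS_URL": "example.redis.cache.windows.net",
  "CRAWLER_REDIS_ACCESS_KEY": "not-a-real-key",
  "CRAWLER_REDIS_PORT": 6380
}
""")

print(check_settings(sample))
print(redact(sample))
```

Note that the sample above lists the three allowed provider values in one line; a real settings file would contain exactly one of them as a plain string.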