Update Prometheus doc to introduce counting server number #1009

Merged 2 commits on Apr 1, 2024
7 changes: 6 additions & 1 deletion docs/pages/prometheus.mdx
@@ -161,7 +161,7 @@ All of the Pelican servers have the following metrics:

### `up`

Although `up` is a Prometheus built-in metric, Pelican uses it to record the number of origin/cache servers in the federation, as the Pelican director scrapes all the storage servers in the federation and gets their Prometheus metrics.
The Pelican director scrapes Prometheus metrics from all origin and cache servers that successfully advertise to the director. This metric reflects the Pelican origin or cache servers that are scraped by the director.

#### Label: `server_name`

@@ -195,6 +195,11 @@ All of the Pelican servers have the following metrics:

The storage server longitude.


### `# of Active Origins and Caches`

With the `up` metric, you can count the number of active origin and cache servers in the federation with a simple Prometheus query: `count(up{server_type="Origin"})` for origin servers, or `count(up{server_type="Cache"})` for cache servers.
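
The counting query above can also be issued programmatically. The following is a minimal Python sketch, assuming the metrics are reachable through a Prometheus-compatible HTTP API with the standard `/api/v1/query` instant-query path; the `base_url` argument and the helper names are hypothetical and not part of Pelican. Because `up` is `1` when the last scrape succeeded and `0` when it failed, the optional `only_healthy` flag adds `== 1` to count only servers whose latest scrape succeeded.

```python
import json
import urllib.parse
import urllib.request


def build_count_query(server_type: str, only_healthy: bool = False) -> str:
    """Return a PromQL expression counting servers of one type."""
    selector = f'up{{server_type="{server_type}"}}'
    if only_healthy:
        # up is 1 when the last scrape succeeded and 0 when it failed
        selector += " == 1"
    return f"count({selector})"


def count_from_response(payload: dict) -> int:
    """Extract the count from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    # count() over an empty vector returns no sample at all, not 0
    if not result:
        return 0
    # Each sample looks like {"metric": {...}, "value": [timestamp, "<number>"]}
    return int(float(result[0]["value"][1]))


def count_servers(base_url: str, server_type: str) -> int:
    """Run the query against a Prometheus-compatible endpoint (assumed URL)."""
    params = urllib.parse.urlencode({"query": build_count_query(server_type)})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}") as resp:
        return count_from_response(json.load(resp))
```

For example, `count_servers(base_url, "Origin")` would return the number of origin servers, where `base_url` is whatever address serves the Prometheus API in your deployment.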

### `pelican_director_total_ftx_test_suite`

The number of file transfer test suites the director has issued. In Pelican, the director creates a test file and sends it to origin servers as a health test. It issues a test suite when it receives a registration from an origin server. In a test suite, a timer is set to run a cycle of uploading, getting, and deleting the test file every 15 seconds. Such a cycle is called a "test run". In theory, the director should issue only one test suite for each origin server; however, the registration information is stored in a TTL cache in the director, and when it expires after a certain period of time, the issued test suite is cancelled. A new test suite is issued with the new registration. Thus, the director _can_ issue multiple test suites to an origin server.
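
To illustrate why one origin can accumulate more than one test suite, here is a toy Python model of the behaviour described above. It is only a sketch and not Pelican's implementation: the `DirectorModel` class, the `ttl_s` value, and the rule that a registration arriving while the entry is still live refreshes the TTL without issuing a new suite are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DirectorModel:
    """Toy model of the text above; not Pelican's implementation."""

    ttl_s: int  # hypothetical registration TTL, in seconds
    expires_at: dict = field(default_factory=dict)     # origin -> expiry time
    suites_issued: dict = field(default_factory=dict)  # origin -> suites so far

    def register(self, origin: str, now: int) -> None:
        """Handle a registration received at time `now` (in seconds)."""
        live = self.expires_at.get(origin, float("-inf")) > now
        if not live:
            # No live entry: any earlier suite was cancelled at expiry,
            # so this registration starts a new test suite.
            self.suites_issued[origin] = self.suites_issued.get(origin, 0) + 1
        # Assumption: every registration (re)starts the TTL window.
        self.expires_at[origin] = now + self.ttl_s
```

In this model, an origin that registers again only after its entry has expired gets a second suite, which is the situation in which the metric grows beyond one per origin.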