Update documentation
velox committed Dec 7, 2023
1 parent d32fabf commit b07a747
Showing 8 changed files with 394 additions and 14 deletions.
2 changes: 1 addition & 1 deletion docs/_sources/index.rst.txt
@@ -10,7 +10,7 @@ Velox Documentation
functions
spark_functions
configs
stats
metrics
bindings/python/README_generated_pyvelox
develop
programming-guide
122 changes: 122 additions & 0 deletions docs/_sources/metrics.rst.txt
@@ -0,0 +1,122 @@
==============
Runtime Metric
==============

Runtime metrics capture important Velox runtime events for monitoring purposes.
The collected metrics provide insight into the continuous availability and
performance of a Velox runtime system. For instance, the collected data can be
used to automatically generate alerts during an outage. Velox provides a
framework for collecting metrics, which consists of three steps:

**Define**: define the name and type of the metric through the DEFINE_METRIC
and DEFINE_HISTOGRAM_METRIC macros. DEFINE_HISTOGRAM_METRIC is used for the
histogram metric type and DEFINE_METRIC for the other types (see the metric
type definitions below). BaseStatsReporter provides the methods for metric
definition. Register metrics during startup using the registerVeloxMetrics()
API.

**Record**: record a metric data point using the RECORD_METRIC_VALUE and
RECORD_HISTOGRAM_METRIC_VALUE macros when the corresponding event happens.
BaseStatsReporter provides the methods for metric recording.

**Export**: aggregate the collected data points based on the defined metrics
and periodically export them to a backend monitoring service, such as ODS used
by Meta, or open-source systems such as `OpenCensus <https://opencensus.io/>`_
and `Prometheus <https://prometheus.io/>`_. A derived implementation of
BaseStatsReporter is required to integrate with a specific monitoring service.
The metric aggregation granularity and export interval are also configured
based on the monitoring service actually used.
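
The three steps can be sketched with a small self-contained reporter. This is
only an illustration under stated assumptions: ``ToyReporter``, ``MetricType``
and their methods are hypothetical names invented here, not the Velox macros or
the BaseStatsReporter interface, and a real reporter would forward the data to
a monitoring backend instead of returning it to the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration only; this is not the Velox API.
enum class MetricType { kCount, kSum, kAvg, kRate };

class ToyReporter {
 public:
  // Define: register the metric name and its aggregation type at startup.
  void define(const std::string& name, MetricType type) {
    metrics_[name] = Entry{type, 0.0, 0};
  }

  // Record: add one data point when the corresponding event happens.
  // Data points for metrics that were never defined are dropped.
  void record(const std::string& name, double value = 1.0) {
    auto it = metrics_.find(name);
    if (it == metrics_.end()) {
      return;
    }
    it->second.sum += value;
    ++it->second.points;
  }

  // Export: aggregate the points collected during the last interval
  // according to the metric type, then reset for the next interval.
  // Throws std::out_of_range if the metric was never defined.
  double exportValue(const std::string& name, double intervalSeconds) {
    assert(intervalSeconds > 0);
    Entry& entry = metrics_.at(name);
    double result = 0.0;
    switch (entry.type) {
      case MetricType::kCount:
        result = static_cast<double>(entry.points);
        break;
      case MetricType::kSum:
        result = entry.sum;
        break;
      case MetricType::kAvg:
        result = entry.points == 0 ? 0.0 : entry.sum / entry.points;
        break;
      case MetricType::kRate:
        result = entry.sum / intervalSeconds;
        break;
    }
    entry.sum = 0.0;
    entry.points = 0;
    return result;
  }

 private:
  struct Entry {
    MetricType type;
    double sum;
    long points;
  };
  std::map<std::string, Entry> metrics_;
};
```

For example, after defining a hypothetical ``scan_read_bytes`` metric as a sum
and recording the values 100 and 300, exporting it yields 400. The same two
points recorded against an average metric would export as 200.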

Velox supports five metric types:

**Count**: tracks the count of events, such as the number of query failures.

**Sum**: tracks the sum of event data point values, such as the sum of query
scan read bytes.

**Avg**: tracks the average of event data point values, such as the average
query execution time.

**Rate**: tracks the sum of event data point values per second, such as the
number of shuffle requests per second.

**Histogram**: tracks the distribution of event data point values, such as the
query execution time distribution. The histogram metric divides the entire data
range into a series of adjacent equal-sized intervals, or buckets, and then
counts how many data values fall into each bucket. DEFINE_HISTOGRAM_METRIC
specifies the data range by min/max values, and the number of buckets. Any
collected data value less than min is counted in the min bucket, and any value
larger than max is counted in the max bucket. It also allows specifying the
value percentiles to report for monitoring. This allows BaseStatsReporter and
the backend monitoring service to optimize the aggregated data storage.
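
The bucketing rule described above can be sketched as follows. ``ToyHistogram``
is a hypothetical class written for this illustration, not Velox's
implementation. It assumes integer values and a range that divides evenly into
the requested number of buckets, and it estimates a percentile as the upper
bound of the bucket that contains it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not Velox's implementation.
class ToyHistogram {
 public:
  // Assumes numBuckets > 0 and that (max - min) is a multiple of numBuckets.
  ToyHistogram(int64_t min, int64_t max, int numBuckets)
      : min_(min),
        max_(max),
        width_((max - min) / numBuckets),
        counts_(static_cast<size_t>(numBuckets), 0) {
    assert(width_ > 0);
  }

  void add(int64_t value) {
    // Clamp so that out-of-range values land in the first or last bucket.
    const int64_t clamped = std::max(min_, std::min(value, max_));
    const size_t index = std::min<size_t>(
        static_cast<size_t>((clamped - min_) / width_), counts_.size() - 1);
    ++counts_[index];
    ++total_;
  }

  int64_t count(size_t bucket) const {
    return counts_[bucket];
  }

  // Approximates a percentile by the upper bound of the first bucket whose
  // cumulative count reaches the percentile's rank. Returns max when the
  // histogram is empty.
  int64_t percentileUpperBound(int pct) const {
    const int64_t rank = std::max<int64_t>(1, (total_ * pct + 99) / 100);
    int64_t seen = 0;
    for (size_t i = 0; i < counts_.size(); ++i) {
      seen += counts_[i];
      if (seen >= rank) {
        return min_ + static_cast<int64_t>(i + 1) * width_;
      }
    }
    return max_;
  }

 private:
  int64_t min_;
  int64_t max_;
  int64_t width_;
  std::vector<int64_t> counts_;
  int64_t total_ = 0;
};
```

Storing only per-bucket counts keeps the memory footprint fixed regardless of
how many values are recorded, at the cost of percentile precision that is
limited to the bucket width.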

Memory Management
-----------------

.. list-table::
   :widths: 40 10 50
   :header-rows: 1

   * - Metric Name
     - Type
     - Description
   * - cache_shrink_count
     - Count
     - The number of times that the in-memory data cache has been shrunk under
       memory pressure.
   * - cache_shrink_ms
     - Histogram
     - The distribution of cache shrink latency in the range of [0, 100s] with
       10 buckets. It is configured to report the latency at P50, P90, P99, and
       P100 percentiles.
   * - memory_reclaim_exec_ms
     - Histogram
     - The distribution of memory reclaim execution time in the range of
       [0, 600s] with 20 buckets. It is configured to report latency at P50,
       P90, P99, and P100 percentiles.
   * - memory_reclaim_bytes
     - Sum
     - The sum of reclaimed memory bytes.
   * - memory_reclaim_wait_ms
     - Histogram
     - The distribution of memory reclaim wait time in the range of [0, 60s]
       with 10 buckets. It is configured to report latency at P50, P90, P99,
       and P100 percentiles.
   * - memory_reclaim_wait_timeout_count
     - Count
     - The number of times that a memory reclaim wait times out.
   * - memory_non_reclaimable_count
     - Count
     - The number of times that memory reclaim fails because of a
       non-reclaimable section, which indicates that the memory reservation is
       not sufficient.
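
As a worked example, the ranges and bucket counts listed for the histogram
metrics above imply the following bucket widths, assuming equal-sized buckets
as described for the histogram type. The helper ``bucketWidthMs`` and the
constant names are hypothetical and exist only for this arithmetic.

```cpp
#include <cstdint>

// Width in milliseconds of each equal-sized bucket (hypothetical helper).
constexpr int64_t bucketWidthMs(
    int64_t minMs, int64_t maxMs, int64_t numBuckets) {
  return (maxMs - minMs) / numBuckets;
}

// cache_shrink_ms: [0, 100s] with 10 buckets, so 10s per bucket.
constexpr int64_t kCacheShrinkWidthMs = bucketWidthMs(0, 100'000, 10);

// memory_reclaim_exec_ms: [0, 600s] with 20 buckets, so 30s per bucket.
constexpr int64_t kReclaimExecWidthMs = bucketWidthMs(0, 600'000, 20);

// memory_reclaim_wait_ms: [0, 60s] with 10 buckets, so 6s per bucket.
constexpr int64_t kReclaimWaitWidthMs = bucketWidthMs(0, 60'000, 10);
```

A reported percentile for one of these metrics therefore cannot be more precise
than the corresponding bucket width.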

Spilling
--------

.. list-table::
   :widths: 40 10 50
   :header-rows: 1

   * - Metric Name
     - Type
     - Description
   * - spill_max_level_exceeded_count
     - Count
     - The number of times that a spillable operator hits the max spill level
       limit.

Hive Connector
--------------

.. list-table::
   :widths: 40 10 50
   :header-rows: 1

   * - Metric Name
     - Type
     - Description
   * - hive_file_handle_generate_latency_ms
     - Histogram
     - The distribution of Hive file open latency in the range of [0, 100s]
       with 10 buckets. It is configured to report latency at P50, P90, P99,
       and P100 percentiles.
8 changes: 4 additions & 4 deletions docs/bindings/python/README_generated_pyvelox.html
@@ -17,7 +17,7 @@
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="Developer Guide" href="../../develop.html" />
<link rel="prev" title="Runtime Metric" href="../../stats.html" />
<link rel="prev" title="Runtime Metric" href="../../metrics.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
@@ -29,7 +29,7 @@ <h3>Navigation</h3>
<a href="../../develop.html" title="Developer Guide"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="../../stats.html" title="Runtime Metric"
<a href="../../metrics.html" title="Runtime Metric"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../../index.html">Velox documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">PyVelox: Python bindings and extensions for Velox</a></li>
@@ -125,7 +125,7 @@ <h3><a href="../../index.html">Table of Contents</a></h3>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="../../stats.html"
<p class="topless"><a href="../../metrics.html"
title="previous chapter">Runtime Metric</a></p>
</div>
<div>
@@ -164,7 +164,7 @@ <h3>Navigation</h3>
<a href="../../develop.html" title="Developer Guide"
>next</a> |</li>
<li class="right" >
<a href="../../stats.html" title="Runtime Metric"
<a href="../../metrics.html" title="Runtime Metric"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../../index.html">Velox documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">PyVelox: Python bindings and extensions for Velox</a></li>
8 changes: 4 additions & 4 deletions docs/configs.html
@@ -16,7 +16,7 @@
<script src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Runtime Metric" href="stats.html" />
<link rel="next" title="Runtime Metric" href="metrics.html" />
<link rel="prev" title="Window functions" href="functions/spark/window.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
@@ -26,7 +26,7 @@ <h3>Navigation</h3>
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="stats.html" title="Runtime Metric"
<a href="metrics.html" title="Runtime Metric"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="functions/spark/window.html" title="Window functions"
@@ -818,7 +818,7 @@ <h4>Previous topic</h4>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="stats.html"
<p class="topless"><a href="metrics.html"
title="next chapter">Runtime Metric</a></p>
</div>
<div role="note" aria-label="source link">
@@ -849,7 +849,7 @@ <h3>Navigation</h3>
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="stats.html" title="Runtime Metric"
<a href="metrics.html" title="Runtime Metric"
>next</a> |</li>
<li class="right" >
<a href="functions/spark/window.html" title="Window functions"
8 changes: 4 additions & 4 deletions docs/index.html
@@ -113,10 +113,10 @@ <h1>Velox Documentation<a class="headerlink" href="#velox-documentation" title="
<li class="toctree-l2"><a class="reference internal" href="configs.html#spark-specific-configuration">Spark-specific Configuration</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="stats.html">Runtime Metric</a><ul>
<li class="toctree-l2"><a class="reference internal" href="stats.html#memory-management">Memory Management</a></li>
<li class="toctree-l2"><a class="reference internal" href="stats.html#spilling">Spilling</a></li>
<li class="toctree-l2"><a class="reference internal" href="stats.html#hive-connector">Hive Connector</a></li>
<li class="toctree-l1"><a class="reference internal" href="metrics.html">Runtime Metric</a><ul>
<li class="toctree-l2"><a class="reference internal" href="metrics.html#memory-management">Memory Management</a></li>
<li class="toctree-l2"><a class="reference internal" href="metrics.html#spilling">Spilling</a></li>
<li class="toctree-l2"><a class="reference internal" href="metrics.html#hive-connector">Hive Connector</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="bindings/python/README_generated_pyvelox.html">PyVelox: Python bindings and extensions for Velox</a><ul>