From b1c5922d1b7dffe2dd83dee82ba5fe43373cff1c Mon Sep 17 00:00:00 2001
From: Christian Kurze
Date: Thu, 16 Nov 2023 11:55:59 +0100
Subject: [PATCH] Performance: Add hints to overload protection and `ANALYZE`
 command

---
 docs/handbook/performance/inserts/methods.rst |  5 +++
 docs/handbook/performance/inserts/tuning.rst  | 33 ++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/docs/handbook/performance/inserts/methods.rst b/docs/handbook/performance/inserts/methods.rst
index 766f4663..dab9e77d 100644
--- a/docs/handbook/performance/inserts/methods.rst
+++ b/docs/handbook/performance/inserts/methods.rst
@@ -290,6 +290,11 @@ To test :ref:`bulk operations `, you should:
 
 Try out different setups and re-run the test.
 
+Please note that ``INSERT INTO`` statements using a query, and the ``COPY FROM``
+statement, are using overload protection to ensure performance of other queries
+in parallel. Refer to the :ref:`Overload Protection `
+documentation on how to modify these parameters.
+
 At the end of this process, you will have a better understanding of the
 throughput of your cluster with different setups and under different loads.
 
diff --git a/docs/handbook/performance/inserts/tuning.rst b/docs/handbook/performance/inserts/tuning.rst
index a5be75c0..d50b350d 100644
--- a/docs/handbook/performance/inserts/tuning.rst
+++ b/docs/handbook/performance/inserts/tuning.rst
@@ -98,7 +98,22 @@ Translog
 If `translog.durability`_ is set to ``REQUEST`` (the default), the translog gets
 flushed after every operation. Setting this to ``ASYNC`` will improve insert
 performance, but it also worsens durability. If a node crashes before a
-translog has been synced, those opperations will be lost.
+translog has been synced, those operations will be lost.
+
+Overload Protection
+-------------------
+
+The :ref:`Overload Protection ` settings
+control how many resources operations like ``INSERT INTO FROM ...`` or ``COPY``
+can use.
+
+The default values serve as a starting point for an algorithm that dynamically
+adapts the effective concurrency limit based on the round-trip time of requests.
+Whenever one of these settings is updated, the previously calculated effective
+concurrency is reset.
+
+Please update the settings accordingly, especially if you are benchmarking insert
+performance.
 
 Refresh interval
 ----------------
@@ -113,6 +128,21 @@ If you know that your client application can tollerate a higher refresh
 interval, you can expect to see performance improvements if you increase this
 value.
 
+Calculating statistics
+----------------------
+
+After loading larger amounts of data into new or existing tables, it is
+recommended to re-calculate the statistics by executing the ``ANALYZE``
+command. The statistics will be used by the query optimizer to generate
+better execution plans.
+
+The calculation of statistics happens periodically. The bandwidth used for
+collecting statistics is limited by applying throttling based on the maximum
+amount of bytes per second that can be read from data nodes.
+
+Please refer to the `ANALYZE`_ documentation for further information how to
+change the calculation interval, and how to configure throttling settings.
+
 Manual optimizing
 -----------------
 
@@ -129,6 +159,7 @@ However, if you are doing a lot of inserts, you may want to optimize tables (or
 even specific partitions) on your own schedule. If so, you can use the
 `OPTIMIZE`_ command.
 
+.. _ANALYZE: https://cratedb.com/docs/crate/reference/en/latest/sql/statements/analyze.html
 .. _fulltext indexes: https://crate.io/docs/crate/reference/en/latest/sql/fulltext.html
 .. _natural primary key: https://en.wikipedia.org/wiki/Natural_key
 .. _OPTIMIZE: https://crate.io/docs/crate/reference/en/latest/sql/reference/optimize.html
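
Note on the Overload Protection paragraph added by this patch: it describes default values that seed an algorithm which adapts the effective concurrency limit from request round-trip times, and which is reset whenever a setting changes. The sketch below is only a toy illustration of that general idea, not CrateDB's implementation. The class name, the parameter names (`initial`, `minimum`, `maximum`), and the additive-increase/multiplicative-decrease rule are all assumptions chosen for illustration.

```python
class AdaptiveLimit:
    """Toy RTT-based concurrency limiter (hypothetical, not CrateDB code)."""

    def __init__(self, initial=5, minimum=1, maximum=100):
        # Hypothetical settings: a starting point plus lower and upper bounds.
        self.initial = initial
        self.minimum = minimum
        self.maximum = maximum
        self._reset()

    def _reset(self):
        # Forget everything learned so far and start from the configured value.
        self.limit = float(self.initial)
        self.best_rtt = float("inf")

    def on_response(self, rtt):
        """Record one request round-trip time (seconds); return the new limit."""
        self.best_rtt = min(self.best_rtt, rtt)
        if rtt <= 2 * self.best_rtt:
            self.limit += 1      # latency looks healthy: probe upwards
        else:
            self.limit *= 0.5    # latency degraded: back off quickly
        self.limit = max(self.minimum, min(self.maximum, self.limit))
        return int(self.limit)

    def update_settings(self, **changes):
        """Change a setting; the previously calculated limit is discarded."""
        for name, value in changes.items():
            if name not in ("initial", "minimum", "maximum"):
                raise ValueError(f"unknown setting: {name}")
            setattr(self, name, value)
        self._reset()


if __name__ == "__main__":
    limiter = AdaptiveLimit(initial=5)
    for rtt in (0.010, 0.012, 0.050):
        print(rtt, limiter.on_response(rtt))
    limiter.update_settings(initial=20)
    print("after update:", limiter.limit)
```

The reset in `update_settings` mirrors the patch's statement that updating a setting discards the calculated effective concurrency. In a benchmark this would mean the limit has to be re-learned after every settings change.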