Brian Hawkins edited this page Mar 30, 2016 · 7 revisions

We are often asked how many tags is too many, or why a query is so slow. This post explains what happens during a query, what makes it fast or slow, and how the number of tags affects query speed.

How are tags stored:

To understand performance, you need to understand how Kairos stores tags in Cassandra. For a given metric, every unique tag/value combination is written to a separate partition (using the CQL nomenclature). Each partition key is also written to an index partition based on the metric name.

For example, suppose a metric has two tags, customer and host, each with three values: custA, custB, custC and server1, server2, server3 respectively. Assuming each customer can be on any host, all the data for this metric will be written to 9 partitions and there will be 9 entries in the index. Because the data is bucketed into 3-week periods, every 3 weeks we get 9 more entries in the index and the new data is written to 9 different partitions.
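A minimal Python sketch of this layout, assuming 3-week buckets aligned on the Unix epoch and a simplified key of (metric name, bucket start, tag pairs). The names `ROW_WIDTH_MS`, `bucket_start` and `partition_keys` are illustrative only; they are not the Kairos API, and the real row key format differs.

```python
from itertools import product

# Assumed bucket width: 3 weeks in milliseconds.
ROW_WIDTH_MS = 3 * 7 * 24 * 60 * 60 * 1000


def bucket_start(timestamp_ms: int) -> int:
    """Start of the (assumed epoch-aligned) 3-week bucket containing timestamp_ms."""
    return timestamp_ms - (timestamp_ms % ROW_WIDTH_MS)


def partition_keys(metric: str, tags: dict, timestamp_ms: int) -> list:
    """One simplified partition key per tag/value combination in the bucket."""
    names = sorted(tags)
    start = bucket_start(timestamp_ms)
    return [
        (metric, start, tuple(zip(names, values)))
        for values in product(*(tags[n] for n in names))
    ]


tags = {
    "customer": ["custA", "custB", "custC"],
    "host": ["server1", "server2", "server3"],
}

keys = partition_keys("my.metric", tags, 0)
print(len(keys))  # 9 partitions (and index entries) for one bucket

# Writing data in a second bucket adds another 9 distinct keys to the index.
index = set()
for ts in (0, ROW_WIDTH_MS):
    index.update(partition_keys("my.metric", tags, ts))
print(len(index))  # 18
```

The number of keys per bucket is the product of the value counts, so adding a third tag with 10 values would give 3 × 3 × 10 = 90 partitions per bucket.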

What does this mean for writing data?

Not much. The number of tag/value combinations does help spread the data across the Cassandra cluster, but it has very little effect on how fast Kairos writes.

What does the number of tags do for queries?

To understand this, we need to explain how queries are executed. A query runs in two phases.

Phase 1:

Kairos reads the row key index. As Kairos is currently written, it has to read all the keys for the given time frame. So in the example above, if I query a 5-week period that spans two 3-week buckets of data, Kairos reads 18 keys from the index. After the keys are read, they are filtered based on the tags specified in the query, and then we move on to phase 2.
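The phase 1 logic can be sketched as follows. This is a simplified in-memory model, not the Kairos implementation: `build_index` and `phase1` are hypothetical names, the index is a plain dict from bucket start to tag maps, and buckets are assumed to be epoch-aligned 3-week periods.

```python
from itertools import product

WEEK_MS = 7 * 24 * 60 * 60 * 1000
ROW_WIDTH_MS = 3 * WEEK_MS  # assumed 3-week bucket width


def build_index(tags: dict, bucket_starts: list) -> dict:
    """Hypothetical stand-in for the row key index: bucket start -> list of tag maps."""
    names = sorted(tags)
    return {
        start: [dict(zip(names, values)) for values in product(*(tags[n] for n in names))]
        for start in bucket_starts
    }


def phase1(index: dict, start_ms: int, end_ms: int, tag_filter: dict):
    """Read every key in each bucket overlapping [start_ms, end_ms], then filter by tags."""
    first_bucket = start_ms - start_ms % ROW_WIDTH_MS
    keys_read = []
    for bucket, keys in index.items():
        # All keys in an overlapping bucket are read, whatever the tag filter says.
        if first_bucket <= bucket <= end_ms:
            keys_read.extend((bucket, k) for k in keys)
    # Filtering happens only after the keys have been read.
    matched = [
        (bucket, k) for bucket, k in keys_read
        if all(k.get(name) in values for name, values in tag_filter.items())
    ]
    return len(keys_read), matched


tags = {
    "customer": ["custA", "custB", "custC"],
    "host": ["server1", "server2", "server3"],
}
index = build_index(tags, [0, ROW_WIDTH_MS, 2 * ROW_WIDTH_MS])

# A 5-week query spanning the first two buckets, filtered to one customer.
keys_read, matched = phase1(index, 0, 5 * WEEK_MS, {"customer": ["custA"]})
print(keys_read, len(matched))  # 18 6
```

The tag filter shrinks the result to 6 keys, but all 18 keys still had to be read first, which is why the total number of tag combinations drives phase 1 cost.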

Phase 2:

Kairos fetches the data from the matching partitions. It tries to use multi-gets so that the data is not retrieved one partition at a time.
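Phase 2 can be pictured as issuing several batched reads concurrently instead of reading one partition at a time. In this sketch `fetch_batch` is a hypothetical stand-in for a Cassandra multi-get that returns fake data, and the batch size and worker count are arbitrary illustration values, not Kairos settings.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_batch(keys: list) -> dict:
    """Hypothetical multi-get: returns one fake (timestamp, value) point per key."""
    return {key: [(0, 1.0)] for key in keys}


def phase2(keys: list, batch_size: int = 4, workers: int = 4):
    """Split the matched keys into batches and fetch the batches concurrently."""
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs fetch_batch on several batches in parallel threads.
        for partial in pool.map(fetch_batch, batches):
            results.update(partial)
    return len(batches), results


n_batches, results = phase2([f"partition-{i}" for i in range(9)])
print(n_batches, len(results))  # 3 9
```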

For phase 1, having a lot of tag/value combinations will adversely affect the query, because Kairos has to read more keys back for every query.

For phase 2, a lot of tag/value combinations only has an adverse effect if you do not filter the query. So we don't want too many partitions, but on the other hand we don't want to read everything from just one partition. There is a sweet spot that depends on the number of Cassandra nodes: if your data is on 4 Cassandra nodes, you don't want to read it all from a single node.

So if you have a million tag/value combinations, phase 1 will always take a few seconds to complete. Phase 2 could be fast if you filter by tags so that only a few partitions are read, or slow if the data has to be read from all one million partitions.
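The arithmetic behind this can be written down directly. The function below is a hypothetical back-of-the-envelope model: it only counts index keys read in phase 1 and partitions read in phase 2, and ignores the number of Cassandra nodes, caching, and how much data each partition holds.

```python
def estimate_work(total_combinations: int, buckets_spanned: int, matched_combinations: int):
    """Return (index keys read in phase 1, partitions read in phase 2)."""
    # Phase 1 reads every key in every bucket the query touches, regardless of filters.
    keys_read = total_combinations * buckets_spanned
    # Phase 2 reads only the partitions whose tags matched the filter.
    partitions_read = matched_combinations * buckets_spanned
    return keys_read, partitions_read


# One million combinations, query within a single bucket.
print(estimate_work(1_000_000, 1, 10))         # (1000000, 10): filtered by tags
print(estimate_work(1_000_000, 1, 1_000_000))  # (1000000, 1000000): no tag filter
```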

How do you know whether what you are planning will be fast enough? In our experience, if you have tens of thousands of tag combinations you will be fine. Once you start getting towards a million, you will run into trouble. This also depends on the hardware you are using, so setting up a test is a good idea.

Profile Query:

There is a set of metrics you can use to find out which portion of your query is taking the longest.

kairosdb.http.request_time - Measures the entire time it takes to receive, parse and process a query, including writing the response to a temp file as JSON. You will have to filter on the tag 'request' with the value '/datapoints/query'. This also includes time spent waiting in the queue if more than concurrentQueryThreads queries are running.

kairosdb.http.query_time - Measures the time it takes to send a query to the backend and have it processed. This is measured per metric query: if a request queries more than one metric, each one is measured individually.

kairosdb.datastore.query_time - Measures how long the datastore takes to process the query. It does not include the time it takes to aggregate the data.

kairosdb.datastore.cassandra.key_query_time - Measures how long it takes to query the row key information.
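To look at these timings, you can query them like any other metric. The sketch below only builds the JSON body that would be POSTed to the `/api/v1/datapoints/query` endpoint; the one-hour relative window is an arbitrary choice, and `build_profile_query` is a hypothetical helper name. The metric and tag names come from the list above.

```python
import json


def build_profile_query(hours: int = 1) -> dict:
    """Build a KairosDB query body for the self-reported query timing metrics."""
    metrics = [
        # request_time covers all HTTP requests, so filter it to query requests.
        {"name": "kairosdb.http.request_time",
         "tags": {"request": ["/datapoints/query"]}},
        {"name": "kairosdb.http.query_time"},
        {"name": "kairosdb.datastore.query_time"},
        {"name": "kairosdb.datastore.cassandra.key_query_time"},
    ]
    return {"start_relative": {"value": hours, "unit": "hours"}, "metrics": metrics}


print(json.dumps(build_profile_query(), indent=2))
```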

Each of the above metrics is reported every time a query is made to the backend, and each one measures a smaller share of the work than the one before it. So if I want to know how long it takes to read the data out of Cassandra, I can take kairosdb.datastore.query_time and subtract kairosdb.datastore.cassandra.key_query_time.
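Extending the same subtraction to all four metrics gives a rough per-stage breakdown. This is a hypothetical helper that assumes the request queried a single metric (otherwise kairosdb.http.request_time spans several query_time values) and that the four samples come from the same request; the numbers in the example are made up.

```python
def breakdown(request_time, query_time, datastore_query_time, key_query_time):
    """Split one single-metric request's timings (all in the same unit) into stages."""
    return {
        # Phase 1: reading row keys from the index.
        "key_lookup": key_query_time,
        # Phase 2: reading the data out of Cassandra.
        "data_read": datastore_query_time - key_query_time,
        # Work after the datastore returns, such as aggregation.
        "post_processing": query_time - datastore_query_time,
        # Receiving, parsing, queueing and writing the response.
        "request_overhead": request_time - query_time,
    }


print(breakdown(1200, 1000, 850, 600))
# {'key_lookup': 600, 'data_read': 250, 'post_processing': 150, 'request_overhead': 200}
```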

Future:

We have plans to improve query speed. Mostly we are looking at ways to reduce the time phase 1 takes.