Read algorithm #137

Open
julienlau opened this issue Jul 8, 2021 · 2 comments

Hello,

Thanks for this awesome tool.

I'm trying to understand the read path of the KeyValue stress test.

In particular, I don't understand exactly how the keys to read are chosen.
From what I can see, key generation is random and stateless (it depends only on the local thread).
So if I pre-populate the KeyValue table in a first run and then run a stress test in read-only mode, is there only a limited chance (i.e. nbPreviousWrites/1e+6 with one thread) of reading a PrimaryKey from the first run during the second run?
Maybe it would also be interesting to report/count the number of null results for read queries?
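To make that estimate concrete, here is a minimal sketch assuming keys are drawn uniformly at random from a fixed space of 1,000,000 values (the maxId cap mentioned later in this thread); `hit_probability` is a hypothetical helper for illustration, not tlp-stress code:

```python
# Sketch: chance that one uniformly random read targets a pre-populated key,
# assuming keys are drawn uniformly from a space of max_id values
# (max_id = 1_000_000 is the cap discussed later in this thread).
MAX_ID = 1_000_000

def hit_probability(num_distinct_prepopulated_keys: int, max_id: int = MAX_ID) -> float:
    """Probability that a single random read finds an existing key."""
    return num_distinct_prepopulated_keys / max_id

# e.g. after pre-populating 100,000 distinct keys, ~90% of random reads
# would return no row:
print(hit_probability(100_000))  # 0.1
```

Under this model, counting null read results (as suggested above) would directly measure how far a run deviates from this expectation.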

I'd like to understand this in order to target my tlp-stress run at data read from disk vs. memory.
I was also thinking of putting something together to stress test a native secondary index.

Note: I saw the csv option but have never used it.

I was thinking of switching to sequence-based key generation, since everything seems to be in place for this already, but I don't know the code well.
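For illustration, a sequence-based generator could look something like the following. This is a hypothetical sketch, not tlp-stress's actual PartitionKeyGenerator API: each thread walks its own arithmetic progression, so a read-only second run with the same parameters revisits exactly the keys written by the first run.

```python
import itertools

# Hypothetical sketch of sequence-based key generation (illustrative only):
# thread t of n emits keys t, t+n, t+2n, ... so runs are deterministic
# and repeatable across populate and read phases.
def sequence_keys(thread_id: int, num_threads: int, prefix: str = "test"):
    for i in itertools.count(thread_id, num_threads):
        yield f"{prefix}.{i}"

gen = sequence_keys(thread_id=0, num_threads=2)
print([next(gen) for _ in range(3)])  # ['test.0', 'test.2', 'test.4']
```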


julienlau commented Jul 8, 2021

Additionally, regarding the write path: because the key suffix is capped at maxId=1000000 in com.thelastpickle.tlpstress.PartitionKeyGenerator#generateKey:

  • there is a strong chance that, when running a lot of queries with one thread, I am doing updates instead of proper inserts
  • when I populate using the default "--pg random" with one thread, there is practically zero chance of reaching 1M distinct keys
  • example: populating with 2M keys generates only 865,144 distinct keys:
dsbulk unload -h centos-vm-01 -k tlp_stress -t keyvalue -url ~/dsbulk
Operation directory: /home/jlu/src/tlp-stress/logs/UNLOAD_20210708-104529-163108
  total | failed |  rows/s | p50ms |  p99ms | p999ms
865,144 |      0 | 275,667 | 49.90 | 553.65 | 608.17
awk -F ',' '{print $1}' *.csv | grep -v key > key
cat key | awk -F '.' '{print $NF }' | sort -n
1
3
4
5
6
7
8
9
10
....
999990
999991
999992
999993
999994
999995
999996
999997
999998
999999

Maybe the documentation could make this clearer, and a warning could be emitted when populate is used with values inconsistent with the number of threads?

Maybe this maxId parameter should be more user-facing?

@julienlau

Note to self for a quick'n'dirty secondary index test:

  • pre-create the keyvalue schema with a secondary index
  • in the source code, change the insert query so that value = key
  • pre-populate with -n 1000000 -pg sequence -t XXX (the number of threads gives the total data size in millions of rows)
  • in the source code, change the select query:
--- a/src/main/kotlin/com/thelastpickle/tlpstress/profiles/KeyValue.kt
+++ b/src/main/kotlin/com/thelastpickle/tlpstress/profiles/KeyValue.kt
@@ -19,7 +19,7 @@ class KeyValue : IStressProfile {
 
     override fun prepare(session: Session) {
         insert = session.prepare("INSERT INTO keyvalue (key, value) VALUES (?, ?)")
-        select = session.prepare("SELECT * from keyvalue WHERE key = ?")
+        select = session.prepare("SELECT * from keyvalue WHERE value = ?")
         delete = session.prepare("DELETE from keyvalue WHERE key = ?")
     }
 
@@ -47,7 +47,8 @@ class KeyValue : IStressProfile {
             }
 
             override fun getNextMutation(partitionKey: PartitionKey): Operation {
-                val data = value.getText()
+//                val data = value.getText()
+                val data = partitionKey.getText()
                 val bound = insert.bind(partitionKey.getText(),  data)
