Read algorithm #137

Open
julienlau opened this issue Jul 8, 2021 · 2 comments

Hello,

Thanks for this awesome tool.

I'm trying to understand the read path of the KeyValue stress test.

In particular, I don't understand exactly how the keys to read are chosen.
From what I can see, key generation is random and stateless (it depends only on the local thread).
So if I pre-populate the KeyValue table in a first run and then run a stress test in read-only mode, is there only a limited chance (i.e. nbPreviousWrites/1e+6 with one thread) of reading a PrimaryKey from the first run during the second run?
Maybe it would also be interesting to report/count the number of null results for read queries?
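To make that estimate concrete, here is a minimal sketch assuming keys are drawn uniformly at random from a fixed space of 1,000,000 values (the maxId cap mentioned later in this thread); `hit_probability` is a hypothetical helper for illustration, not tlp-stress code:

```python
# Sketch: chance that one uniformly random read targets a pre-populated key,
# assuming keys are drawn uniformly from a space of max_id values
# (max_id = 1_000_000 is the cap discussed later in this thread).
MAX_ID = 1_000_000

def hit_probability(num_distinct_prepopulated_keys: int, max_id: int = MAX_ID) -> float:
    """Probability that a single random read finds an existing key."""
    return num_distinct_prepopulated_keys / max_id

# e.g. after pre-populating 100,000 distinct keys, ~90% of random reads
# would return no row:
print(hit_probability(100_000))  # 0.1
```

Under this model, counting null read results (as suggested above) would directly measure how far a run deviates from this expectation.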

I'd like to understand this in order to target my tlp-stress run at data read from disk vs. memory.
I was also thinking of putting something together to stress test a native secondary index.

Note: I saw the csv option but have never used it.

I was thinking of switching to sequence-based key generation, since everything seems to be in place for this already, but I don't know the code well.
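For illustration, a sequence-based generator could look something like the following. This is a hypothetical sketch, not tlp-stress's actual PartitionKeyGenerator API: each thread walks its own arithmetic progression, so a read-only second run with the same parameters revisits exactly the keys written by the first run.

```python
import itertools

# Hypothetical sketch of sequence-based key generation (illustrative only):
# thread t of n emits keys t, t+n, t+2n, ... so runs are deterministic
# and repeatable across populate and read phases.
def sequence_keys(thread_id: int, num_threads: int, prefix: str = "test"):
    for i in itertools.count(thread_id, num_threads):
        yield f"{prefix}.{i}"

gen = sequence_keys(thread_id=0, num_threads=2)
print([next(gen) for _ in range(3)])  # ['test.0', 'test.2', 'test.4']
```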


julienlau commented Jul 8, 2021

Additionally, regarding the write path: because the key suffix is capped at maxId=1000000 in com.thelastpickle.tlpstress.PartitionKeyGenerator#generateKey:

  • there is a strong chance that, when running a lot of queries with one thread, I am doing updates instead of proper inserts
  • when I populate using the default "--pg random" with one thread, there is practically zero chance of reaching 1M distinct keys
  • example: populating with 2M keys generates only 865,144 distinct keys:
dsbulk unload -h centos-vm-01 -k tlp_stress -t keyvalue -url ~/dsbulk
Operation directory: /home/jlu/src/tlp-stress/logs/UNLOAD_20210708-104529-163108
  total | failed |  rows/s | p50ms |  p99ms | p999ms
865,144 |      0 | 275,667 | 49.90 | 553.65 | 608.17
awk -F ',' '{print $1}' *.csv | grep -v key > key
cat key | awk -F '.' '{print $NF }' | sort -n
1
3
4
5
6
7
8
9
10
....
999990
999991
999992
999993
999994
999995
999996
999997
999998
999999

Maybe the documentation could make this clearer, and a warning could be emitted when populate is used with values inconsistent with the number of threads?

Maybe this maxId parameter should be more user-facing?

@julienlau

Note to self for a quick'n'dirty secondary index test:

  • pre-create the keyvalue schema with a secondary index
  • in the source code, change the insert query so that value = key
  • pre-populate with -n 1000000 -pg sequence -t XXX (the number of threads gives the total data size in millions of rows)
  • in the source code, change the select query:
--- a/src/main/kotlin/com/thelastpickle/tlpstress/profiles/KeyValue.kt
+++ b/src/main/kotlin/com/thelastpickle/tlpstress/profiles/KeyValue.kt
@@ -19,7 +19,7 @@ class KeyValue : IStressProfile {
 
     override fun prepare(session: Session) {
         insert = session.prepare("INSERT INTO keyvalue (key, value) VALUES (?, ?)")
-        select = session.prepare("SELECT * from keyvalue WHERE key = ?")
+        select = session.prepare("SELECT * from keyvalue WHERE value = ?")
         delete = session.prepare("DELETE from keyvalue WHERE key = ?")
     }
 
@@ -47,7 +47,8 @@ class KeyValue : IStressProfile {
             }
 
             override fun getNextMutation(partitionKey: PartitionKey): Operation {
-                val data = value.getText()
+//                val data = value.getText()
+                val data = partitionKey.getText()
                 val bound = insert.bind(partitionKey.getText(),  data)
