Welcome to pgSimload!
pgSimload is a tool written in Go, and it offers 3 different modes of execution:
- SQL-Loop mode, to execute a script infinitely on a given schema of a given database with a given user
- Patroni-Watcher mode, to monitor a given Patroni cluster. This is useful only if... you run Patroni
- Kube-Watcher mode, to get minimal monitoring of a given PostgreSQL cluster in Kubernetes
Depending on the mode you choose, some parameters are mandatory and some are not, and the execution contexts differ. Please refer to the complete documentation in docs/pgSimload.doc.md.
Alternatively, you can download the documentation in PDF format.
Running pgSimload is very straightforward if you have Go installed on your system: you can run the tool with Go from the main directory of the project like this:
```
go run . <parameters...>
go run . -h
```
If you don't have Go installed on your system, you can also just use one of the binaries provided in the latest release. I really recommend you download the latest release!
If you want to build your own binary, you can do that too, as described in the next paragraph.
Feedback is welcome in any case!
You can use the provided script `build.sh`:
```
sh build.sh
```
We've started working on building packages, but that work isn't finished yet. But do you really need packages, since pgSimload is a standalone binary? Feel free to contribute to pgSimload by opening a Pull Request with your packaging code ;-)
This tool can be used in different infrastructures:
- on the localhost, if a PostgreSQL instance is running on it
- on any distant stand-alone PostgreSQL instance or PostgreSQL cluster, on bare metal or VMs
- on any stand-alone PostgreSQL instance or PostgreSQL cluster running in a Kubernetes environment
This tool has different usages, and you may think of some that I haven't listed here:
- just initiate a plain `select 1`, a `select count(*) from ...`, or whatever you find useful. Note that pgSimload won't get you the results back from those executions
- insert dummy data (mostly randomly, if you know about the `generate_series()` and `random()` PostgreSQL functions) into any DB, with the schema of your choice and the SQL script of your choice (see the script sketch after this list)
- if your database doesn't have a schema yet, you can create one in a `create.json` file. Look for examples on how to do that in the `examples/SQL-Loop/` directory; it should be straightforward. That file is not mandatory, as pgSimload needs at least a `-config <file>` and a `-script <file>` to run in SQL-Loop mode.
- the SQL script of your choice. For that purpose you create a plain SQL file and put everything you want in it. It will be run in an implicit transaction and can contain multiple statements. If you want details on how pgSimload runs those statements at once, please read the chapter Multiple Statements in a Simple Query in PostgreSQL's documentation.
- you can set special session parameters like `SET synchronous_commit TO 'on'` or `SET work_mem TO '12MB'` if you want the SQL script's sessions to be tweaked to your needs. This is useful to compare performance or behaviour in replication setups or other situations. For that you'll have to use the `-session_parameters <session_parameters.json>` parameter of pgSimload. Otherwise, without it, the DEFAULT values will of course apply.
- if you're too lazy to gather those session parameters yourself, pgSimload can create a template file, with the name you want and based on a given connection, that you can later modify and adapt to your needs. Look for `-create_gucs_template` in this documentation.
- this "dummy data insertion" is most often used to simulate some write workload on a PostgreSQL server (standalone, or the primary of a PostgreSQL cluster with one or more replicas)
- the SQL-Loop mode execution can be limited to:
  - a number of loop executions you define thanks to the `-loops <int64>` parameter, and/or
  - a given execution time of your choice you can define thanks to the `-time duration` parameter, where that duration is expressed with or without single or double quotes, like "10s", 1m30s or '1h40m'
  - if both parameters are used at the same time, the SQL-Loop will end whenever one or the other condition is satisfied
- the rate of the iterations can be slowed down since version 1.2.0 thanks to the `-sleep duration` parameter, where the duration is expressed the same way as for `-time duration` (see above). If this parameter is set to anything other than 0, pgSimload will sleep for that amount of time between iterations. This is useful if you want to slow down the SQL-Loop process. It also saves the user from manually adding something like a `select pg_sleep(1);` at the end of the SQL script used with `-script`, so it's faster to test different "sleeping" values by re-running the command line with another value instead of editing that SQL script...
- since version 1.4.1, the `-rsleep duration` parameter allows setting a maximum random sleep time of the given duration. This is useful if you don't want all your `-clients <integer>` to execute the `-script` at the exact same time, or if you prefer the sleep time between iterations to be somewhat random rather than fixed. If `-sleep` and `-rsleep` are both used, the random sleep time is added to the fixed sleep time. As an example, a `-sleep 1s` with a `-rsleep 1s` will result in a total sleep time between 1 and 2 seconds.
- the SQL-Loop mode execution uses, by default, one unique PostgreSQL connection to the server. You can execute it with as many clients in parallel as you want thanks to the `-clients <integer>` parameter added in version 1.4.0. So if you want the same SQL script to be executed by 3 parallel clients, that is as simple as adding `-clients 3` to the command line (see the example command lines after this list). If you use limitations (`-loops` and/or `-time` and/or `-sleep`) and/or special session parameters (`-session_parameters`), those will be applied to all clients the same way.
- test failovers, or what happens when a DB is down: pgSimload handles those errors. Give it a try: simply shut down your PostgreSQL server while it runs... You'll see it throwing errors, then resuming the load once the PostgreSQL server (the "primary" if you use replication) is back.
- monitor a PostgreSQL cluster that uses Patroni, with the special `-patroni <config.json>` parameter, which has to come with a `-config <config.json>`, where the latter must use the `postgres` user, because in that mode we use a special trick to get the primary's name, and that trick can only be done by a superuser in PostgreSQL (so it can be something other than `postgres`, if you set up another superuser).
- so when testing a PostgreSQL cluster using Patroni, with multiple hosts (a primary and a given number of replicas, synchronous or not), pgSimload is usually run in 2 separate terminals: one to load data, and the other to monitor things in Patroni (see the sketch after this list)
  - note that the Patroni-Watcher mode can show additional information thanks to `Replication_info` set either to `nogucs` or to a list of GUCs separated by commas (e.g. "synchronous_standby_names, synchronous_commit, work_mem") in the `patroni.json` config file passed as an argument to the `-patroni <patroni.json>` parameter. If set to `nogucs`, no extra GUCs are shown; only the info from `pg_stat_replication` will be displayed
- monitor a PostgreSQL cluster that runs in Kubernetes, whether that solution uses Patroni or not for HA: this mode only uses some `kubectl` commands to gather just the relevant information to monitor things, like who's the primary, who's a replica, the status of each, etc. This mode has been tested against the Postgres Operator (aka PGO) from CrunchyData, and the operator from CloudNativePG. You'll find in the `example/Kube-Watcher/` directory proper configuration JSON files to use in both cases
- demo Crunchy Postgres, a fully Open Source based PostgreSQL distribution that makes extensive use of Ansible
- demo Crunchy Postgres for Kubernetes, a fully Open Source based PostgreSQL distribution to run production workloads in Kubernetes
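
To make the dummy data use case above more concrete, here is a minimal sketch of such an SQL script. The `items` table and the file name are hypothetical examples (not shipped with pgSimload), and the script only relies on the standard `generate_series()` and `random()` PostgreSQL functions mentioned earlier; since it is a plain multi-statement file, it will run in one implicit transaction:

```sh
# Hypothetical example: write a small dummy-data script for SQL-Loop mode.
# The "items" table is assumed to exist already (for instance created via a create.json file).
cat > script.sql <<'EOF'
-- insert 100 rows of random data per iteration
INSERT INTO items (label, amount, created_at)
SELECT 'item_' || i, random() * 1000, now()
FROM generate_series(1, 100) AS i;

-- a second statement in the same script, run in the same implicit transaction
UPDATE items SET amount = amount + 1 WHERE amount < 10;
EOF
```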
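
The command lines below sketch how the SQL-Loop flags described above can be combined. The file names (`config.json`, `script.sql`, `session_parameters.json`) are placeholders; the flags themselves (`-config`, `-script`, `-clients`, `-loops`, `-time`, `-sleep`, `-rsleep`, `-session_parameters`) are the ones documented in this README:

```sh
# 3 parallel clients, stopping after 100 loops or 1m30s, whichever comes first
go run . -config config.json -script script.sql -clients 3 -loops 100 -time 1m30s

# slow the loop down: 1s fixed sleep plus up to 1s of random sleep between iterations,
# with special session parameters (GUCs) applied to every client
go run . -config config.json -script script.sql \
  -sleep 1s -rsleep 1s -session_parameters session_parameters.json
```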
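
Finally, a sketch of the usual two-terminal setup when testing a Patroni cluster, as described above. File names are again placeholders, and remember that the `config.json` used alongside `-patroni` must connect as a superuser (typically `postgres`):

```sh
# terminal 1: generate write load on the primary in SQL-Loop mode
go run . -config config.json -script script.sql

# terminal 2: watch the Patroni cluster; Replication_info in patroni.json
# controls which extra GUCs are displayed ("nogucs" for none)
go run . -patroni patroni.json -config config.json
```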