# SHUTDOWN DUE TO COVID-19
# baxx.dev
check it out `ssh [email protected]`
[ work in progress ]
* https://baxx.dev/help
* TODO.txt
* infra and pricing.txt
* stat (disk usage, mem, mdadm) https://baxx.dev/stat
# backup service
(also i am learning how to build a product without a website haha)
# screenshots
┌───────────────────────────────────────────────┐
│ │
│ ██████╗ █████╗ ██╗ ██╗██╗ ██╗ │
│ ██╔══██╗██╔══██╗╚██╗██╔╝╚██╗██╔╝ │
│ ██████╔╝███████║ ╚███╔╝ ╚███╔╝ │
│ ██╔══██╗██╔══██║ ██╔██╗ ██╔██╗ │
│ ██████╔╝██║ ██║██╔╝ ██╗██╔╝ ██╗ │
│ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ │
│ │
│ Storage 10G │
│ Trial 1 Month 0.1 EUR │
│ Subscription: 5 EUR per Month │
│ Availability: ALPHA │
│ │
│ Contact Us: │
│ * Slack https://baxx.dev/join/slack │
│ * Google Groups https://baxx.dev/join/groups │
│ │
│ E-mail │
│ │
│ Password │
│ │
│ Confirm Password │
│ │
│ │
│ Registering means you agree with │
│ the terms of service! │
│ │
│ [Register] [Login] │
│ │
│ [Help] [What/Why/How] [Terms Of Service] │
│ │
│ [Quit] │
└───────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ │
│ ██████╗ █████╗ ██╗ ██╗██╗ ██╗ │
│ ██╔══██╗██╔══██╗╚██╗██╔╝╚██╗██╔╝ │
│ ██████╔╝███████║ ╚███╔╝ ╚███╔╝ │
│ ██╔══██╗██╔══██║ ██╔██╗ ██╔██╗ │
│ ██████╔╝██║ ██║██╔╝ ██╗██╔╝ ██╗ │
│ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ │
│ │
│ │
│ Email: [email protected] │
│ Verification pending. │
│ Please check your spam folder. │
│ │
│ Subscription: │
│ Activate at https://baxx.dev/sub/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX │
│ │
│ Refreshing.. - │
│ │
│ [█Help] [Resend Verification Email] [Quit] │
└──────────────────────────────────────────────────────────────────────────┘
# who watches the watchers
the current baxx infra is 2 machines, each running only docker and ssh:
[ b.baxx.dev ]
* ssh
* docker
+ postgres-master
+ nginx + letsencrypt
+ who watches the watchers [👹job]
+ run notification rules [👹job]
+ process email queue [👹job]
+ collect memory/disk/mdadm stats [privileged] [👹job] (priv because mdadm)
+ baxx-api
+ judoc [localhost]
+ scylla [privileged] (priv because of io tuning)
[ a.baxx.dev ]
* ssh
* docker
+ postgres-slave
+ nginx + letsencrypt
+ who watches the watchers [👹job]
+ process email queue [👹job]
+ collect memory/disk/mdadm stats [privileged] [👹job] (priv because mdadm)
+ baxx-api
+ judoc [localhost]
+ scylla [privileged] (priv because of io tuning)
as you can see, both machines are in the scylla cluster, and both of
them are sending the notification emails (using select for update locks),
but only one of them is running the notification rules.
I have built a quite simple yet effective monitoring system for baxx.
Each process with the [👹job] tag is something like:
(using 👹 because of daemon)
    for {
        work
        sleep X
    }
What I did is:
    setup("monitoring key", X+5)
    for {
        work
        tick("monitoring key")
        sleep X
    }
Then the 'who watches the watchers' programs check, per node, that
"monitoring key" has ticked within the last X+5 seconds, and if not
they send a slack message. Both watchers send notifications on their
own, so i receive each notification twice, but that is ok.
The watchers themselves also use the system, so if one of them dies,
the other one will send a notification.
# testing
all the ✓ checks are tested (manually) and the alerts are performing
really well
## shut down postgres
* ✓ shutdown postgres and see if notifications are sent
## shut down one machine
* ✓ aa.baxx.dev
* ✓ bb.baxx.dev
## mdadm
* ✓ make it fail
mdadm -f /dev/md2 /dev/nvme1n1p3
* ✓ wait for panic message
* ✓ remove the disk
mdadm --remove /dev/md2 /dev/nvme1n1p3
* ✓ add the disk back
mdadm --add /dev/md2 /dev/nvme1n1p3
* ✓ wait to see it is acknowledged
works really nice
## test disk thresh
* ✓ start the status tool with a 1% disk threshold
and wait for alert
## test memory thresh
* start the status tool with a 1% memory threshold
and wait for alert
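The reason a 1% threshold works as a test is that nearly any real machine is already above it, so the alert path fires right away. A small Go sketch of such a check, with hypothetical function names and made-up numbers (the real status tool reads the values from the system and may compute them differently):

```go
package main

import "fmt"

// usedPercent returns how much of total is in use, in percent.
// A zero total is treated as 0% so the caller never divides by zero.
func usedPercent(used, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(used) / float64(total) * 100
}

// overThreshold reports whether usage is above threshold (in percent).
func overThreshold(used, total uint64, threshold float64) bool {
	return usedPercent(used, total) > threshold
}

func main() {
	// Made-up numbers: 4 GB used out of 16 GB.
	var used, total uint64 = 4 << 30, 16 << 30

	for _, threshold := range []float64{1, 90} {
		if overThreshold(used, total, threshold) {
			fmt.Printf("threshold %.0f%%: alert\n", threshold)
		} else {
			fmt.Printf("threshold %.0f%%: ok\n", threshold)
		}
	}
}
```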
## test health of baxx api
* query /status which should
+ query postgres
+ query judoc