RTR Server keeps reporting not ready when fetching validated ROA prefixes #283
Comments
Hi, The json contains Are you running the most recent release (last Friday) of the validator? Over the last months there have been a number of stability and memory consumption improvements (and, less fortunately, bug fixes). Have you restarted the validator recently? |
curl http://localhost:8080/api/healthcheck I previously checked the validator GUI and everything seemed OK. I am currently running the latest release, but this issue was also noticeable on the previous one. I have restarted the validator, the rtr-server and the host machine at least twice. The current situation has persisted for over 2 days, so it is not a side effect of recently restarting any of the components. |
It looks good (now). There were recent changes to the behaviour when the validator has been running for a longer period of time or on machines with a fast connection/many cores. Do you still encounter the issue you explained above? |
Yes I do. That is the point. It has been like this for over two days now. Sep 28 14:46:15 rpki01 rpki-rtr-server.sh: 2020-09-28 14:46:15.767 INFO 2148 --- [eduler_Worker-5] n.r.r.r.a.v.RefreshCacheController : fetching validated roa prefixes from http://localhost:8080/api/objects/validated |
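To put a number on "over two days", the rtr-server log can be scanned for the "not ready yet" message. The following is a minimal sketch that assumes the log line layout shown in this issue (syslog prefix, then the application timestamp, then the `RefreshCacheController` message); the function name `not_ready_span` is made up for illustration and is not part of the validator.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines quoted in this issue:
#   <syslog prefix> 2020-09-28 12:11:19.482 INFO 2148 --- [thread] ...RefreshCacheController : <message>
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+INFO.*"
    r"RefreshCacheController\s*:\s*(?P<msg>.*)$"
)


def not_ready_span(lines):
    """Return (count, first, last) for the 'not ready yet' messages in lines.

    first and last are datetime objects, or None when nothing matched.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match and "not ready yet" in match.group("msg"):
            stamps.append(
                datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            )
    if not stamps:
        return 0, None, None
    return len(stamps), min(stamps), max(stamps)
```

Feeding it the lines of the rtr-server log (for example from an open file object) gives the number of failed refreshes and the first and last time the message appeared, which makes it easier to tell a short warm-up period from a validator that never becomes ready.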
Hi, OK, that's clear. In this situation it is best to try to reset the local database. I just tested our latest release (provisioned using an automated install in a clean VM) and did get to a stable state, where the rtr server is ready. Please let us know if that helps! If that resolves the issue, it would help if you could share the current database with us so we can investigate, because at the moment we do not have a copy of a database that gets stuck with the latest release. |
Hello, resetting the local database worked. Thank you. How can I share the backup database with you? It is quite big, about 2.5GB. |
The download size is no problem for me if you have a place to store it. We can also skip further investigation and only do so if this happens again. There was a bug in |
When running a validator using the copy of the database from before the reset, the validator converged for me after some time. However, I then noticed that the validator reports "not ready" quite often. I think you were hitting the situation I've described in #284. I'm closing this issue for now, but please re-open it if you hit this again later. |
I do not think it is a time issue. |
Ok... Thanks for the clarification. In hindsight it would have been interesting to see how long ago each trust anchor had updated (from the web interface or It could also be a deadlock due to a high number of threads – we have a report of this in #277. Are you running it on a machine with a large number of cores? |
Initially we had 4 vCPUs when the issue first came up. In the following days we gave it 4 more vCPUs to cope with the observed high CPU usage and load, thinking it could be a time/processing issue. It did not help. |
That number of cores sounds fine. My personal long-running instance works fine with two (Skylake) vCPUs. The validator will use a higher number of cores when available; the peaks will then use all cores but be shorter. I would recommend 2 cores as a minimum; four (or eight) should work well. The issue I described with a high number of cores occurred on a machine with 48 cores and was reproducible on a machine with 56 cores, so I don't think that situation applies (we don't hit it on machines with 16 hyperthreads). |
Sep 28 12:11:15 rpki01 rpki-rtr-server.sh: 2020-09-28 12:11:15.767 INFO 2148 --- [eduler_Worker-3] n.r.r.r.a.v.RefreshCacheController : fetching validated roa prefixes from http://localhost:8080/api/objects/validated
Sep 28 12:11:19 rpki01 rpki-rtr-server.sh: 2020-09-28 12:11:19.482 INFO 2148 --- [eduler_Worker-3] n.r.r.r.a.v.RefreshCacheController : validator http://localhost:8080/api/objects/validated not ready yet, will retry later
But if I curl this endpoint myself, the data comes back cleanly:
curl http://localhost:8080/api/objects/validated | (head; tail)
{
"data" : {
"ready" : false,
"trustAnchors" : [ {
"type" : "trust-anchor",
"id" : 1,
"name" : "AfriNIC RPKI Root",
"locations" : [ "https://rpki.afrinic.net/repository/AfriNIC.cer", "rsync://rpki.afrinic.net/repository/AfriNIC.cer" ],
"subjectPublicKeyInfo" : "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxsAqAhWIO+ON2Ef9oRDMpKxv+AfmSLIdLWJtjrvUyDxJPBjgR+kVrOHUeTaujygFUp49tuN5H2C1rUuQavTHvve6xNF5fU3OkTcqEzMOZy+ctkbde2SRMVdvbO22+TH9gNhKDc9l7Vu01qU4LeJHk3X0f5uu5346YrGAOSv6AaYBXVgXxa0s9ZvgqFpim50pReQe/WI3QwFKNgpPzfQL6Y7fDPYdYaVOXPXSKtx7P4s4KLA/ZWmRL/bobw/i2fFviAGhDrjqqqum+/9w1hElL/vqihVnV18saKTnLvkItA/Bf5i11Yhw2K7qv573YWxyuqCknO/iYLTR1DToBZcZUQIDAQAB",
"rsyncPrefetchUri" : "rsync://rpki.afrinic.net/repository/",
...
"prefix" : "45.11.116.0/22",
"maxLength" : 24
}, {
"asn" : "4214120002",
"prefix" : "185.168.163.0/24",
"maxLength" : 24
} ],
"routerCertificates" : [ ]
}
}
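Note that the response above does download in full, yet it carries `"ready" : false` near the top. A minimal sketch for checking that flag without scrolling through roughly 16 MB of JSON is given below. The URL and the `data`, `ready`, `trustAnchors` and `name` fields are taken from the output above; the helper names `summarize` and `fetch` are made up for illustration, and it is an assumption (not confirmed in this thread) that this flag is what the rtr-server's "not ready yet" message reflects.

```python
import json
import urllib.request

# Endpoint as it appears in the rtr-server log lines of this issue.
VALIDATED_URL = "http://localhost:8080/api/objects/validated"


def summarize(payload):
    """Return (ready, trust_anchor_names) from a parsed response.

    Missing keys are treated as "not ready" and "no trust anchors".
    """
    data = payload.get("data", {})
    names = [ta.get("name") for ta in data.get("trustAnchors", [])]
    return bool(data.get("ready", False)), names


def fetch(url=VALIDATED_URL, timeout=60):
    """Download and parse the validated-objects response (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `summarize(fetch())` on the validator host returns the flag together with the trust anchor names, so a response that looks complete but is still marked as not ready is easy to spot.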