Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusion in operations - mft files which won't validate and roas which won't load in remote repositories #838

Open
sraustein opened this issue Jun 24, 2016 · 10 comments
Labels

Comments

@sraustein
Copy link
Contributor

howdy!
somehow I think I got myself into a weird state, that I wasn't aware of until after my friendly other repository operators told me :(

I built an rpki repository/publication server (gogl-rpki.rarc.net), I published some data:
rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/

and then oddly the manifest went stale/etc, seen in:
https://www.hactrn.net/opaque/rcynic/gogl-rpki.rarc.net.html

as: (55 roa like this)
2016-06-24T16:49:36Z current Skipped because not in manifest rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/0o4I8AxZy_FfvJQmrOYtMiSgDW4.roa

It's unclear to me where:

  1. anything like this would get logged
  2. what steps I would do to extract myself from this hole

I mashed around a bunch on:
rpkic> update_bpki

rpkic> synchronize
rpkic> force_publication
rpkic> force_reissue

I'm not sure any of that actually helped though...

Trac ticket #832 component rpkid priority minor, owner None, created by morrowc on 2016-06-24T17:48:56Z, last modified 2016-06-26T20:38:27Z

@sraustein
Copy link
Contributor Author

anything in logs? [ heading off to dinner ]

Trac comment by randy on 2016-06-24T17:51:11Z

@sraustein
Copy link
Contributor Author

At Fri, 24 Jun 2016 17:51:14 -0000,
Trac Ticket System wrote:

#832: Confusion in operations - mft files which won't validate and roas which
won't load in remote repositories
----------------------+-------------------
Reporter: morrowc | Owner:
Type: defect | Status: new
Priority: minor | Component: rpkid
Resolution: | Keywords:
Blocked By: | Blocking:
----------------------+-------------------

Comment (by randy):

anything in logs? [ heading off to dinner ]

part of my query was: "Which logs should I look at?" :)

Trac comment by morrowc on 2016-06-24T17:55:43Z

@sraustein
Copy link
Contributor Author

Some potentially useful logs:

{{{
2016-06-24 00:04:09 pubd[26584]: Database synchronization error processing PDU <Element {http://www.hactrn.net/uris/rpki/publication-spec/}publish at 0x7f268c792e60> hash None uri rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl: Object already published at rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl
2016-06-24 00:04:09 pubd[26584]: 127.0.0.1:33721 "POST /client/GOGL-morrowc HTTP/1.1" 200 -
2016-06-24 00:04:09 pubd[26584]: Client {http://www.hactrn.net/uris/rpki/publication-spec/}list request
2016-06-24 00:04:09 pubd[26584]: 127.0.0.1:33723 "POST /client/GOGL-morrowc HTTP/1.1" 200 -
2016-06-24 00:34:09 pubd[26584]: Client {http://www.hactrn.net/uris/rpki/publication-spec/}publish request for rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl
2016-06-24 00:34:09 pubd[26584]: Publishing rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl 2016-06-23T23:34:09Z 9F31A9FE5708F1F79335D67C7B63C3689478833D
2016-06-24 00:34:09 pubd[26584]: Database synchronization error processing PDU <Element {http://www.hactrn.net/uris/rpki/publication-spec/}publish at 0x7f268c78e7a0> hash None uri rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl: Object already published at rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl
2016-06-24 00:34:09 pubd[26584]: 127.0.0.1:34580 "POST /client/GOGL-morrowc HTTP/1.1" 200 -
2016-06-24 00:34:09 pubd[26584]: Client {http://www.hactrn.net/uris/rpki/publication-spec/}list request
}}}

Trac comment by morrowc on 2016-06-24T18:55:16Z

@sraustein
Copy link
Contributor Author

More logs, after the bpki_update process... (yes, some routes won't get signed, fine, I know why 8.8.8.0 is failing)

{{{
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 15169 8.8.4.0/24>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 26910 207.126.144.0/20>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 15169 104.132.0.0/14>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 26910 64.79.152.0/23>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 15169 193.200.222.0/24>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 36492 192.119.28.0/24>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 15169 185.25.28.0/23>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 26910 64.18.0.0/20>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 36384 104.132.0.0/14>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 15169 8.8.8.0/24>, skipping
2016-06-24 11:04:10 rpkid[26582]: <rpki.rpkid_tasks.UpdateROAsTask {GOGL-morrowc}>: No covering certificate for <ROA: GOGL-morrowc 41264 104.132.0.0/14>, skipping
2016-06-24 11:04:11 rpkid[26582]: Sending <Element {http://www.hactrn.net/uris/rpki/publication-spec/}publish at 0x7fabf2279b48> hash = None uri = rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl to pubd
2016-06-24 11:04:11 rpkid[26582]: Sending <Element {http://www.hactrn.net/uris/rpki/publication-spec/}publish at 0x7fabf1e97d88> hash = None uri = rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.mft to pubd
2016-06-24 11:04:11 rpkid[26582]: Received <Element {http://www.hactrn.net/uris/rpki/publication-spec/}report_error at 0x7fabf211b8c0> hash = None uri = None from pubd
2016-06-24 11:04:11 rpkid[26582]: Lost synchronization with <Repository: GOGL-morrowc.altCA http://localhost:4402/client/GOGL-morrowc>: Object already published at rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl
2016-06-24 11:04:11 rpkid[26582]: Attempting resynchronization with <Repository: GOGL-morrowc.altCA http://localhost:4402/client/GOGL-morrowc>
2016-06-24 11:04:11 rpkid[26582]: Sending <Element {http://www.hactrn.net/uris/rpki/publication-spec/}list at 0x7fabf254dfc8> hash = None uri = None to pubd
2016-06-24 11:04:11 rpkid[26582]: Received <Element {http://www.hactrn.net/uris/rpki/publication-spec/}list at 0x7fabf254d7a0> hash = ae3609259abea07087e3ff0fff16ddc7054177039ef0dcfca0d08462bbdf88b6 uri = rsync://gogl-rpki.rarc.net/rpki/GOGL-morrowc/-DzHWG17mgHzD3gml34Q4wYlCNc.crl from pubd
}}}

Trac comment by morrowc on 2016-06-24T18:56:05Z

@sraustein
Copy link
Contributor Author

So the "No covering certificate" messages mean what they say: you've requested ROAs for which you do not currently have a valid certificate from your parent, so rpkid can't issue those ROAs.

The "Lost synchronization" errors may be unrelated: those are rpkid sending a change request to pubd, pubd saying "your patch does not apply cleanly", and rpkid saying "oh crap, OK, what's on file?" so that it can reissue the request. I don't see the part where rpkid decides what to do next based on the report from pubd, the second log snippet cuts off while rpkid is receiving that report (or perhaps right after the end of it).

Trac comment by sra on 2016-06-25T22:12:09Z

@sraustein
Copy link
Contributor Author

At Sat, 25 Jun 2016 22:12:09 -0000,
Trac Ticket System wrote:

#832: Confusion in operations - mft files which won't validate and roas which
won't load in remote repositories
----------------------+-------------------
Reporter: morrowc | Owner:
Type: defect | Status: new
Priority: minor | Component: rpkid
Resolution: | Keywords:
Blocked By: | Blocking:
----------------------+-------------------

Comment (by sra):

So the "No covering certificate" messages mean what they say: you've
requested ROAs for which you do not currently have a valid certificate
from your parent, so rpkid can't issue those ROAs.

The "Lost synchronization" errors may be unrelated: those are rpkid
sending a change request to pubd, pubd saying "your patch does not apply
cleanly", and rpkid saying "oh crap, OK, what's on file?" so that it can
reissue the request. I don't see the part where rpkid decides what to do
next based on the report from pubd, the second log snippet cuts off while
rpkid is receiving that report (or perhaps right after the end of it).

I sent log locale to private email thread about this... I can attach
the tarball to the ticket, if that'd help?

Trac comment by morrowc on 2016-06-26T01:57:01Z

@sraustein
Copy link
Contributor Author

I have the logs. I have been staring at them.

What I am seeing is a long repeated sequence of rpkid trying to publish something, pubd saying "sorry, you're out of sync buddy", rpkid trying to recover from that, wait half an hour, rinse, repeat.

I suspect that there is something wrong with the resynchronization code (pas merde, Hercule), so I am looking at that now.

Trac comment by sra on 2016-06-26T18:25:19Z

@sraustein
Copy link
Contributor Author

Nothing obviously wrong with the resynchronization code. It's a bit cumbersome, but looks correct.

Checked for SQL transactional nonsense that could be causing the resynchronization code to give a different result than the original request would have, found none.

Clearly something is still wrong, but don't know what.

Added a bit of additional logging to the resynchronization code so it will report what it thinks it's doing, will push that as soon as I've tested that I didn't break something obvious.

Trac comment by sra on 2016-06-26T19:23:58Z

@sraustein
Copy link
Contributor Author

Added a bit of additional logging to the resynchronization code so it will
report what it thinks it's doing, will push that as soon as I've tested
that I didn't break something obvious.

[6445], binaries should pop out of the automatic build process sometime in the next half hour.

Only other thing I can think to suggest would be enabling full debug logging (/etc/rpki.conf myrpki::log-level = "debug").

But I'd try seeing whether cleaning up old rpki.conf ("localhost" issue, etc) fixes anything first.

Trac comment by sra on 2016-06-26T20:09:20Z

@sraustein
Copy link
Contributor Author

more and more i am smelling an identity crisis at install time. not
knowing when puppet does what, hard to diagnose remotely.

Trac comment by randy on 2016-06-26T20:38:27Z

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant