Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic on BGP4MP_ET File #31

Closed
SZenglein opened this issue Dec 4, 2021 · 17 comments
Closed

Panic on BGP4MP_ET File #31

SZenglein opened this issue Dec 4, 2021 · 17 comments
Labels
bug Something isn't working

Comments

@SZenglein
Copy link

Example file: http://archive.routeviews.org/route-views.amsix/bgpdata/2020.06/UPDATES/updates.20200617.1830.bz2

This file crashes the provided example. Bgpdump can read it and reports it as BGP4MP_ET, while bgpscanner also struggles parsing it.

@SZenglein
Copy link
Author

Probably related: CAIDA/libbgpstream#211

You (@digizeph ) seem to have encountered this issue before.

@SZenglein
Copy link
Author

I see, this has been fixed in the latest version, sorry.

@digizeph
Copy link
Member

digizeph commented Dec 4, 2021

I see, this has been fixed in the latest version, sorry.

No worries! Thanks for check it out!

@digizeph
Copy link
Member

digizeph commented Dec 4, 2021

After some checking, there seems to be other issues regarding parsing this file. I am investigating now, and will reopen this issue.

The problem I see:
When reading the file with bgpkit-parser-cli, the stream terminates too early, and the count does not make much sense.

➜  ~ RUST_LOG=info bgpkit-parser-cli http://archive.routeviews.org/route-views.amsix/bgpdata/2020.06/UPDATES/updates.20200617.1830.bz2 -r -e
[2021-12-04T16:36:59Z ERROR bgpkit_parser::parser::iters] Custom { kind: Other, error: "Invalid byte length for IPv4 prefix" }
total records: 1507
total elems:   103

@digizeph digizeph reopened this Dec 4, 2021
@digizeph
Copy link
Member

digizeph commented Dec 4, 2021

There seems to be many messages where NLRI field is somehow having strange bit_len values. I have wrote a fix in place (see #33) to skip problematic messages and produce warnings for now (wait for the v0.6.0 release). I will continue investigate this issue.

[2021-12-04T16:56:58Z WARN  bgpkit_parser::parser::iters] Parsing error: Invalid byte length for IPv4 prefix. byte_len: 29, bit_len: 229
[2021-12-04T16:56:58Z WARN  bgpkit_parser::parser::iters] Parsing error: Invalid byte length for IPv4 prefix. byte_len: 32, bit_len: 252
[2021-12-04T16:56:58Z WARN  bgpkit_parser::parser::iters] Parsing error: Invalid byte length for IPv4 prefix. byte_len: 29, bit_len: 229
[2021-12-04T16:56:58Z WARN  bgpkit_parser::parser::iters] Parsing error: Invalid byte length for IPv4 prefix. byte_len: 29, bit_len: 22

@digizeph digizeph added the bug Something isn't working label Dec 4, 2021
@digizeph digizeph added this to the V0.7 milestone Dec 4, 2021
@SZenglein
Copy link
Author

Yes, I ran some more tests and the count really does not make much sense. However, no parser I tested seems to agree on the output. The text output of bgpdump and bgpreader are very large (955MB and 1.3GB). bgpscanner is much smaller with only 170MB and finally bgpkit only returns 103 elements as you already wrote.

The line counts are also all different.

6658205 bgpdump_out.txt
    103 bgpkit_out.txt
5328070 bgpreader_out.txt
1398555 bgpscanner_out.txt

Maybe this file is just unrecoverably malformed?

@digizeph
Copy link
Member

digizeph commented Dec 4, 2021

I am still investigating the message, but I do see a lot of warnings from my side, mostly on the NLRI values. The attributes parsing, on the other hand, seems to be OK.

Note that the attributes are placed before NLRI in the MRT files, and both have dedicated 2-byte field indicating the length of each. So it is unlikely that error during parsing of attributes will propagate to parsing of NLRI.

The just released v0.6.0 has the following count:

total records: 1395192
total elems:   952645

@digizeph
Copy link
Member

digizeph commented Dec 5, 2021

Able to isolate one problematic message. The attached is a self-containing MRT record that should be parsable by other parsers as well like a regular MRT file (but only containing one record):
mrt_core_dump.gz

@SZenglein
Copy link
Author

SZenglein commented Dec 5, 2021

So, is there any way to handle invalid messages inside the iterator, or are they just silently skipped? Maybe the iterator should be over an Option<BgpElem> (or Result).

@digizeph
Copy link
Member

digizeph commented Dec 6, 2021

So, is there any way to handle invalid messages inside the iterator, or are they just silently skipped? Maybe the iterator should be over an Option<BgpElem> (or Result).

I am not sure how you would handle a invalid message. Currently, the iterator checks errors from the parser, and if the parser is not critical IO issues, it'll skip the current one and continue parsing the next one.

For records that cannot be parsed correctly, we cannot really create BgpElem or even knowing how many BgpElems should there be, so it is pretty hard to wrap them into Option<BgpElem>. In other words, I don't know what it means when it returns a None. But, we can definitely do it for MrtRecords, which is the parent struct that the BgpElem is derived from. I think it makes sense to wrap it in Result<MrtRecord, ParserError>, but I would like to create a dedicated iterator for this behavior, alongside the default iterators.

Able to isolate one problematic message. The attached is a self-containing MRT record that should be parsable by other parsers as well like a regular MRT file (but only containing one record):
mrt_core_dump.gz

I have run bgpdump against the bytes dump that bgpkit-parser is having trouble with, and the results look definitely wrong (see thoese 0.0.0.0/0s). This also explains why there are so many elems from bgpdump from your previous test, since this one single corrupted record would turn out to be 6 different elems.

@SZenglein, I'd like to seek your input on this finding. Thanks in advance!

➜  bgpkit-parser git:(feature_bytes_in_error) ✗ bgpdump mrt_core_dump.gz 
2021-12-05 18:44:52 [info] logging to syslog
TIME: 06/17/20 18:30:01.460683
TYPE: BGP4MP_ET/MESSAGE/Update
FROM: 80.249.212.84 AS42541
TO: 80.249.214.5 AS6447
ORIGIN: IGP
ASPATH: 42541 2603 1103 12654
NEXT_HOP: 80.249.212.84
AGGREGATOR: AS64739 10.22.28.40
COMMUNITY: 42541:62
LARGE_COMMUNITY: 42541:1:2
   INCOMPLETE PACKET: 4 bytes cutted
   INCOMPLETE PART: 2f 
ANNOUNCE
  0.0.0.0/0
  0.0.0.0/0
  0.0.0.0/0
  24.84.205.66/40
  0.0.0.0/0
  0.0.0.0/0

@digizeph
Copy link
Member

digizeph commented Dec 6, 2021

A few observation:

  • we see attributes of origin, aspath, next_hop, aggregator
  • we don't see community or large_community because they were marked as partial which we then skipped parsing
  • in the parsing of NLRI fields, the first 3 nlri has bit length 0, and the 4th one has bit length 40 (invalid for a ipv4 address)

@digizeph
Copy link
Member

digizeph commented Dec 6, 2021

Has confirmed with RouteViews people that this is an issue with one AMSIX peer sending ADD-PATH data while the MRT collector software does not support ADD-PATH, thus causing the corrupted data. As far as I know, the issue is resolved in more recent MRT files. For the behaviors you see in the corrupted files, I believe our behavior is fine in that that we skip corrupted MRT records instead of producing misleading elements. The massive number of elems you see from other parsers could contain a lot of 0.0.0.0/0 announcements, which could be potentially more misleading.

However, I do believe that better error message is needed, and therefore I will add more information regarding parsing issues in the next release. With RUST_LOG=info set, the warnings will show up and at least users can see what was going on.

@digizeph
Copy link
Member

digizeph commented Dec 6, 2021

Found a relevant issue with more discussion in depth of the issue:
RIPE-NCC/bgpdump#9

@SZenglein
Copy link
Author

I'm not sure if I can help you all that much in debugging these files. I also noticed lots of "null" messages and generally weird results.
But I believe the issue you linked cleared up where this problem originates.

Reconsidering, I agree that the crate API is fine. A Result can be obtained without much issue by using the parser directly, and I think wrapping the parser in another Iterator that allows for manual error handling is the way to go.

digizeph added a commit that referenced this issue Dec 9, 2021
This refactor enables parsing of NLRI block with potentially wrong
add-path type. In some cases, the BGP message with add-path might be
wrapped in a non-add-path message and causing trouble parsing NLRI block
in BGP message and in attributes.

This patch implements a workaround that can more intelligently "guess"
the add-path intention and revert back to regular parsing when guess
fails. Here is a brief description of the workaround:

- If add-path is not set (i.e. not the add-path BGP msgs) and the first
byte of an NLRI entry is 0, it is likely an add-path msg wrapped in
non-add-path msg, treat as add-path. This covers both IPv4 and IPv6
cases.
- If an error happens when parsing, revert back to the original add-path
setting (this only happens for `0.0.0.0/0` or `0::/0` prefixes without
add-path, which is pretty rare) and re-parse the whole NLRI
block. (needs to go back a few bytes to the beginning).
- The unhandled cases are the ones where the first byte of path_id is
not zero. In those cases, we will, unfortunately, parse the msg without
path-id, and depending on the msg, we might see parsing errors
later (e.g. not enough bytes, or extra bytes).

Test run show that the parsing of one problematic file is resolved back
to normal now. See #31 for
more details.
@digizeph digizeph added the fixed label Dec 9, 2021
@digizeph
Copy link
Member

digizeph commented Dec 9, 2021

Hey @SZenglein, I've added a fix to this issue in #42, and have been merged into main branch.

Could you give in a try when you have some time? Just checkout the main branch and do

cd cli
cargo run --release -- http://archive.routeviews.org/route-views.amsix/bgpdata/2020.06/UPDATES/updates.20200617.1830.bz2

Records and elems count now are:

total records: 1398004
total elems:   3679857

And there is no /0 prefixes (these are definitely wrong prefixes to see there) any more.

@SZenglein
Copy link
Author

Looks fine to me at a glance.

@digizeph
Copy link
Member

I'll close this issue now. Feel free to reopen it if you find further issues on handling similar MRT files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

No branches or pull requests

2 participants