Extracting payload hash from network traffic #12

Open
arunppsg opened this issue May 12, 2021 · 6 comments

Comments

@arunppsg

I was wondering whether it would be possible to extract payload or the payload hash from network traffic along with the fingerprints using mercury. Are there any options for it? We can do it with tcpdump but it does not give fingerprints. Any pointers will be helpful. Thanks.

@davidmcgrew
Member

Hi Arun, we did experiment with hashing the TCP or UDP Data field of a packet as a way to detect retransmissions and duplicated packets. In some branch or another, I think there is code to print out the data field as a hex number. Is this the sort of thing you had in mind? Thx!

@arunppsg
Author

Exactly, that was what I was looking for. If I could get the data field, then I could compute a hash of it; in my case, a SHA-256 hash of the payload will suffice.
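
Once the data field is available as bytes, the SHA-256 step is a one-liner with Python's standard `hashlib`. This is a sketch assuming the payload has already been extracted; `payload_sha256` is a hypothetical helper name.

```python
import hashlib


def payload_sha256(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a packet's TCP/UDP Data field."""
    return hashlib.sha256(data).hexdigest()


print(payload_sha256(b"GET / HTTP/1.1\r\n"))
```

Note that every packet with an empty Data field (for example a bare TCP ACK) produces the same digest, so empty payloads need separate handling if the hash is used to spot duplicates.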

@davidmcgrew
Member

Since there is no need for cryptographic collision resistance, and there is a need for speed, I had used the xxhash library https://github.com/Cyan4973/xxHash. It performed quite well in tests. I can't find the code that I had experimented with; I think it was never committed into the git repo. It added a new JSON element that holds the xxhash of the entire TCP data field of the packet, something like this:

{"tcp":{"data_hash":"474554202f20485454502"}, "src_ip":"192.168.113.237", "dst_ip":"35.224.99.156", "protocol":6, "src_port":53560, "dst_port":80, "event_start":1565200503.658237}
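
A rough sketch of producing a record shaped like the one above. To stay within the standard library it uses 64-bit BLAKE2b as a stand-in for xxhash; that substitution is an assumption, and the third-party `xxhash` Python package (`xxhash.xxh64(data).hexdigest()`) could be swapped in where installed. The helper names `data_hash` and `make_record` are hypothetical. (The `data_hash` value in the example above appears to be the hex-encoded start of `GET / HTTP/` rather than a real digest, so treat it as illustrative.)

```python
import hashlib
import json


def data_hash(data: bytes) -> str:
    # Stand-in for xxhash: BLAKE2b truncated to 64 bits, stdlib only.
    return hashlib.blake2b(data, digest_size=8).hexdigest()


def make_record(data: bytes, src_ip: str, dst_ip: str,
                src_port: int, dst_port: int, ts: float) -> dict:
    # Mirrors the field layout of the example JSON record for a TCP packet.
    return {
        "tcp": {"data_hash": data_hash(data)},
        "src_ip": src_ip, "dst_ip": dst_ip, "protocol": 6,
        "src_port": src_port, "dst_port": dst_port, "event_start": ts,
    }


record = make_record(b"GET / HTTP/1.1\r\n", "192.168.113.237", "35.224.99.156",
                     53560, 80, 1565200503.658237)
print(json.dumps(record))
```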

The hash provides a practical way to detect duplicated packets, which seem to happen all the time in network capture environments, by detecting duplicate data_hash values in whatever JSON processing is being done. I think the data_hash output could be a useful aid in debugging network capture systems, especially ones with multiple capture interfaces. However, what I'd personally find more useful would be a mercury option that detected duplicate packets and ignored them (by only processing and reporting on the first packet, and ignoring any following ones). Does that line up with your thinking, or do you have some other use cases in mind?
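
As an illustration of "detecting duplicate data_hash values in whatever JSON processing is being done", here is a minimal post-processing filter over JSON lines that keeps the first record and drops later repeats. It keys on the flow fields plus `data_hash` rather than the hash alone; that is a design assumption, made so that identical payloads on different flows are not collapsed. `drop_duplicates` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def drop_duplicates(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSON record unless an identical flow + data_hash was already seen."""
    seen = set()
    for line in lines:
        record = json.loads(line)
        h = record.get("tcp", {}).get("data_hash")
        if h is None:                      # no hash present: pass the record through
            yield record
            continue
        key = (record.get("src_ip"), record.get("dst_ip"), record.get("protocol"),
               record.get("src_port"), record.get("dst_port"), h)
        if key in seen:                    # duplicate of an earlier packet: skip it
            continue
        seen.add(key)
        yield record
```

The `seen` set grows without bound, which is acceptable for a capture file but not for a long-running live feed.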

Thanks!

@arunppsg
Author

arunppsg commented May 14, 2021

Yes, that is my requirement: to detect duplicate packets based on the payload hash value. One reason for using mercury is that it is able to handle high volumes of traffic. Is there any way I could help or contribute to integrating that feature into mercury?

Thanks!

@davidmcgrew
Member

Thanks for the offer to help. I have a bunch of other changes in progress. After those are done, how about I add a hash-based deduplicator as a compile-time option? You can then build with that option and test it out in your environment.
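
For a live, high-volume feed, one plausible shape for such a deduplicator is a bounded window of recently seen keys, so memory stays fixed. This is a language-neutral Python sketch, not mercury's implementation (mercury itself is C++); the class name, the `capacity` parameter, the BLAKE2b stand-in hash, and the choice to never deduplicate empty Data fields are all assumptions.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Remember the most recent `capacity` (flow, payload-hash) keys; flag repeats."""

    def __init__(self, capacity: int = 65536):
        self.capacity = capacity
        self.recent = OrderedDict()        # insertion order doubles as recency order

    def is_duplicate(self, flow_key: tuple, data: bytes) -> bool:
        if not data:                       # empty Data fields (e.g. bare ACKs) are never flagged
            return False
        key = (flow_key, hashlib.blake2b(data, digest_size=8).digest())
        if key in self.recent:
            self.recent.move_to_end(key)   # refresh so a repeating packet stays in the window
            return True
        self.recent[key] = None
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # evict the oldest key
        return False


dedup = Deduplicator(capacity=1024)
flow = ("192.168.113.237", "35.224.99.156", 6, 53560, 80)
print(dedup.is_duplicate(flow, b"GET / HTTP/1.1\r\n"))  # False: first sighting
print(dedup.is_duplicate(flow, b"GET / HTTP/1.1\r\n"))  # True: repeat is ignored
```

The window size trades memory for how far apart two copies of a packet can arrive and still be caught; a retransmission that lands after its key has been evicted is treated as new.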

@arunppsg
Author

Sure, that will be great. Thanks for your help. In the meantime, I will also work on it.
