
implementation for dns log #606

Open
wdweng opened this issue Nov 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

wdweng commented Nov 21, 2024

Request

My graduation project is about DNS log compression and search. I have read your paper and found that the JSON version is very suitable for DNS logs, so I want to develop one for DNS logs.

Possible implementation

A DNS log is a semi-structured text file with the format time--CIP--RIP--QType--QName--Resource Records, which makes it very similar to JSON. The resource records have varying lengths, and most values in each field are not repetitive. I want to change the code in clp-s to accept DNS log input.

wdweng added the enhancement (New feature or request) label on Nov 21, 2024
gibber9809 (Contributor) commented Nov 21, 2024

Hi @wdweng,

The simplest thing you could do is convert your data to newline-delimited JSON and then ingest that. That way everything should work for you out of the box without having to change any code.
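As a concrete sketch of this first option, here is a minimal Python converter for the time--CIP--RIP--QType--QName--Resource Records format described in the issue. The field names and the "--" delimiter are taken from the issue's description; treating everything past the fifth field as the variable-length resource-record list is an assumption about the format:

```python
import json

# Field names taken from the format described in the issue:
# time--CIP--RIP--QType--QName--Resource Records
FIELDS = ["time", "CIP", "RIP", "QType", "QName"]


def dns_line_to_json(line: str) -> str:
    """Convert one '--'-delimited DNS log line into a JSON string.

    Everything after the fifth field is assumed to be the
    variable-length list of resource records.
    """
    parts = line.strip().split("--")
    record = dict(zip(FIELDS, parts[:5]))
    record["RR"] = parts[5:]  # resource records: zero or more entries
    return json.dumps(record)


def convert(in_path: str, out_path: str) -> None:
    """Rewrite a DNS log file as newline-delimited JSON (one object per line)."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(dns_line_to_json(line) + "\n")
```

Each output line is a standalone JSON object, which is exactly the newline-delimited JSON shape clp-s ingests.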

If you do want to directly ingest DNS logs, there is a way to do that (discussed in my master's thesis), but it isn't very user-friendly at the moment. You will have to write a parser and a serializer for your DNS logs following a certain programming model. Additionally, you will have to change some parts of the code that currently assume every record is a JSON object: in particular, here at ingestion, here during serialization, and here during search (and also here during search). Note that this is purely an issue with how the code is written right now -- the archive format itself can handle cases where records are not JSON.

When it comes to actually writing your parser and serializer, you will first have to add a type to this enum -- this is the type that gets encoded into the Merged Parse Tree and indicates what kind of structure is being represented.

For writing the parser, hopefully this can act as a reference -- in particular, note the start_unordered_object and end_unordered_object calls that mark the start and end of parsing for this custom type. Between those calls you are free to call the "unordered" versions of the functions to manipulate the schema and values in a record -- performing parsing in this way guarantees that you see the same values in the same order at decompression time. You should be able to follow what we do in the parse() function in that same file and just call your parser directly.

For serialization, it might be a bit more difficult to replicate what we do, since the code is heavily optimized for serializing JSON. Here in the code, the variable m_global_id_to_unordered_object should have everything you need to initialize the serializer for your special type. You can see how we use this information to prepare to serialize structurized arrays here. After preparing to serialize objects from a given table, the actual serialization code is here. I expect the details of how you initialize your serializer and actually serialize your data will be fairly different from what we do here.
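To make the programming model above concrete, here is a toy round trip in plain Python. This is not the clp-s API -- the event names merely mirror the start_unordered_object/end_unordered_object calls mentioned above, and the DNS field names come from the issue. The point it illustrates is the contract: the parser emits schema/value events between explicit start and end markers, and the serializer replays those events in the same order, which is what guarantees you see identical values in identical order at decompression time:

```python
# Toy illustration (NOT the clp-s API) of the parse/serialize contract:
# the parser records an ordered event stream, and the serializer replays
# it to reconstruct the record exactly.

def parse_dns_record(line):
    """Emit an ordered event stream for one '--'-delimited DNS log line."""
    fields = ["time", "CIP", "RIP", "QType", "QName"]
    parts = line.strip().split("--")
    events = [("start_unordered_object",)]  # mirrors start_unordered_object()
    for key, value in zip(fields, parts[:5]):
        events.append(("value", key, value))
    for rr in parts[5:]:  # variable-length resource records
        events.append(("value", "RR", rr))
    events.append(("end_unordered_object",))  # mirrors end_unordered_object()
    return events


def serialize(events):
    """Replay the event stream, in order, back into the original text format."""
    return "--".join(e[2] for e in events if e[0] == "value")
```

Because serialization is a pure replay of the recorded event order, the round trip reproduces the input line byte for byte; clp-s enforces the analogous guarantee for custom types parsed between the unordered-object markers.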

Going forward, this should all become much simpler, but unfortunately support for custom parsing and serialization is not very mature right now.

wdweng (Author) commented Nov 28, 2024

Thank you very much, this is very helpful. I will try it.
