
implementation for dns log #606

Open
wdweng opened this issue Nov 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

wdweng commented Nov 21, 2024

Request

My graduation project is about DNS log compression and search. I have read your paper and found that the JSON version is very suitable for DNS logs, so I want to develop one for DNS logs.

Possible implementation

A DNS log is a semi-structured text file with the format time--CIP--RIP--QType--QName--Resource Records, which makes it very similar to JSON. The resource records have varying lengths, and most values in each field are not repetitive. I want to change the code in clp-s to accept DNS log input.

wdweng added the enhancement (New feature or request) label on Nov 21, 2024
gibber9809 (Contributor) commented Nov 21, 2024

Hi @wdweng,

The simplest thing you could do is convert your data to newline-delimited JSON and then ingest that. That way everything should work for you out of the box without having to change any code.
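As a concrete sketch of this first option, here is a minimal Python converter for the time--CIP--RIP--QType--QName--Resource Records format described in the issue. The field names and the "--" delimiter are taken from the issue's description; treating everything past the fifth field as the variable-length resource-record list is an assumption about the format:

```python
import json

# Field names taken from the format described in the issue:
# time--CIP--RIP--QType--QName--Resource Records
FIELDS = ["time", "CIP", "RIP", "QType", "QName"]


def dns_line_to_json(line: str) -> str:
    """Convert one '--'-delimited DNS log line into a JSON string.

    Everything after the fifth field is assumed to be the
    variable-length list of resource records.
    """
    parts = line.strip().split("--")
    record = dict(zip(FIELDS, parts[:5]))
    record["RR"] = parts[5:]  # resource records: zero or more entries
    return json.dumps(record)


def convert(in_path: str, out_path: str) -> None:
    """Rewrite a DNS log file as newline-delimited JSON (one object per line)."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(dns_line_to_json(line) + "\n")
```

Each output line is a standalone JSON object, which is exactly the newline-delimited JSON shape clp-s ingests.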

If you do want to directly ingest DNS logs, there is a way to do that (discussed in my master's thesis), but it isn't very user-friendly at the moment. You will have to write a parser and a serializer for your DNS logs following a certain programming model. Additionally, you will have to change some parts of the code that currently assume every record is a JSON object: in particular, here at ingestion, here during serialization, and here during search (and also here during search). Note that this is purely an issue with how the code is written right now -- the archive format itself can handle cases where records are not JSON.

When it comes to actually writing your parser and serializer, you will first have to add a type to this enum -- this is the type that gets encoded into the Merged Parse Tree and indicates what kind of structure is being represented.

For writing the parser, hopefully this can act as a reference -- in particular, note the start_unordered_object and end_unordered_object calls that mark the start and end of parsing for this custom type. Between those calls you are free to call the "unordered" versions of the functions to manipulate the schema and values in a record -- performing parsing in this way guarantees that you see the same values in the same order at decompression time. You should be able to follow what we do in the parse() function in that same file and just call your parser directly.

For serialization, it might be a bit more difficult to replicate what we do, since the code is heavily optimized for serializing JSON. Here in the code, the variable m_global_id_to_unordered_object should have everything you need to initialize the serializer for your special type. You can see how we use this information to prepare to serialize structurized arrays here. After preparing to serialize objects from a given table, the actual serialization code is here. I expect the details of how you initialize your serializer and actually serialize your data will be fairly different from what we do here.
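To make the programming model above concrete, here is a toy round trip in plain Python. This is not the clp-s API -- the event names merely mirror the start_unordered_object/end_unordered_object calls mentioned above, and the DNS field names come from the issue. The point it illustrates is the contract: the parser emits schema/value events between explicit start and end markers, and the serializer replays those events in the same order, which is what guarantees you see identical values in identical order at decompression time:

```python
# Toy illustration (NOT the clp-s API) of the parse/serialize contract:
# the parser records an ordered event stream, and the serializer replays
# it to reconstruct the record exactly.

def parse_dns_record(line):
    """Emit an ordered event stream for one '--'-delimited DNS log line."""
    fields = ["time", "CIP", "RIP", "QType", "QName"]
    parts = line.strip().split("--")
    events = [("start_unordered_object",)]  # mirrors start_unordered_object()
    for key, value in zip(fields, parts[:5]):
        events.append(("value", key, value))
    for rr in parts[5:]:  # variable-length resource records
        events.append(("value", "RR", rr))
    events.append(("end_unordered_object",))  # mirrors end_unordered_object()
    return events


def serialize(events):
    """Replay the event stream, in order, back into the original text format."""
    return "--".join(e[2] for e in events if e[0] == "value")
```

Because serialization is a pure replay of the recorded event order, the round trip reproduces the input line byte for byte; clp-s enforces the analogous guarantee for custom types parsed between the unordered-object markers.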

Going forward, this should all become much simpler, but unfortunately support for custom parsing and serialization is not very mature right now.

wdweng (Author) commented Nov 28, 2024

Thank you very much, this is very helpful. I will try it.
