How to use nprintml's model to predict pcap and get accurate Label #92

wibin86 · 2023-09-22T05:40:06Z

What's up?

Hello, I ran into some problems when using nprintml to train a model on the IDS dataset. There is relatively little documentation on this, so I hope you can help me analyze it.

First, I downloaded the IDS dataset (under the netml case studies in nprint-datasets) from https://drive.google.com/drive/folders/15Axxx-5d4HLHjPJb9dudyPQKGfRaoxQz
I decompressed traffic.pcapng.gz to get traffic.pcapng, then used pcapml to extract the pcaps: pcapml -M traffic.pcapng -O ids/
This extracted about 30,000 IDS pcap files into the dataset directory for training. I then trained the model with: nprintml -L labels.txt -a pcapng --pcap_dir dataset/ -4 -t -u -c 5. If the -c 5 parameter is omitted, the following error may occur:

error: ValueError: Number of classes in y_true not equal to the number of columns y_score

After training completes, an nprintml directory is created in the current directory, containing the training results in a run-xxxxxx subdirectory.
Next, I need to use the model to predict on test data. At this point I don't know how to load the test data (a pcap or npt file) for prediction. Following what others have done, I took a pcap file that was not used in training, converted it to an npt file (nprint -P xxxx.pcap -4 -t -u -W out.npt), and loaded it with pandas.read_csv. Maybe this is the step where things started to go wrong.
I used the following code for prediction:

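from autogluon.tabular import TabularPredictor
import pandas as pd

if __name__ == '__main__':
    # Load the model that nprintml saved in its run directory.
    predictor = TabularPredictor.load('/home/hj/nprintml/run-xxxxxxxxxx-xxxx/model')
    # Read the nPrint output; the first column (src_ip) becomes the index.
    data = pd.read_csv('out.npt', index_col=0)
    # Predict one label per row of the input.
    result = predictor.predict(data)
    print(result)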

Running this raises a KeyError whose message begins with "None of Index".
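A KeyError like this usually means that the column names the model was trained on are not present in the DataFrame passed to predict. The following is a minimal sketch for listing the mismatch before predicting. The helper name compare_columns is hypothetical, and the toy column names only stand in for the real ones. In practice the expected names would have to come from the trained predictor (AutoGluon's TabularPredictor exposes a features() method for this, though that has not been verified against the version nprintml installs).

```python
import pandas as pd


def compare_columns(expected, data):
    """Return (missing, unexpected) feature names for a prediction DataFrame.

    `expected` is the list of feature names the model was trained on;
    `data` is the DataFrame about to be passed to predict().
    """
    expected = list(expected)
    expected_set = set(expected)
    present = set(data.columns)
    missing = [name for name in expected if name not in present]
    unexpected = [name for name in data.columns if name not in expected_set]
    return missing, unexpected


if __name__ == "__main__":
    # Toy stand-ins (assumptions): the real `expected` list would come from the
    # trained model, and `data` from pd.read_csv('out.npt', index_col=0).
    expected = ["pkt_0_ipv4_ver_0", "pkt_1_ipv4_ver_0"]
    data = pd.DataFrame({"ipv4_ver_0": [0, 1]}, index=["10.0.0.1", "10.0.0.2"])
    missing, unexpected = compare_columns(expected, data)
    print("missing from input:", missing)
    print("not known to model:", unexpected)
```

If every expected name is reported as missing, the input most likely has a different layout from the training data, not just a few absent fields.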

The header of my out.npt file is:

src_ip,ipv4_ver_0,ipv4_ver_1,ipv4_ver_2, ..., udp_cksum_14, udp_cksum_15

while the model expects the header to be:

src_ip,pkt_0_ipv4_ver_0,pkt_0_ipv4_ver_1,pkt_0_ipv4_ver_2,...,pkt_4_udp_cksum_14,pkt_4_udp_cksum_15

The pkt_x prefix here corresponds to the -c 5 parameter used during training.
I used a script to replace the header of out.npt with:

src_ip,pkt_0_ipv4_ver_0,pkt_0_ipv4_ver_1,pkt_0_ipv4_ver_2,...,pkt_4_udp_cksum_14,pkt_4_udp_cksum_15

With this change a prediction is produced, but with the following warning:

WARNING: Int features without null values at train time contain null values at inference time! Imputing nulls to 0. To avoid this, pass the features as floats during fit!
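One possible reading of the pkt_0 to pkt_4 prefixes is that each training sample is a single wide row built from the first five packets of a pcap, so renaming the header of a one-packet-per-row file would leave most columns empty, which would fit the null-value warning. Below is a minimal sketch of that flattening, done by hand with pandas. The function name flatten_packets, taking the first packets in order, and padding short captures with -1 are all assumptions for illustration, not nprintml's confirmed behavior.

```python
import pandas as pd


def flatten_packets(packets, sample_size=5, fill_value=-1):
    """Flatten the first `sample_size` packet rows into one wide sample row.

    `packets` holds one row per packet (as in an nPrint output file).
    The result has a single row with columns named pkt_<i>_<feature>.
    """
    head = packets.iloc[:sample_size]
    values = head.to_numpy()
    row = {}
    for i in range(sample_size):
        for j, name in enumerate(packets.columns):
            # Assumption: pad with fill_value when the capture is too short.
            row[f"pkt_{i}_{name}"] = values[i, j] if i < len(head) else fill_value
    return pd.DataFrame([row])


if __name__ == "__main__":
    # Toy per-packet table standing in for pd.read_csv('out.npt', index_col=0).
    packets = pd.DataFrame(
        {"ipv4_ver_0": [0, 0], "ipv4_ver_1": [1, 1]},
        index=["10.0.0.1", "10.0.0.2"],
    )
    wide = flatten_packets(packets, sample_size=3)
    print(list(wide.columns))
    print(wide.iloc[0].tolist())
```

The resulting frame has one row per pcap, so a prediction on it yields one label per capture, not one per packet. Whether the padding value and packet selection match what the model saw during training would need to be checked against nprintml itself.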

The predictions do not match the label of the pcap.

Could you provide a trained IDS model and explain how to run predictions on pcap files, or on npt files exported by nprint? I would like the predictions to be as accurate as possible. Thank you.

Due diligence

I have read the docs

JordanHolland · 2024-02-26T20:46:07Z

Have you tried this in the docker image, or is this done on your machine?

wibin86 added the labels "question" (Further information is requested) and "triage" (Unconfirmed and as yet undiscussed issues) on Sep 22, 2023
