How to use nprintml's model to predict pcap and get accurate Label #92

Open
1 task done
wibin86 opened this issue Sep 22, 2023 · 1 comment
Labels
question Further information is requested triage Unconfirmed and as yet undiscussed issues

Comments

wibin86 commented Sep 22, 2023

What's up?

Hello, I ran into some problems using nprintml to train on the IDS dataset. There is little documentation on this workflow, so I hope you can help me analyze it.

  1. First, I downloaded the IDS dataset (under the netml case studies in nprint-datasets) from https://drive.google.com/drive/folders/15Axxx-5d4HLHjPJb9dudyPQKGfRaoxQz
  2. I decompressed traffic.pcapng.gz to get traffic.pcapng, then used pcapml to extract the pcaps: pcapml -M traffic.pcapng -O ids/
  3. This extracted about 30,000 IDS pcap files, which I placed in the dataset directory and trained on to obtain the model: nprintml -L labels.txt -a pcapng --pcap_dir dataset/ -4 -t -u -c 5. Without the -c 5 parameter, the following error may occur:

ValueError: Number of classes in y_true not equal to the number of columns y_score
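This message typically comes from scikit-learn's ROC AUC scoring when the classes present in the true labels do not match the columns of the predicted probabilities, which can happen when a class has too few samples to appear in every split. Whether that is the cause here, and why -c 5 avoids it, is not established. A minimal sketch for checking per-class sample counts, assuming labels.txt is a CSV file with `item` and `label` columns (an assumption about the file layout; the helper names are hypothetical):

```python
import csv
import io
from collections import Counter


def label_counts(labels_csv, label_column="label"):
    """Count samples per class in a labels file given as CSV text."""
    reader = csv.DictReader(io.StringIO(labels_csv))
    return Counter(row[label_column] for row in reader)


def rare_classes(counts, minimum=2):
    """Return classes with fewer than `minimum` samples, sorted by name."""
    return sorted(name for name, n in counts.items() if n < minimum)


# Hypothetical labels file contents; the `item,label` header is assumed.
example = "item,label\na.pcap,benign\nb.pcap,benign\nc.pcap,dos\n"
counts = label_counts(example)
print(counts)
print(rare_classes(counts))
```

For a real run, the text would come from reading labels.txt, and any class listed by `rare_classes` would be a candidate for closer inspection.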

  4. After training completes, an nprintml directory is created in the current directory, containing the training results under run-xxxxxx.
  5. Next, I need to use the model to predict on test data. At this point I don't know how to import test data (pcap or npt) for prediction. Following other people's approach, I took a pcap file that was not used in training, converted it to an npt file (nprint -P xxxx.pcap -4 -t -u -W out.npt), and loaded it with pandas.read_csv. The problem may start at this step.
  6. I used the following code for prediction:
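from autogluon.tabular import TabularPredictor
import pandas as pd

if __name__ == '__main__':
    # Load the model saved by the nprintml training run
    predictor = TabularPredictor.load('/home/hj/nprintml/run-xxxxxxxxxx-xxxx/model')
    # nprint output is CSV; the first column (src_ip) is used as the index
    data = pd.read_csv('out.npt', index_col=0)
    # Predict one label per row of the input
    result = predictor.predict(data)
    print(result)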
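Before calling predict, it can help to compare the input's column names with the feature names the model was trained on. The sketch below is one way to do that, not part of nprintml. `diff_columns` is a hypothetical helper, and the short lists stand in for the real values, which would likely come from `predictor.features()` and `data.columns`:

```python
def diff_columns(expected, actual):
    """Return (missing, unexpected) column names, preserving input order."""
    expected_set, actual_set = set(expected), set(actual)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    return missing, unexpected


# With a loaded model, the inputs would be something like:
#   expected = predictor.features()   # AutoGluon TabularPredictor
#   actual = list(data.columns)       # pandas DataFrame read from out.npt
# Short hypothetical lists stand in for them here.
expected = ["pkt_0_ipv4_ver_0", "pkt_0_ipv4_ver_1", "pkt_1_ipv4_ver_0"]
actual = ["ipv4_ver_0", "ipv4_ver_1"]

missing, unexpected = diff_columns(expected, actual)
print("missing from input:", missing)
print("not known to model:", unexpected)
```

If every expected name is missing, the input is probably in a different layout from the training data rather than merely lacking a few columns.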

Running this raises a KeyError whose message begins "None of Index".
[Screenshot: WeChat screenshot 20230922103358]

The header of my out.npt file is:

src_ip,ipv4_ver_0,ipv4_ver_1,ipv4_ver_2, ..., udp_cksum_14, udp_cksum_15

but the model expects the header to be:

src_ip,pkt_0_ipv4_ver_0,pkt_0_ipv4_ver_1,pkt_0_ipv4_ver_2,...,pkt_4_udp_cksum_14,pkt_4_udp_cksum_15

The pkt_x prefix corresponds to the -c 5 parameter used during training.
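The pkt_0 through pkt_4 prefixes suggest that each training sample is the first five packets of a capture laid side by side in a single row, so a per-packet npt file would need reshaping, not only renaming. A minimal sketch under that assumption follows. `widen_packets` is a hypothetical helper, and the fill value for captures shorter than the packet count is a guess, not something taken from nprintml:

```python
def widen_packets(header, packet_rows, count, fill="-1"):
    """Lay the first `count` packets side by side as one sample.

    header:      per-packet feature names, without the index column
    packet_rows: list of per-packet value lists, same length as header
    fill:        value for packets missing from short captures (assumed)
    """
    sample = {}
    for i in range(count):
        # Use the real packet if present, otherwise a row of fill values.
        if i < len(packet_rows):
            values = packet_rows[i]
        else:
            values = [fill] * len(header)
        for name, value in zip(header, values):
            sample[f"pkt_{i}_{name}"] = value
    return sample


header = ["ipv4_ver_0", "ipv4_ver_1"]
packets = [["0", "1"], ["0", "1"], ["0", "1"]]  # hypothetical 3-packet capture
sample = widen_packets(header, packets, count=5)
print(len(sample))  # count * len(header) columns
print(sample["pkt_4_ipv4_ver_0"])
```

The result is one row per pcap instead of one row per packet, which matches the shape implied by the expected header.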
  7. I used a script to replace the header of out.npt with:

src_ip,pkt_0_ipv4_ver_0,pkt_0_ipv4_ver_1,pkt_0_ipv4_ver_2,...,pkt_4_udp_cksum_14,pkt_4_udp_cksum_15

With this change, predictions are produced, but with the following warning:

WARNING: Int features without null values at train time contain null values at inference time! Imputing nulls to 0. To avoid this, pass the features as floats during fit!
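One possible reading of this warning: if only the header line is replaced, each data row still holds the fields of a single packet, so the columns for the later packets have no values. pandas.read_csv fills trailing fields that are absent from a row with NaN, which would then be imputed to 0. A small sketch for checking whether a file has rows shorter than its header (`count_short_rows` is a hypothetical helper and the sample text is made up):

```python
import csv
import io


def count_short_rows(csv_text):
    """Return (header_width, rows_with_fewer_fields_than_header)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Blank lines are skipped; only rows narrower than the header count.
    short = sum(1 for row in reader if row and len(row) < len(header))
    return len(header), short


# Hypothetical file: a 5-column header pasted over 3-field packet rows.
text = (
    "src_ip,pkt_0_ipv4_ver_0,pkt_0_ipv4_ver_1,pkt_1_ipv4_ver_0,pkt_1_ipv4_ver_1\n"
    "10.0.0.1,0,1\n"
    "10.0.0.1,0,1\n"
)
width, short = count_short_rows(text)
print(width, short)
```

A nonzero count of short rows would mean part of each sample reaches the model as nulls, which could also explain predictions that do not match the labels.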

  8. The predictions do not match the labels of the pcap.

Could you provide the trained IDS model and explain how to predict on pcap files or on npt files exported by nprint? I would like the predictions to be as accurate as possible. Thank you.

Due diligence

  • I have read the docs
@wibin86 wibin86 added question Further information is requested triage Unconfirmed and as yet undiscussed issues labels Sep 22, 2023
@JordanHolland
Collaborator

Have you tried this in the docker image, or is this done on your machine?
