
dpgon/DGA

DGA

A study of the algorithms and techniques used to generate DGA (domain generation algorithm) domains and to distinguish them from non-DGA domains. I developed three tools in Python 3:

  1. lookdga.py to create DGA domains and detect them with regular expressions.
  2. statsdga.py to calculate some statistics.
  3. detectdga.py to train a model and detect DGA domains using machine learning.

Almost all of the DGA implementations were taken from https://github.com/baderj/domain_generation_algorithms and adapted in the DGAs directory.

Johannes Bader's work on DGA malware is very valuable. See it at https://johannesbader.ch/

The YARA rules for the malware were extracted from https://malpedia.caad.fkie.fraunhofer.de/

Install

Clone the repository and install the dependencies with pip. A virtual environment with Python 3.7 or above is recommended:

$ pip install -r requirements.txt

lookdga.py

This tool generates the domains of various malware families that use a DGA. It can also flag a domain as a possible DGA domain using regular expressions, and it can brute-force a domain to find the day and the position in the sequence at which it was generated.

List all available DGAs:

$ python lookdga.py -L

View info about a DGA:

$ python lookdga.py -m zloader -I

Generate 5 domains of a DGA:

$ python lookdga.py -m tinba -n 5 -G

Detect possible DGA domains using regular expressions:

$ python lookdga.py -D google.com nvfowikhevmy.net
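
The idea behind regular-expression detection can be sketched in a few lines of Python. This is only an illustration: the pattern names and the patterns themselves are hypothetical and are not the ones shipped with lookdga.py.

```python
import re

# Hypothetical patterns for illustration only. They are NOT the
# expressions used by lookdga.py.
PATTERNS = {
    "twelve-letter-example": re.compile(r"[a-z]{12}\.(?:com|net|in|ru)"),
    "hex-label-example": re.compile(r"[0-9a-f]{16}\.(?:org|info)"),
}


def match_families(domain):
    """Return the names of all patterns that match the whole domain."""
    domain = domain.strip().lower()
    return [name for name, rx in PATTERNS.items() if rx.fullmatch(domain)]
```

With these example patterns, `match_families("nvfowikhevmy.net")` returns one candidate and `match_families("google.com")` returns an empty list. A match only marks a domain as a candidate, because a legitimate domain can have the same shape.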

Brute-force the previous domains (regular-expression detection runs first, before the brute force):

$ python lookdga.py -B google.com nvfowikhevmy.net
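
A brute-force search of this kind can be sketched as follows, assuming a DGA that is seeded by the date and a sequence index. The `toy_dga` function is a made-up generator used only to make the example self-contained. It is not a real malware algorithm and not part of lookdga.py.

```python
import hashlib
from datetime import date, timedelta


def toy_dga(day, index):
    """Toy date-seeded generator (hypothetical, not a real malware DGA)."""
    seed = f"{day.isoformat()}-{index}".encode()
    digest = hashlib.md5(seed).hexdigest()
    # Map the first 12 hex digits to the letters a..p.
    label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
    return label + ".net"


def bruteforce(domain, start, days, per_day):
    """Try `per_day` domains for each of `days` days beginning at `start`.

    Returns (day, index) for the first match, or None if nothing matches.
    """
    for offset in range(days):
        day = start + timedelta(days=offset)
        for index in range(per_day):
            if toy_dga(day, index) == domain:
                return day, index
    return None
```

For example, `bruteforce(toy_dga(date(2020, 1, 15), 7), date(2020, 1, 1), 31, 20)` recovers the day and the index, while a domain the generator never produces returns `None`.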

statsdga.py

This tool calculates statistics for the domains generated by lookdga.py. It can also calculate the same statistics for domains from the Alexa top million list.

Its use is self-explanatory:

$ python statsdga.py
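
As a rough idea of what per-domain statistics can look like, here is a minimal sketch. The chosen features (length, Shannon entropy, vowel ratio, digit ratio) are assumptions for illustration; the statistics that statsdga.py actually computes may differ.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_stats(domain):
    """Compute simple statistics on the first label of a domain.

    Using only the first label is a simplification made for this sketch.
    """
    label = domain.strip().lower().split(".")[0]
    letters = [c for c in label if c.isalpha()]
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
        "digit_ratio": sum(c.isdigit() for c in label) / len(label) if label else 0.0,
    }
```

Randomly generated labels tend to have higher entropy and a lower vowel ratio than dictionary-based names, which is why features like these are common when comparing DGA and non-DGA domains.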

detectdga.py

This tool needs several domain list files before it can be used. Save these files in the ml-data directory.

First, download the Tranco list from https://tranco-list.eu/list/9Q72/full, and then split it into three files:

$ sed -n '1,1500000p' tranco.csv | cut -d ',' -f 2 > tranco-main.dom
$ sed -n '1500001,2000000p' tranco.csv | cut -d ',' -f 2 > tranco-test2.dom
$ sed -n '2000001,20000000p' tranco.csv | cut -d ',' -f 2 > tranco-ngram.dom
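
On systems without sed and cut, the same split can be done in Python. This is a sketch that assumes each line of tranco.csv has the form rank,domain; the function name `split_ranked_csv` is made up for this example.

```python
import csv
from itertools import islice

# (first line, last line, output file): 1-based and inclusive,
# mirroring the sed -n 'A,Bp' ranges above.
SPLITS = [
    (1, 1500000, "tranco-main.dom"),
    (1500001, 2000000, "tranco-test2.dom"),
    (2000001, 20000000, "tranco-ngram.dom"),
]


def split_ranked_csv(csv_path, splits=SPLITS):
    """Write the second CSV column of each line range to its own file.

    Returns a dict mapping each output path to the number of lines written.
    The input is re-read once per range, which keeps the code simple.
    """
    written = {}
    for first, last, out_path in splits:
        with open(csv_path, newline="") as src, open(out_path, "w") as dst:
            count = 0
            for row in islice(csv.reader(src), first - 1, last):
                dst.write(row[1] + "\n")  # assumes rank,domain rows
                count += 1
        written[out_path] = count
    return written
```

Calling `split_ranked_csv("tranco.csv")` writes the three files to the current directory, as the shell commands do.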

Second, download the DGA domains for the main dataset from https://data.netlab.360.com/feeds/dga/dga.txt, and save it in the ml-data directory.

To create the second test dataset:

$ python lookdga.py -d 2019-12-31 -n 3000 -C > ml-data/mydgas.dom

The n-gram dictionary was created and saved in the repository, but it can be recreated with:

$ python3 detectdga.py --ngram
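
For readers unfamiliar with n-gram dictionaries, the following sketch shows one way to build and use one from a list of legitimate domains. The n-gram sizes (3 to 5), the use of the first label only, and the scoring function are all assumptions for this example; the dictionary format and features used by detectdga.py may be different.

```python
from collections import Counter


def ngrams(text, n):
    """Return all contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def first_label(domain):
    """Lower-cased first label of a domain (a simplification)."""
    return domain.strip().lower().split(".")[0]


def build_ngram_counts(domains, sizes=(3, 4, 5)):
    """Count every n-gram of the given sizes across all domains."""
    counts = Counter()
    for domain in domains:
        label = first_label(domain)
        for n in sizes:
            counts.update(ngrams(label, n))
    return counts


def ngram_score(domain, counts, sizes=(3, 4, 5)):
    """Fraction of the domain's n-grams that appear in the dictionary."""
    label = first_label(domain)
    grams = [g for n in sizes for g in ngrams(label, n)]
    if not grams:
        return 0.0
    return sum(g in counts for g in grams) / len(grams)
```

A dictionary built from legitimate domains gives a high score to names made of common letter sequences and a low score to random-looking ones.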

Now, we can create the main dataset:

$ python detectdga.py --main

To create the secondary dataset or the Alexa dataset, use the --secondary or --alexa option.

The repository includes the full model, but it is possible to retrain it or to train a different model (ngram, nosyll...):

$ python detectdga.py --train ngram

To test it against the secondary dataset (use --test_alexa for the Alexa dataset):

$ python detectdga.py --test_secondary ngram

Finally, to test domains with the full model (the -b option also tries the brute force; remove it to use only the ML model):

$ python detectdga.py -b --check ughdnmmgdpscliraqnpl.com nvfowikhevmy.net uquslaigwaannie.ddns.net google.com facebook.com
