Subwiz: A Lightweight GPT Model for Subdomain Discovery

Introduction

Subwiz is a lightweight GPT model trained to discover subdomains. Developed by Hadrian Security, it uses a transformer architecture to predict likely subdomains from a list of known ones, which makes it useful for security professionals and researchers doing reconnaissance.

Installation

Install Subwiz with either pipx or pip:

pipx install subwiz

pip install subwiz

Examples

Subwiz can be run from the command line, usually alongside other tools, or used as a Python library.

Recommended Command-Line Use

For best results, first use a tool such as Subfinder to gather subdomains from passive sources, then feed them into Subwiz:

subfinder -d example.com -o subdomains.txt
subwiz -i subdomains.txt
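
The same two-step pipeline can be driven from a Python script. The following is a minimal sketch, assuming both subfinder and subwiz are on your PATH. The helper names (build_subfinder_cmd, build_subwiz_cmd, run_pipeline) are hypothetical and not part of either tool; only flags that appear in this README are used.

```python
import subprocess
from typing import List, Optional


def build_subfinder_cmd(domain: str, output_file: str) -> List[str]:
    """Command that writes passively gathered subdomains to output_file."""
    return ["subfinder", "-d", domain, "-o", output_file]


def build_subwiz_cmd(
    input_file: str,
    output_file: Optional[str] = None,
    num_predictions: Optional[int] = None,
    no_resolve: bool = False,
) -> List[str]:
    """Command that feeds the known subdomains in input_file to Subwiz."""
    cmd = ["subwiz", "-i", input_file]
    if output_file is not None:
        cmd += ["-o", output_file]
    if num_predictions is not None:
        cmd += ["-n", str(num_predictions)]
    if no_resolve:
        cmd.append("--no-resolve")
    return cmd


def run_pipeline(domain: str, known_file: str, predicted_file: str) -> None:
    """Run Subfinder, then Subwiz. Raises CalledProcessError if either fails."""
    subprocess.run(build_subfinder_cmd(domain, known_file), check=True)
    subprocess.run(
        build_subwiz_cmd(known_file, output_file=predicted_file), check=True
    )
```

Calling run_pipeline("example.com", "subdomains.txt", "predicted.txt") would run both tools in sequence. Building the argument lists separately keeps the commands easy to inspect and avoids shell quoting issues.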

Using Subwiz in Python

Subwiz can also be called from your own Python scripts:

import subwiz

known_subdomains = ['test1.example.com', 'test2.example.com']
new_subdomains = subwiz.run(input_domains=known_subdomains)
print(new_subdomains)
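
In a script, the known subdomains usually come from a file, and the predictions need to be separated from what was already known. The helpers below are a sketch of that glue code. They assume subwiz.run returns an iterable of hostname strings, which this README does not state explicitly; load_subdomains and only_new are hypothetical names, not part of Subwiz. The Subwiz call itself appears only in a comment so the block runs without the package installed.

```python
from pathlib import Path
from typing import Iterable, List


def _normalize(host: str) -> str:
    """Lower-case a hostname and strip whitespace and any trailing dot."""
    return host.strip().lower().rstrip(".")


def load_subdomains(path: str) -> List[str]:
    """Read newline-separated hostnames, dropping blanks and duplicates."""
    seen = set()
    result = []
    for line in Path(path).read_text().splitlines():
        host = _normalize(line)
        if host and host not in seen:
            seen.add(host)
            result.append(host)
    return result


def only_new(known: Iterable[str], predicted: Iterable[str]) -> List[str]:
    """Return the predicted hostnames that are not already known, sorted."""
    known_set = {_normalize(h) for h in known}
    cleaned = {_normalize(h) for h in predicted}
    return sorted(h for h in cleaned if h and h not in known_set)


# Intended use (requires Subwiz to be installed):
#   import subwiz
#   known = load_subdomains("subdomains.txt")
#   fresh = only_new(known, subwiz.run(input_domains=known))
```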

Supported Switches

Subwiz offers various command-line options to fine-tune its behavior:

usage: cli.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-n NUM_PREDICTIONS] [--no-resolve]
              [--force-download] [--max-recursion MAX_RECURSION] [-t TEMPERATURE]
              [-d {auto,cpu,cuda,mps}] [-m MAX_NEW_TOKENS]
              [--resolution-concurrency RESOLUTION_CONCURRENCY] [--multi-apex] [-q] [-s]

options:
  -h, --help            show this help message and exit
  -i, --input-file INPUT_FILE
                        file containing new-line-separated subdomains. (default: None)
  -o, --output-file OUTPUT_FILE
                        output file to write new-line separated subdomains to. (default: None)
  -n, --num-predictions NUM_PREDICTIONS
                        number of subdomains to predict. (default: 500)
  --no-resolve          do not resolve the output subdomains. (default: False)
  --force-download      download model and tokenizer files, even if cached. (default: False)
  --max-recursion MAX_RECURSION
                        maximum number of times the inference process will recursively re-run
                        after finding new subdomains. (default: 5)
  -t, --temperature TEMPERATURE
                        add randomness to the model (recommended ≤ 0.3). (default: 0.0)
  -d, --device {auto,cpu,cuda,mps}
                        hardware to run the transformer model on. (default: auto)
  -m, --max-new-tokens MAX_NEW_TOKENS
                        maximum length of predicted subdomains in tokens. (default: 10)
  --resolution-concurrency RESOLUTION_CONCURRENCY
                        number of concurrent resolutions. (default: 128)
  --multi-apex          allow multiple apex domains in the input file. runs inference for each
                        apex separately. (default: False)
  -q, --quiet           useful for piping into another tool. (default: False)
  -s, --silent          do not print any output. requires --output-file. (default: False)
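
The --max-recursion option describes a loop: after a run finds new subdomains, inference runs again with those subdomains added to the input. The sketch below illustrates that idea only and is not Subwiz's actual implementation. The predict callable is a hypothetical stand-in for model inference plus resolution, and reading the limit as "one initial run plus up to max_recursion re-runs" is an interpretation of the help text.

```python
from typing import Callable, Iterable, Set


def recursive_discover(
    seeds: Iterable[str],
    predict: Callable[[Set[str]], Iterable[str]],
    max_recursion: int = 5,
) -> Set[str]:
    """Re-run predict on the growing set of hosts until a run finds nothing
    new or the re-run limit is reached. Returns only the newly found hosts."""
    known = set(seeds)
    found: Set[str] = set()
    # One initial run, then at most max_recursion re-runs.
    for _ in range(max_recursion + 1):
        new = set(predict(known)) - known
        if not new:
            break  # nothing new, so another run would see the same input
        found |= new
        known |= new
    return found


# Toy predictor: each host "suggests" at most one further host.
TOY_CHAIN = {
    "api.example.com": "api-dev.example.com",
    "api-dev.example.com": "api-dev2.example.com",
}


def toy_predict(known: Set[str]) -> Set[str]:
    return {TOY_CHAIN[h] for h in known if h in TOY_CHAIN}


print(sorted(recursive_discover(["api.example.com"], toy_predict)))
```

With the toy predictor, the second host is only reachable because the first run's result is fed back in, which is the benefit the recursion is meant to provide.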

Why Use Subwiz

Subwiz applies a lightweight GPT model to subdomain discovery. Unlike brute-forcing or dictionary-based methods, it learns naming patterns from the subdomains it is given and uses them to predict new ones, which can uncover subdomains that other tools miss.

Model Architecture

Subwiz is built on an ultra-lightweight transformer model inspired by nanoGPT. It has 17.3 million parameters and was trained on 26 million tokens drawn from lists of subdomains gathered from passive sources. Its tokenizer, which has a vocabulary of 8192 tokens, was trained on the same lists.

Intelligent Inference

While many generative transformer models predict a single output sequence, Subwiz uses a beam search algorithm to predict the N most likely sequences. A single run therefore yields a set of candidate subdomains rather than one guess.
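
To make the idea concrete, here is a toy beam search in Python. It is a simplified sketch, not Subwiz's code: the hand-written probability table stands in for the transformer, the tokens and the end-of-sequence marker are invented for illustration, and finished sequences are set aside rather than competing for beam slots.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Toy "model": maps a token prefix to next-token probabilities.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"api": 0.5, "dev": 0.3, "mail": 0.2},
    ("api",): {EOS: 0.7, "-v2": 0.3},
    ("dev",): {EOS: 0.6, "-api": 0.4},
    ("mail",): {EOS: 1.0},
    ("api", "-v2"): {EOS: 1.0},
    ("dev", "-api"): {EOS: 1.0},
}


def toy_next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Unknown prefixes simply end the sequence.
    return TOY_MODEL.get(prefix, {EOS: 1.0})


def beam_search(
    next_token_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_new_tokens: int,
) -> List[Tuple[str, float]]:
    """Return up to beam_width finished sequences with their probabilities,
    most likely first."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]  # (log-prob, tokens)
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_new_tokens):
        candidates = []
        for score, tokens in beams:
            for token, prob in next_token_probs(tuple(tokens)).items():
                if prob <= 0.0:
                    continue
                extended = (score + math.log(prob), tokens + [token])
                if token == EOS:
                    finished.append(extended)
                else:
                    candidates.append(extended)
        if not candidates:
            break
        # Keep only the beam_width most likely unfinished prefixes.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    finished.sort(key=lambda c: c[0], reverse=True)
    return [
        ("".join(tokens[:-1]), math.exp(score))
        for score, tokens in finished[:beam_width]
    ]


for label, prob in beam_search(toy_next_token_probs, beam_width=3, max_new_tokens=10):
    print(f"{label}.example.com  p={prob:.2f}")
```

Greedy decoding on the same table would return only the single label "api", whereas the beam keeps several prefixes alive and returns the three most likely complete labels.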

Hugging Face Integration

The Subwiz model is hosted on Hugging Face as HadrianSecurity/subwiz. The model and tokenizer files are downloaded automatically the first time you run Subwiz.