Subwiz: A Lightweight GPT Model for Subdomain Discovery

Summary
Subwiz is a lightweight GPT model for discovering subdomains. It uses a transformer architecture, trained on large lists of subdomains, to predict new subdomains from known ones. It is aimed at security researchers and developers who want to extend their subdomain enumeration.
Introduction
Subwiz, developed by Hadrian Security, is a lightweight GPT model trained to discover subdomains. Given a list of known subdomains, its transformer predicts further candidates, which makes it useful to security professionals and researchers during reconnaissance.
Installation
Subwiz can be installed with pipx or pip:
pipx install subwiz
OR
pip install subwiz
Examples
Subwiz can be run from the command line, often alongside other tools, or used as a Python library.
Recommended Command-Line Use
For best results, first gather subdomains from passive sources with a tool such as Subfinder, then pass them to Subwiz:
subfinder -d example.com -o subdomains.txt
subwiz -i subdomains.txt
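Between the two commands, it can help to tidy subdomains.txt so that the input contains each hostname only once. The following is a minimal sketch in Python, assuming the file is plain newline-separated hostnames; normalize_subdomains and clean_file are hypothetical helper names, and Subwiz may already perform similar clean-up on its own.

```python
from pathlib import Path


def normalize_subdomains(lines):
    """Return lowercased, de-duplicated hostnames in first-seen order."""
    seen = set()
    cleaned = []
    for line in lines:
        # Trim whitespace, lowercase, and drop a trailing root dot.
        host = line.strip().lower().rstrip(".")
        if not host or host in seen:
            continue
        seen.add(host)
        cleaned.append(host)
    return cleaned


def clean_file(src, dst):
    """Rewrite a newline-separated subdomain file (hypothetical helper)."""
    hosts = normalize_subdomains(Path(src).read_text().splitlines())
    Path(dst).write_text("\n".join(hosts) + "\n")
    return len(hosts)


if __name__ == "__main__":
    sample = ["API.example.com", "api.example.com.", "", " dev.example.com "]
    print(normalize_subdomains(sample))  # ['api.example.com', 'dev.example.com']
```

Calling clean_file("subdomains.txt", "clean.txt") would then produce a file that can be passed to subwiz with -i.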
Using Subwiz in Python
Subwiz can also be called from Python scripts, which gives programmatic control over subdomain discovery:
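import subwiz

# Subdomains that are already known for the target
known_subdomains = ['test1.example.com', 'test2.example.com']
# Predict further subdomains from the known ones
new_subdomains = subwiz.run(input_domains=known_subdomains)
print(new_subdomains)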
Supported Switches
The command-line interface accepts the following options:
usage: cli.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-n NUM_PREDICTIONS] [--no-resolve]
              [--force-download] [--max-recursion MAX_RECURSION] [-t TEMPERATURE]
              [-d {auto,cpu,cuda,mps}] [-m MAX_NEW_TOKENS]
              [--resolution-concurrency RESOLUTION_CONCURRENCY] [--multi-apex] [-q] [-s]

options:
  -h, --help            show this help message and exit
  -i, --input-file INPUT_FILE
                        file containing new-line-separated subdomains. (default: None)
  -o, --output-file OUTPUT_FILE
                        output file to write new-line separated subdomains to. (default: None)
  -n, --num-predictions NUM_PREDICTIONS
                        number of subdomains to predict. (default: 500)
  --no-resolve          do not resolve the output subdomains. (default: False)
  --force-download      download model and tokenizer files, even if cached. (default: False)
  --max-recursion MAX_RECURSION
                        maximum number of times the inference process will recursively re-run
                        after finding new subdomains. (default: 5)
  -t, --temperature TEMPERATURE
                        add randomness to the model (recommended ≤ 0.3). (default: 0.0)
  -d, --device {auto,cpu,cuda,mps}
                        hardware to run the transformer model on. (default: auto)
  -m, --max-new-tokens MAX_NEW_TOKENS
                        maximum length of predicted subdomains in tokens. (default: 10)
  --resolution-concurrency RESOLUTION_CONCURRENCY
                        number of concurrent resolutions. (default: 128)
  --multi-apex          allow multiple apex domains in the input file. runs inference for each
                        apex separately. (default: False)
  -q, --quiet           useful for piping into another tool. (default: False)
  -s, --silent          do not print any output. requires --output-file. (default: False)
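The --max-recursion option describes a loop in which newly found subdomains are fed back in as input. The sketch below illustrates that idea only; it is not Subwiz's actual implementation. The callables predict and resolves are hypothetical stand-ins for model inference and DNS resolution, and counting the limit as "one initial pass plus up to max_recursion re-runs" is an assumption.

```python
import re


def discover(known, predict, resolves, max_recursion=5):
    """Re-run prediction until no new hosts resolve or the limit is hit.

    `predict` and `resolves` are hypothetical callables supplied by the
    caller; they stand in for model inference and DNS resolution.
    """
    known = set(known)
    found = set()
    # One initial pass plus up to `max_recursion` re-runs (an assumption
    # about how the limit is counted).
    for _ in range(max_recursion + 1):
        candidates = set(predict(known)) - known
        new = {host for host in candidates if resolves(host)}
        if not new:
            break  # nothing new resolved, so another pass would not help
        found |= new
        known |= new
    return found


def toy_predict(known):
    """Toy stand-in for the model: bump a trailing digit in the first label."""
    out = set()
    for host in known:
        label, _, rest = host.partition(".")
        match = re.fullmatch(r"(.*?)(\d+)", label)
        if match:
            out.add(f"{match.group(1)}{int(match.group(2)) + 1}.{rest}")
    return out


# Pretend these are the only hosts that resolve.
LIVE = {"test2.example.com", "test3.example.com"}

if __name__ == "__main__":
    result = discover(["test1.example.com"], toy_predict, LIVE.__contains__)
    print(sorted(result))  # ['test2.example.com', 'test3.example.com']
```

In this toy run, test3.example.com is only reachable on the second pass, because it is predicted from test2.example.com, which was itself found on the first pass.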
Why Use Subwiz
Subwiz applies a lightweight GPT model to subdomain discovery. Unlike brute-force or dictionary-based methods, it predicts new subdomains from patterns in existing ones, so it can uncover subdomains that other tools miss.
Model Architecture
Subwiz is built on an ultra-lightweight transformer model inspired by nanoGPT. It has 17.3 million parameters and was trained on 26 million tokens drawn from lists of subdomains gathered from passive sources. Its tokenizer was trained on the same lists and has a vocabulary of 8192 tokens.
Intelligent Inference
While many generative transformer models predict a single output sequence, Subwiz uses a beam search algorithm to predict the N most likely sequences. A single inference pass therefore yields N candidate subdomains rather than one.
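To make the idea of beam search concrete, here is a simplified, self-contained sketch over a toy next-token table. The probabilities in MODEL are made up for illustration and have nothing to do with the Subwiz model, and beam_search is a hypothetical function, not part of the Subwiz API. At each step only the n best unfinished prefixes are kept, and completed sequences are collected separately.

```python
import math

# Toy next-token probabilities keyed by the previous token (made-up numbers,
# not taken from the Subwiz model).
MODEL = {
    "<start>": {"api": 0.5, "dev": 0.3, "mail": 0.2},
    "api": {"<end>": 0.7, "-v2": 0.3},
    "dev": {"<end>": 0.6, "-v2": 0.4},
    "mail": {"<end>": 1.0},
    "-v2": {"<end>": 1.0},
}


def beam_search(model, n, max_new_tokens=10):
    """Return up to n high-probability complete sequences found by beam search."""
    beams = [(0.0, ["<start>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_new_tokens):
        expanded = []
        for logp, tokens in beams:
            for token, p in model[tokens[-1]].items():
                candidate = (logp + math.log(p), tokens + [token])
                if token == "<end>":
                    finished.append(candidate)
                else:
                    expanded.append(candidate)
        # Keep only the n best unfinished prefixes for the next step.
        beams = sorted(expanded, key=lambda c: c[0], reverse=True)[:n]
        if not beams:
            break
    best = sorted(finished, key=lambda c: c[0], reverse=True)[:n]
    # Drop the <start>/<end> markers and convert log-probabilities back.
    return [("".join(tokens[1:-1]), math.exp(logp)) for logp, tokens in best]


if __name__ == "__main__":
    print([name for name, _ in beam_search(MODEL, 3)])  # ['api', 'mail', 'dev']
```

With n set to 1 the search degenerates to greedy decoding and returns only "api"; larger values of n keep more prefixes alive and return more candidates.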
Hugging Face Integration
The Subwiz model is hosted on Hugging Face as HadrianSecurity/subwiz. The model and tokenizer files are downloaded automatically the first time you run Subwiz and are then cached; pass --force-download to fetch them again.
Links
- GitHub Repository: hadriansecurity/subwiz
- Hugging Face Model: HadrianSecurity/subwiz