Web Content Discovery
Last modified: 2024-03-17
To find hidden directories or files, we can enumerate them manually or with automated tools.
Manual Discovery
# Settings files
/robots.txt
/security.txt
/.well-known/security.txt
/.well-known/apple-app-site-association
/.well-known/assetlinks.json
/sitemap.xml
/sitemaps.xml
# JavaScript files
/main.js
/script.js
/js/jquery.min.js
/js/main.js
/js/script.js
# CGI scripts
/cgi-bin/example.cgi
# Tilde-prefixed paths
/~files/
/~hidden/
# PHP files
/index.php
/config.php
/403.php
/404.php
# Python files
/main.py
/module.py
/module/__init__.py
/modules/__init__.py
/__init__.py
/config.ini
/project.wsgi
# Archives
/example.zip
/backup.zip
/backups.zip
# Backup files
/example.bak
/example.jpg.bak
/images/example.jpg.bak
# Directories
/admin/
/blog/
# Sensitive information
/.env
# Git and GitHub
/README.md
/.git
/.github
/.gitignore
# Apache Tomcat
/manager
# ASP.NET
/trace.axd
/example.asp
/example.aspx
/example.aspx/trace.axd
/web.config
# If you know which users manage the website, try their usernames
/admin
/administrator
/john
/michael
# API endpoints
/api/login
/api/signin
/api/user
/api/user/1
/api/users
/api/v1/
/api/v2/
# If we found a secret keyword while investigating, we can try to access the following paths.
/<keyword>
/<keyword>.html
/<keyword>.txt
/<keyword>.php
/<keyword>.py
/?<keyword>=test
# We might be able to access directories by using keywords we found.
/<site_title>
/<site_theme>
/<site_author>
/<image_theme>
/?<post_param>=test
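The keyword patterns above can be generated rather than typed by hand. The following is a minimal sketch, assuming a POSIX shell; `keyword_candidates` is a hypothetical helper name, not part of any tool, and the extension list simply mirrors the one shown above.

```shell
# Print the candidate paths and query string for one keyword.
# keyword_candidates is a hypothetical helper for illustration only.
keyword_candidates() {
    keyword=$1
    # Bare path first, then one path per extension
    printf '/%s\n' "$keyword"
    for ext in html txt php py; do
        printf '/%s.%s\n' "$keyword" "$ext"
    done
    # Finally the keyword used as a query parameter
    printf '/?%s=test\n' "$keyword"
}

# Usage (commented out because it sends requests):
# keyword_candidates secret | while read -r path; do
#     code=$(curl -s -o /dev/null -w '%{http_code}' "https://example.com$path")
#     echo "$code $path"
# done
```

Printing one candidate per line keeps the output usable both for a manual check with curl and as input to any tool that reads a wordlist from stdin.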
Wordlists
CeWL
CeWL generates a custom wordlist by crawling a website.
# -d: Depth (default: 2)
# -w: Write the output to the file
cewl -d 3 https://example.com/ -w output.txt
SecLists
SecLists is a collection of multiple types of lists.
They are usually located in /usr/share/seclists/ on Linux.
less /usr/share/seclists/Discovery/Web-Content/common.txt
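A site-specific list from CeWL and a generic list from SecLists are often combined before fuzzing. The sketch below shows one way to do that, assuming a POSIX shell with awk; `merge_wordlists` is a hypothetical helper name. Dropping lines that start with `#` is an assumption: it suits lists that carry comment headers, but it would also discard real entries beginning with `#`.

```shell
# Merge several wordlists into one stream:
#   - strip carriage returns (Windows line endings)
#   - skip blank lines and lines starting with '#' (assumed to be comments)
#   - keep only the first occurrence of each entry, preserving order
# merge_wordlists is a hypothetical helper for illustration only.
merge_wordlists() {
    cat "$@" | tr -d '\r' | awk 'NF && !/^#/ && !seen[$0]++'
}

# Usage:
# merge_wordlists output.txt /usr/share/seclists/Discovery/Web-Content/common.txt > merged.txt
```

Unlike `sort -u`, the awk filter keeps the original order, so the site-specific words listed first are also tried first.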
Automation
Ffuf
For bug bounty programs, set the '-rate' and '-t' flags to limit requests per second.
# Avoid rate limiting
# -rate: Requests per second
# -t: The number of threads
ffuf -u https://example.com/FUZZ -w wordlist.txt -rate 1 -t 1
# FUZZ Variations
ffuf -u https://example.com/FUZZ -w wordlist.txt
ffuf -u https://example.com/.FUZZ -w wordlist.txt
ffuf -u https://example.com/FUZZ.txt -w wordlist.txt
ffuf -u https://example.com/FUZZ.php -w wordlist.txt
ffuf -u 'https://example.com/index.php?FUZZ=test' -w wordlist.txt
# -X POST: Send POST requests
ffuf -u https://example.com/FUZZ -X POST -w wordlist.txt
# -t: Threads e.g. 5 threads
# -p: Delay in seconds between requests e.g. 0.1 seconds
ffuf -u http://example.com/FUZZ -w wordlist.txt -t 5 -p 0.1
# Custom header (-H)
ffuf -H "Cookie: key=value" -u https://example.com/FUZZ -w wordlist.txt
# -mc: Match HTTP status code
ffuf -u http://example.com/FUZZ -w wordlist.txt -mc 200
# Match only 422 (Unprocessable Content) responses
ffuf -u https://example.com/FUZZ -w wordlist.txt -mc 422
# -ms: Match HTTP response size
ffuf -u http://example.com/FUZZ -w wordlist.txt -ms 1234
ffuf -u http://example.com/FUZZ -w wordlist.txt -ms 50-300
# -fc: Filter HTTP status code
ffuf -u http://example.com/FUZZ -w wordlist.txt -fc 302
# -fs: Filter HTTP response size
ffuf -u http://example.com/FUZZ -w wordlist.txt -fs 1234
ffuf -u http://example.com/FUZZ -w wordlist.txt -fs 50-300
# File extensions
ffuf -u https://example.com/FUZZ -e .html,.txt,.js,.php,.py,.asp,.json -w wordlist.txt
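Combining a low -rate with -e can make a scan far longer than expected. The sketch below gives a rough lower bound on the duration. It assumes that each word is requested once bare and once per extension, and it ignores network latency and retries. `estimate_scan_seconds` is a hypothetical helper name, not an ffuf feature.

```shell
# Estimate the minimum scan duration in seconds.
#   $1: wordlist file
#   $2: requests per second (the value passed to -rate)
#   $3: number of extensions passed to -e (0 if none)
# Assumes one request per bare word plus one per extension.
# estimate_scan_seconds is a hypothetical helper for illustration only.
estimate_scan_seconds() {
    wordlist=$1
    rate=$2
    exts=$3
    # tr strips the padding some wc implementations add
    lines=$(wc -l < "$wordlist" | tr -d ' ')
    requests=$(( lines * (1 + exts) ))
    # Integer division rounded up
    echo $(( (requests + rate - 1) / rate ))
}

# Usage:
# estimate_scan_seconds wordlist.txt 1 7
```

If the estimate runs to hours, trimming the wordlist or the extension list is usually a better trade than raising the rate on a program that asks for low traffic.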
For fuzzing with numbers, we can use the following commands.
for i in {0..255}; do echo $i; done | ffuf -u 'http://example.com/?id=FUZZ' -w -
seq 0 255 | ffuf -u 'http://example.com/?id=FUZZ' -w -
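Some applications use fixed-width identifiers such as 007 instead of 7. The following sketch generates zero-padded numbers with a plain POSIX loop; `padded_ids` is a hypothetical helper name. Pass the first and last values without leading zeros, since printf would otherwise read them as octal.

```shell
# Print integers from $1 to $2 inclusive, zero-padded to width $3.
# padded_ids is a hypothetical helper for illustration only.
padded_ids() {
    first=$1
    last=$2
    width=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf "%0${width}d\n" "$i"
        i=$((i + 1))
    done
}

# Usage (commented out because it sends requests):
# padded_ids 0 255 3 | ffuf -u 'http://example.com/?id=FUZZ' -w -
```

Where `seq -w` is available, `seq -w 0 255` produces the same three-digit output for this range.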
Dirsearch
Dirsearch is a web path scanner.
For bug bounty programs, set the "-t" and "--max-rate" flags to limit requests per second.
dirsearch -u https://example.com/
# -w: wordlist
dirsearch -u https://example.com/ -w wordlist.txt
# -t: number of threads
# --max-rate: max requests per second
dirsearch -u https://example.com/ -t 1 --max-rate=1
# -m: Method
dirsearch -m POST -u https://example.com/
# Extensions
dirsearch -u https://example.com -e html,txt,js,php,py,asp,json -w wordlist.txt
Gobuster
gobuster dir -u https://example.com -w wordlist.txt
Dirb
dirb https://example.com/
dirb https://example.com/ wordlist.txt
# Custom header (-H)
dirb https://example.com/ wordlist.txt -H "Authorization: Basic {token}"
# File Extensions (-X)
dirb https://example.com/ -X .txt
FeroxBuster
FeroxBuster is a recursive content discovery tool.
feroxbuster -u https://vulnerable.com
# Specify extensions (-x)
feroxbuster -u https://vulnerable.com -x html,js,php
# No recursion (-n)
feroxbuster -u https://vulnerable.com -n
# Custom header (-H)
feroxbuster -u https://vulnerable.com -H "Authorization: Bearer {token}"
Hakrawler
Hakrawler is a simple web crawler designed for quick discovery of endpoints and assets within a web application.
echo https://vulnerable.com | hakrawler
Wfuzz
# -w: wordlist (alias for -z file,wordlist)
wfuzz -w wordlist.txt https://example.com/FUZZ
# -z: payload
wfuzz -z file,wordlist.txt https://example.com/FUZZ
Framework Detection from Favicon
We can identify the framework a website uses from the MD5 hash of its favicon.
curl https://vulnerable.com/images/favicon.ico | md5sum
Then look up the hash in the OWASP Favicon Database to identify the framework.
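The hash lookup can be scripted once the database entries are saved locally. This is a minimal sketch, assuming `md5sum` is installed and that the entries are stored one per line as `hash:name`. That file layout, the `favicons.txt` file name, and the helper names `favicon_md5` and `lookup_favicon` are all assumptions made for illustration.

```shell
# Read favicon bytes on stdin and print only the MD5 hex digest.
# favicon_md5 is a hypothetical helper for illustration only.
favicon_md5() {
    md5sum | awk '{print $1}'
}

# Look up a hash in a local file of "hash:name" lines (assumed format)
# and print the matching name, if any.
lookup_favicon() {
    hash=$1
    db=$2
    grep -i "^$hash:" "$db" | cut -d: -f2-
}

# Usage (commented out because it sends a request):
# hash=$(curl -s https://vulnerable.com/images/favicon.ico | favicon_md5)
# lookup_favicon "$hash" favicons.txt
```

A missing match only means the favicon is not in the local list, for example because the site uses a custom icon.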
Parsing .DS_Store
ds_store_exp is a tool that parses a .DS_Store file and recursively downloads the files it lists.
pip3 install ds-store
python3 ds_store_exp.py https://example.com/.DS_Store