Web Content Discovery

Last modified: 2024-03-17

To find hidden directories or files, we can enumerate them manually or with automated tools.

Manual Discovery

# Settings files
/robots.txt
/security.txt
/.well-known/security.txt
/.well-known/apple-app-site-association
/.well-known/assetlinks.json
/sitemap.xml
/sitemaps.xml

# JavaScript files
/main.js
/script.js
/js/jquery.min.js
/js/main.js
/js/script.js

# CGI scripts
/cgi-bin/example.cgi

# Tilde (~) prefixed directories
/~files/
/~hidden/

# PHP files
/index.php
/config.php
/403.php
/404.php

# Python files
/main.py
/module.py
/module/__init__.py
/modules/__init__.py
/__init__.py
/config.ini
/project.wsgi

# Archives
/example.zip
/backup.zip
/backups.zip

# Backup files
/example.bak
/example.jpg.bak
/images/example.jpg.bak

# Directories
/admin/
/blog/

# Sensitive information
/.env

# Git and GitHub
/README.md
/.git
/.github
/.gitignore

# Apache Tomcat
/manager

# ASP.NET
/trace.axd
/example.asp
/example.aspx
/example.aspx/trace.axd
/web.config

# If you know which users manage the website, try their usernames
/admin
/administrator
/john
/michael

# API endpoints
/api/login
/api/signin
/api/user
/api/user/1
/api/users
/api/v1/
/api/v2/

# If we found a secret keyword while investigating, we can try to access the following contents.
/<keyword>
/<keyword>.html
/<keyword>.txt
/<keyword>.php
/<keyword>.py
/?<keyword>=test

# We might be able to access directories by using keywords we found.
/<site_title>
/<site_theme>
/<site_author>
/<image_theme>
/?<post_param>=test
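The keyword-based guesses above can be generated and checked in a loop. The following is a minimal sketch, assuming curl is installed; the function names keyword_paths and probe_paths are hypothetical helpers for illustration, not part of any tool.

```shell
# keyword_paths: print the candidate paths for one keyword (hypothetical helper).
keyword_paths() {
    kw=$1
    for suffix in "" .html .txt .php .py; do
        printf '/%s%s\n' "$kw" "$suffix"
    done
    printf '/?%s=test\n' "$kw"
}

# probe_paths: read paths from stdin and print "<status> <path>" for each one.
# Sleeps one second between requests to keep the request rate low.
probe_paths() {
    base=${1%/}
    while IFS= read -r path; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$base$path")
        printf '%s %s\n' "$code" "$path"
        sleep 1
    done
}
```

Usage might look like: keyword_paths secret | probe_paths https://example.com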

Wordlists

CeWL

CeWL is a custom wordlist generator that builds a wordlist from the words found on a website.

# -d: Depth (default: 2)
# -w: Write the output to the file
cewl -d 3 https://example.com/ -w output.txt
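Raw CeWL output usually benefits from a cleanup pass before it is fed to a fuzzer. The sketch below lowercases the words, drops very short ones, and removes duplicates; the function name clean_wordlist and the default minimum length of 4 are assumptions for illustration.

```shell
# clean_wordlist: lowercase, drop words shorter than MIN characters, de-duplicate.
# MIN defaults to 4; this threshold is an arbitrary assumption.
clean_wordlist() {
    min=${1:-4}
    tr '[:upper:]' '[:lower:]' | awk -v min="$min" 'length($0) >= min' | sort -u
}
```

Usage might look like: clean_wordlist 4 < output.txt > wordlist.txt
Paths can be case-sensitive on some servers, so it may be worth keeping the original list as well.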

SecLists

SecLists is a collection of multiple types of lists.
They are usually located in /usr/share/seclists/ on Linux.

less /usr/share/seclists/Discovery/Web-Content/common.txt
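A SecLists file can be combined with a site-specific list such as the CeWL output. The sketch below concatenates several files, skips blank lines and lines starting with "#" (some lists carry comment headers), and keeps only the first occurrence of each entry. The function name merge_wordlists is a hypothetical helper.

```shell
# merge_wordlists: concatenate the given files, drop blank lines and lines
# starting with '#', and remove duplicates while keeping first-seen order.
merge_wordlists() {
    cat "$@" | awk 'NF && !/^#/ && !seen[$0]++'
}
```

Usage might look like: merge_wordlists /usr/share/seclists/Discovery/Web-Content/common.txt output.txt > wordlist.txt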

Automation

Ffuf

For bug bounty programs, use the -rate, -t, and -p flags to decrease requests per second.

# Avoid rate limiting
# -rate: Requests per second
# -t: The number of threads
ffuf -u https://example.com/FUZZ -w wordlist.txt -rate 1 -t 1

# FUZZ Variations
ffuf -u https://example.com/FUZZ -w wordlist.txt 
ffuf -u https://example.com/.FUZZ -w wordlist.txt
ffuf -u https://example.com/FUZZ.txt -w wordlist.txt
ffuf -u https://example.com/FUZZ.php -w wordlist.txt
ffuf -u 'https://example.com/index.php?FUZZ=test' -w wordlist.txt

# -X POST: Send POST requests
ffuf -u https://example.com/FUZZ -X POST -w wordlist.txt

# -t: Threads e.g. 5 threads
# -p: Delay between requests in seconds e.g. 0.1 seconds
ffuf -u http://example.com/FUZZ -w wordlist.txt -t 5 -p 0.1

# Custom header (-H)
ffuf -H "Cookie: key=value" -u https://example.com/FUZZ -w wordlist.txt 

# -mc: Match HTTP status code
ffuf -u http://example.com/FUZZ -w wordlist.txt -mc 200
# 422 status code
ffuf -u https://example.com/FUZZ -w wordlist.txt -mc 422
# -ms: Match HTTP response size
ffuf -u http://example.com/FUZZ -w wordlist.txt -ms 1234
ffuf -u http://example.com/FUZZ -w wordlist.txt -ms 50-300

# -fc: Filter HTTP status code
ffuf -u http://example.com/FUZZ -w wordlist.txt -fc 302
# -fs: Filter HTTP response size
ffuf -u http://example.com/FUZZ -w wordlist.txt -fs 1234
ffuf -u http://example.com/FUZZ -w wordlist.txt -fs 50-300

# File extensions
ffuf -u https://example.com/FUZZ -e .html,.txt,.js,.php,.py,.asp,.json -w wordlist.txt
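To see how much an extension list grows a scan, the candidate list can be expanded by hand. The sketch below assumes that each word is tried both bare and with every extension appended, which is my understanding of how the -e flag behaves; the function name with_extensions is a hypothetical helper.

```shell
# with_extensions: for each word on stdin, print the bare word and then the
# word with each extension appended. Extensions are passed comma-separated.
with_extensions() {
    awk -v exts="$1" '
        BEGIN { n = split(exts, e, ",") }
        NF {
            print $0
            for (i = 1; i <= n; i++) print $0 e[i]
        }'
}
```

Usage might look like: with_extensions .php,.txt < wordlist.txt | wc -l
The line count gives a rough number of requests to expect before starting the scan.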

For fuzzing with numbers, we can use the following commands.

for i in {0..255}; do echo $i; done | ffuf -u 'http://example.com/?id=FUZZ' -w -

seq 0 255 | ffuf -u 'http://example.com/?id=FUZZ' -w -
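Some applications use fixed-width identifiers such as 007 instead of 7. The sketch below prints a zero-padded number range that can be piped into ffuf in the same way; the function name padded_ids is a hypothetical helper.

```shell
# padded_ids: print integers START..END, zero-padded to WIDTH digits.
padded_ids() {
    awk -v s="$1" -v e="$2" -v w="$3" 'BEGIN {
        fmt = "%0" w "d\n"          # e.g. "%03d\n" when WIDTH is 3
        for (i = s + 0; i <= e + 0; i++) printf fmt, i
    }'
}
```

Usage might look like: padded_ids 0 255 3 | ffuf -u 'http://example.com/?id=FUZZ' -w -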

Dirsearch

Dirsearch is a web path scanner.
For bug bounty programs, set the -t and --max-rate flags to decrease requests per second.

dirsearch -u https://example.com/

# -w: wordlist
dirsearch -u https://example.com/ -w wordlist.txt

# -t: number of threads
# --max-rate: max requests per second
dirsearch -u https://example.com/ -t 1 --max-rate=1

# -m: Method
dirsearch -m POST -u https://example.com/

# Extensions
dirsearch -u https://example.com -e html,txt,js,php,py,asp,json -w wordlist.txt
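When the request rate is capped, it helps to estimate how long a scan will take before starting it. The sketch below divides the wordlist line count by the requests per second and rounds up. It ignores extensions, recursion, and network latency, so treat the result as a lower bound; the function name estimate_scan_seconds is a hypothetical helper.

```shell
# estimate_scan_seconds: wordlist line count divided by requests per second,
# rounded up. RATE must be a positive integer.
estimate_scan_seconds() {
    lines=$(wc -l < "$1")
    rate=$2
    echo $(( ($lines + $rate - 1) / $rate ))
}
```

Usage might look like: estimate_scan_seconds wordlist.txt 1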

Gobuster

gobuster dir -u https://example.com -w wordlist.txt

Dirb

dirb https://example.com/
dirb https://example.com/ wordlist.txt

# Custom header (-H); the wordlist goes before the options
dirb https://example.com/ wordlist.txt -H "Authorization: Basic {token}"
# File Extensions (-X)
dirb https://example.com/ -X .txt

FeroxBuster

FeroxBuster is a recursive content discovery tool.

feroxbuster -u https://vulnerable.com

# Specify extensions (-x)
feroxbuster -u https://vulnerable.com -x html,js,php
# No recursion (-n)
feroxbuster -u https://vulnerable.com -n
# Custom header (-H)
feroxbuster -u https://vulnerable.com -H "Authorization: Bearer {token}"

Hakrawler

Hakrawler is a simple web crawler designed for quick discovery of endpoints and assets within a web application.

echo https://vulnerable.com | hakrawler
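Crawler output often mixes in third-party hosts and repeats the same path with different query strings. The sketch below assumes one URL per line on stdin, keeps only URLs on the given host, strips query strings and fragments, and prints each path once. The function name in_scope_paths is a hypothetical helper.

```shell
# in_scope_paths: keep URLs on stdin whose host equals HOST, strip query
# strings and fragments, and print each unique path once.
in_scope_paths() {
    awk -v host="$1" -F/ '
        $1 ~ /^https?:$/ && $3 == host {
            path = $0
            sub("^[^/]*//[^/]*", "", path)   # drop scheme and host
            sub("[?#].*$", "", path)         # drop query string and fragment
            if (path == "") path = "/"
            if (!seen[path]++) print path
        }'
}
```

Usage might look like: echo https://vulnerable.com | hakrawler | in_scope_paths vulnerable.com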

Wfuzz

# -w: wordlist (alias for -z file,wordlist)
wfuzz -w wordlist.txt https://example.com/FUZZ
# -z: payload
wfuzz -z file,wordlist.txt https://example.com/FUZZ

Framework Detection from Favicon

We can identify the framework in use from the hash of the site's favicon.

curl https://vulnerable.com/images/favicon.ico | md5sum

Then look up the hash in the OWASP Favicon Database to identify the framework used by the website.
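The lookup can also be done against a local copy of known hashes. The sketch below assumes md5sum is available and that the hashes are stored one per line as "hash:framework"; that file format, and the function names favicon_md5 and lookup_favicon, are assumptions made for illustration and not the format of the OWASP database.

```shell
# favicon_md5: print only the MD5 hex digest of the data on stdin.
favicon_md5() {
    md5sum | cut -d ' ' -f 1
}

# lookup_favicon: print the framework for HASH from a "hash:framework" file.
# Returns non-zero when the hash is not listed.
lookup_favicon() {
    awk -F: -v h="$1" '
        ($1 "") == (h "") { print $2; found = 1 }   # force string comparison
        END { exit !found }' "$2"
}
```

Usage might look like: lookup_favicon "$(curl -s https://vulnerable.com/images/favicon.ico | favicon_md5)" hashes.txt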


Parsing .DS_Store

ds_store_exp is a tool that parses a .DS_Store file and recursively downloads the files it lists.

pip3 install ds-store
python3 ds_store_exp.py https://example.com/.DS_Store
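A server may answer a request for /.DS_Store with an HTML error page and a 200 status, so it can be worth checking a downloaded copy before parsing it. The sketch below checks for the "Bud1" marker that follows the first four bytes of a .DS_Store file; the function name is_ds_store is a hypothetical helper.

```shell
# is_ds_store: succeed if FILE has the "Bud1" marker at byte offset 4,
# which follows the 4-byte 00 00 00 01 prefix of a .DS_Store file.
is_ds_store() {
    magic=$(dd if="$1" bs=1 skip=4 count=4 2>/dev/null)
    [ "$magic" = "Bud1" ]
}
```

Usage might look like: curl -s -o ds_store.bin https://example.com/.DS_Store && is_ds_store ds_store.bin && echo "looks like a .DS_Store file"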