x86 Assembly

Last modified: 2023-08-03

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor.

Registers

In 64-bit mode, each general-purpose register is 8 bytes (64 bits) wide.
Each register can also be accessed in smaller parts.
For example, RAX (64 bits) → EAX (low 32 bits) → AX (low 16 bits) → AH (high 8 bits of AX), AL (low 8 bits of AX).
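
The relationship between a register and its smaller parts can be sketched in C with masks and shifts. This is only an illustration: the helper names are hypothetical, and a plain 64-bit integer stands in for RAX.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the part of a 64-bit value that the
   named sub-register would expose. */
uint32_t eax_of(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* low 32 bits */
uint16_t ax_of(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }     /* low 16 bits */
uint8_t  ah_of(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); } /* bits 8..15 */
uint8_t  al_of(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* bits 0..7 */
```

For the value 0x1122334455667788, these helpers give 0x55667788, 0x7788, 0x77 and 0x88 respectively.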

General Purpose Registers

They are used for temporarily storing data.

EAX/RAX (Accumulator Register)
It stores values, most notably a function's return value, much like a variable in a high-level programming language.
It also usually holds the system call number (e.g. sys_exit, sys_write) when a system call is made.
```
# AT&T syntax
mov $4, %eax

# Intel syntax
mov eax, 4
```
EBX/RBX (Base Register)
ECX/RCX (Counter Register)
EDX/RDX (Data Register)
ESI/RSI (Source Index Register)
It is used as the source pointer.
EDI/RDI (Destination Index Register)
It is used as the destination pointer.
EBP/RBP (Base Pointer Register)
It is also called the frame pointer. It holds the address of the base of the current stack frame.
ESP/RSP (Stack Pointer Register)
It holds the address of the top of the stack.
EIP/RIP (Instruction Pointer)
It is the most important register in reverse engineering. It holds the address of the next instruction to execute.

Segment Registers

They are used for referencing memory locations.

CS (Code Segment Register)
It selects the segment that contains the instructions to be executed.
DS (Data Segment Register)
It selects the segment that contains data, constants and work areas.
ES, FS, GS (Extra Segment Registers)
SS (Stack Segment Register)
It selects the segment that contains the stack, which holds data and the return addresses of procedures or subroutines.

Control Registers

A control register is a processor register that changes or controls the general behavior of the CPU.

CR0
Has various control flags that modify the basic operation of the processor.
CR1
Reserved.
CR2
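Contains the Page Fault Linear Address (PFLA), the address whose access caused the most recent page fault.
CR3
Used when virtual addressing is enabled. It holds the physical base address of the top-level paging structure.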
CR4
Used in protected mode to control operations.

Status/Flags Registers

AF (Adjust Flag)
It is also called the Auxiliary flag or the Auxiliary Carry flag. The AF is set when an arithmetic operation causes a carry from bit 3 into bit 4 of the low byte.
AC (Alignment Check Flag)
CF (Carry Flag)
It holds the carry (or borrow) out of the high-order (leftmost) bit after an arithmetic operation. It also stores the last bit shifted out by a shift or rotate operation.
DF (Direction Flag)
When the DF is 0, string operations run left to right (addresses increase); when the DF is 1, they run right to left (addresses decrease).
ID (Identification Flag)
IF (Interrupt Enable Flag)
When the IF is 0, maskable external interrupts are disabled; when the IF is 1, they are enabled.
IOPL (I/O Privilege Level Flag)
NT (Nested Task Flag)
OF (Overflow Flag)
It indicates that a signed arithmetic operation overflowed, i.e. the result does not fit in the destination as a signed value.
PF (Parity Flag)
It indicates whether the least significant byte of the result of an arithmetic operation contains an even or odd number of 1-bits. An even number of 1-bits sets the parity flag to 1 and an odd number of 1-bits clears it to 0.
RF (Resume Flag)
SF (Sign Flag)
It shows the sign of the result of an arithmetic operation. A positive result clears SF to 0 and a negative result sets it to 1.
TF (Trap Flag)
It allows setting the operation of the processor in the single-step mode for debugging purposes.
VM (Virtual-8086 Mode Flag)
VIF (Virtual Interrupt Flag)
VIP (Virtual Interrupt Pending Flag)
ZF (Zero Flag)
It indicates the result of an arithmetic or comparison operation. A nonzero result clears the zero flag to 0, and a zero result sets it to 1.
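
To make the flag definitions concrete, here is a minimal C sketch that computes the flags an 8-bit ADD would leave behind. The `struct flags` type and the `add8_flags` function are hypothetical names for illustration; real hardware computes these as a side effect of the instruction.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for a few status flags. */
struct flags { bool cf, zf, sf, of, pf, af; };

/* Compute the flags an 8-bit ADD would produce for a + b. */
struct flags add8_flags(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;
    uint8_t r = (uint8_t)wide;
    struct flags f;
    f.cf = wide > 0xFF;                              /* carry out of bit 7 */
    f.zf = r == 0;                                   /* result is zero */
    f.sf = (r & 0x80) != 0;                          /* top bit of the result */
    f.of = ((a ^ r) & (b ^ r) & 0x80) != 0;          /* same-sign inputs, different-sign result */
    f.af = (((a & 0x0F) + (b & 0x0F)) & 0x10) != 0;  /* carry from bit 3 into bit 4 */
    int ones = 0;
    for (int i = 0; i < 8; i++) ones += (r >> i) & 1;
    f.pf = (ones % 2) == 0;                          /* set on an even number of 1-bits */
    return f;
}
```

For example, 0x7F + 0x01 gives 0x80, which sets SF, OF and AF and leaves CF, ZF and PF clear, while 0xFF + 0x01 wraps to 0x00 and sets CF, ZF, AF and PF.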

Instructions

Basic Instructions

LEA
Load Effective Address. It loads addresses (not data). It computes the address of its source operand like MOV would, but stores the address itself instead of dereferencing it.

# AT&T syntax - store the address esp+0x18 in eax
lea 0x18(%esp), %eax

# Intel syntax - store the address esp+0x18 in eax
lea eax, [esp+0x18]
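
In C terms, the difference between LEA and a MOV from memory resembles taking the address of an array element versus reading it. The sketch below uses hypothetical function names, with an ordinary array standing in for the stack.

```c
#include <stdint.h>

/* `base` plays the role of esp; `index` selects a 4-byte slot. */
uint32_t *lea_like(uint32_t *base, int index) { return &base[index]; } /* address only */
uint32_t  mov_like(uint32_t *base, int index) { return base[index]; }  /* reads memory */
```

With 4-byte elements, index 6 corresponds to the byte offset 0x18 used in the example: `lea_like` returns the address 0x18 bytes past `base`, and `mov_like` returns whatever is stored there.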

CALL
Call procedure

MOV
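Move. It copies data from the source to the destination.

# AT&T syntax - move esp into ebp
movl %esp, %ebp

# Intel syntax - move esp into ebp
mov ebp, esp

MOV DWORD
Copy a double word (4 bytes). DWORD is a size specifier for a memory operand.
MOV QWORD
Copy a quad word (8 bytes). QWORD is a size specifier for a memory operand.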
NOP
No operation
PUSH
Push onto stack. It pushes data onto the top of the stack. The pushed data is often restored later with the ‘POP’ instruction.
```
push rax
```
POP
Pop stack. It pops data from the top of the stack and stores it in the destination.
```
pop rax
```
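
The behaviour of PUSH and POP can be modelled with a small C sketch. It assumes a downward-growing stack, as on x86, and uses hypothetical names (`struct stack`, `push`, `pop`); it does no bounds checking, so it is an illustration rather than a usable stack.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: `sp` indexes into `mem` and starts past the end,
   because the x86 stack grows toward lower addresses. */
struct stack { uint64_t mem[STACK_WORDS]; int sp; };

void stack_init(struct stack *s) { s->sp = STACK_WORDS; }

void push(struct stack *s, uint64_t value) {
    s->sp -= 1;             /* move the stack pointer down first */
    s->mem[s->sp] = value;  /* then store at the new top */
}

uint64_t pop(struct stack *s) {
    uint64_t value = s->mem[s->sp];  /* read the current top */
    s->sp += 1;                      /* then move the stack pointer up */
    return value;
}
```

Values come back in reverse order: after pushing 1 and then 2, the first pop returns 2 and the second returns 1.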

Arithmetic Instructions

INC

Increment data by one.

# eax will be 5.

# AT&T syntax
mov $4, %eax
inc %eax

# Intel syntax
mov eax, 4
inc eax

DEC

Decrement data by one.

# eax will be 3.

# AT&T syntax
mov $4, %eax
dec %eax

# Intel syntax
mov eax, 4
dec eax

ADD

Add the source to the destination and store the result in the destination.

# eax will be 9.

# AT&T syntax
mov $4, %eax
add $5, %eax

# Intel syntax
mov eax, 4
add eax, 5

SUB

Subtract the source from the destination and store the result in the destination.

# eax will be 2.

# AT&T syntax
mov $4, %eax
sub $2, %eax

# Intel syntax
mov eax, 4
sub eax, 2

DIV

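Divide (unsigned) EDX:EAX (RDX:RAX in 64-bit code) by the source. The quotient is stored in EAX/RAX and the remainder in EDX/RDX.

# eax will be 5 (quotient) and edx will be 0 (remainder).

# AT&T syntax
mov $0, %edx
mov $10, %eax
mov $2, %ebx
div %ebx

# Intel syntax
mov edx, 0
mov eax, 10
mov ebx, 2
div ebx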
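
The register roles in a 32-bit DIV can be sketched in C as follows. The function name `div32` is hypothetical, and the -1 return value merely stands in for the divide-error exception that the CPU would raise.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit DIV: the dividend is EDX:EAX, the quotient goes
   to *eax and the remainder to *edx. Returns 0 on success, -1 where the CPU
   would raise a divide error. */
int div32(uint32_t *eax, uint32_t *edx, uint32_t src) {
    if (src == 0) return -1;                      /* division by zero */
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    uint64_t quotient = dividend / src;
    if (quotient > 0xFFFFFFFFu) return -1;        /* quotient does not fit in EAX */
    uint32_t remainder = (uint32_t)(dividend % src);
    *eax = (uint32_t)quotient;
    *edx = remainder;
    return 0;
}
```

This is why EDX is normally cleared before an unsigned division: a stale value in EDX becomes the high half of the dividend.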

IDIV

Divide (signed)

MUL

Multiply (unsigned) EAX/RAX by the source. The low half of the result is stored in EAX/RAX and the high half in EDX/RDX.

# eax will be 50 and edx will be 0.

# AT&T syntax
mov $10, %eax
mov $5, %ebx
mul %ebx

# Intel syntax
mov eax, 10
mov ebx, 5
mul ebx
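
A 32-bit MUL produces a 64-bit product split across two registers. A minimal C sketch of that split, using the hypothetical name `mul32`:

```c
#include <stdint.h>

/* Hypothetical model of 32-bit MUL: EDX:EAX = EAX * src. */
void mul32(uint32_t *eax, uint32_t *edx, uint32_t src) {
    uint64_t product = (uint64_t)*eax * src;
    *eax = (uint32_t)product;          /* low 32 bits */
    *edx = (uint32_t)(product >> 32);  /* high 32 bits */
}
```

For small operands such as 10 and 5, the high half is simply 0. With 0x80000000 and 4, the product is 0x200000000, so EAX becomes 0 and EDX becomes 2.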

IMUL

Multiply (signed)

Conditional Instructions

CMOVA

Move if above (CF=0 and ZF=0)
CMOVAE

Move if above or equal (CF=0)
CMOVB

Move if below (CF=1)
CMOVBE

Move if below or equal (CF=1 or ZF=1)
CMOVC

Move if carry (CF=1)
CMOVE

Move if equal (ZF=1)
CMOVG

Move if greater (ZF=0 and SF=OF)
CMOVGE

Move if greater or equal (SF=OF)
CMOVL

Move if less (SF≠OF)
CMOVLE

Move if less or equal (ZF=1 or SF≠OF)
CMOVO

Move if overflow (OF=1)
CMOVP

Move if parity (PF=1)
CMOVPE

Move if parity even (PF=1)
CMOVPO

Move if parity odd (PF=0)
CMOVS

Move if sign (SF=1)
CMOVZ

Move if zero (ZF=1)
CMOVNA

Move if not above (CF=1 or ZF=1)
CMOVNAE

Move if not above or equal (CF=1)
CMOVNB

Move if not below (CF=0)
CMOVNBE

Move if not below or equal (CF=0 and ZF=0)
CMOVNC

Move if not carry (CF=0)
CMOVNE

Move if not equal (ZF=0)
CMOVNG

Move if not greater (ZF=1 or SF≠OF)
CMOVNGE

Move if not greater or equal (SF≠OF)
CMOVNL

Move if not less (SF=OF)
CMOVNLE

Move if not less or equal (ZF=0 and SF=OF)
JMP

Jump
JA

Jump if above (CF = 0 and ZF = 0)
JAE

Jump if above or equal
JB

Jump if below
JBE

Jump if below or equal
JC

Jump if carry
JE

Jump if equal
JG

Jump if greater
JGE

Jump if greater or equal
JL

Jump if less
JLE

Jump if less or equal
JO

Jump if overflow
JP

Jump if parity
JPE

Jump if parity even
JPO

Jump if parity odd
JS

Jump if sign
JZ

Jump if zero
JNC

Jump if not carry

JNE

Jump if not equal

# AT&T syntax
mov x, %eax
cmp $4, %eax
jne func_x_is_not_4
call func_x_is_4

# Intel syntax
mov eax, [x]
cmp eax, 4
jne func_x_is_not_4
call func_x_is_4

JNO

Jump if not overflow
JNP

Jump if not parity
JNS

Jump if not sign

TEST

Perform a bitwise AND of the two operands, discard the result and set the flags. ZF (Zero Flag) is set to 1 if the result of the AND is 0.

test %eax,%eax # set ZF to 1 if eax == 0
je 0xf7eb0f70  # jump if ZF == 1
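
The zero check performed by TEST can be written as a one-line C sketch. The function name is hypothetical, and only ZF is modelled here.

```c
#include <stdint.h>
#include <stdbool.h>

/* ZF after `test a, b`: set when the bitwise AND of the operands is zero. */
bool test_sets_zf(uint32_t a, uint32_t b) { return (a & b) == 0; }
```

Testing a register against itself (`test_sets_zf(x, x)`) is true only when `x` is 0, which is why that form is a common "is it zero?" idiom. Testing against a mask checks whether any of the masked bits are set.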

Other Instructions

CMP

Compare two operands. Internally, it subtracts the source from the destination, sets the flags accordingly and discards the result.

# It checks whether eax is 4.

# AT&T syntax
mov $4, %eax
cmp $4, %eax

# Intel syntax
mov eax, 4
cmp eax, 4
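
Because CMP only sets flags, the conditional instructions decide what the comparison means. The C sketch below computes the flags for a 32-bit `cmp dst, src` (Intel operand order) and evaluates three conditions from them. All names are hypothetical. It also shows why "below" (unsigned) and "less" (signed) can disagree for the same operands.

```c
#include <stdint.h>
#include <stdbool.h>

struct cmp_flags { bool cf, zf, sf, of; };

/* Flags produced by `cmp dst, src`: the CPU computes dst - src and keeps only the flags. */
struct cmp_flags cmp32(uint32_t dst, uint32_t src) {
    uint32_t r = dst - src;
    struct cmp_flags f;
    f.cf = dst < src;                               /* unsigned borrow */
    f.zf = r == 0;                                  /* operands are equal */
    f.sf = (r >> 31) != 0;                          /* top bit of the difference */
    f.of = (((dst ^ src) & (dst ^ r)) >> 31) != 0;  /* signed overflow */
    return f;
}

bool cond_below(struct cmp_flags f) { return f.cf; }          /* JB / CMOVB: unsigned < */
bool cond_less(struct cmp_flags f)  { return f.sf != f.of; }  /* JL / CMOVL: signed < */
bool cond_equal(struct cmp_flags f) { return f.zf; }          /* JE / CMOVE */
```

Comparing 0xFFFFFFFF with 1 is "less" when the bits are read as the signed value -1, but not "below" when they are read as an unsigned value.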

RET

Return from procedure.
UD2

Undefined instruction. Executing it raises an invalid-opcode exception (#UD). Unlike NOP, execution does not simply continue with the next instruction.

Create 32bit Program

To examine 32-bit programs or their assembly, you need a 32-bit executable.
First, create a sample program "sample.c" in C.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
	printf("Hello world");
	return 0;
}

Then run “gcc” to compile it to the executable file.

# If needed
sudo apt install libc6-dev-i386

# -m32: 32bit
# -ggdb: generate the debug information for GDB (GNU debugger)
gcc -m32 -ggdb -o sample sample.c

You can use this for debugging.

chmod 700 sample
gdb sample

If you want to convert the C program to assembly, run this command.

# -S: stop after compilation and output assembly (AT&T syntax by default)
# -O0: No optimization
gcc -m32 -S -O0 sample.c

The above command generates "sample.s".
Then assemble it into an object file.

gcc -m32 -c sample.s -o sample.o

Finally, use a linker to create the actual executable file.

gcc -m32 sample.o -o sample

Information of Executable File

objdump -d <executable-file>
objdump -d sample

# -M: disassembler options (here, AT&T or Intel syntax)
objdump -d -M att sample
objdump -d -M intel sample

Create Assembly Programs

An assembly language program is typically divided into three sections.

The data section is used for declaring initialized data or constants. The size of this data does not change at runtime.
The BSS (block starting symbol) section is used for declaring uninitialized data or variables.
The text section holds the actual code. It begins with the declaration of a global _start symbol, which tells the kernel where execution begins.
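
For readers coming from C, the three sections roughly correspond to the kinds of definitions below. The names are hypothetical, and the placement described in the comments is what GCC typically does on Linux, not something the C standard guarantees.

```c
/* Hypothetical names; section placement is typical for GCC on Linux. */
int initialized_value = 10;      /* typically placed in .data */
int uninitialized_buffer[4];     /* typically placed in .bss, zero-filled at startup */

int add_to_buffer(int index) {   /* machine code typically placed in .text */
    uninitialized_buffer[index] += initialized_value;
    return uninitialized_buffer[index];
}
```

Running `objdump -h` on the compiled object file is one way to check which sections were actually emitted.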

Example (AT&T syntax)

AT&T syntax is often used on Linux. First, create "sample.s".

.section .data
    result:
        .asciz "The smallest value is "
    lr:
        .ascii ".\n"
    constant:
        .int 10
    constants:
        .int  5, 8, 17, 44, 50, 52, 60, 65, 70, 77, 80      # array

.section .bss
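    .comm answer, 4                             # 4 bytes, because the code stores to it with movl
    .lcomm buffer, 4                            # 4 bytes, because the code stores to it with movl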

.section .text
    .globl _start

_start:
    nop                                         # used for debugging purposes

mov_data_to_registers:
    movl $100, %eax                             # mov 100 into the EAX register
    movl $0x50, buffer                          # mov 0x50 into buffer memory location

mov_data_between_memory_and_registers:
    movl constant, %ecx

indirect_addressing:
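    movl constants, %eax                        # mov first value of constants into eax
    movl $constants, %edi                       # mov memory address into edi
    movl $25, 4(%edi)                           # mov immediate val 4 bytes after edi ptr
    movl $0, %edi                               # start at index 0 of constants
    movl constants(, %edi, 4), %ebx             # ebx holds the smallest value so far

find_smallest_value:
    movl constants(, %edi, 4), %eax
    cmp %ebx, %eax
    cmovb %eax, %ebx                            # if eax < ebx (unsigned), ebx = eax
    inc %edi
    cmp $11, %edi                               # constants has 11 elements
    jne find_smallest_value
    addl $0x30, %ebx                            # convert a single digit to ASCII
    movl %ebx, answer

    movl $4, %eax                               # sys_write system call
    movl $1, %ebx                               # file descriptor (stdout)
    movl $result, %ecx
    movl $22, %edx                              # length without the trailing NUL
    int $0x80                                   # call sys_write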

    movl $4, %eax
    movl $1, %ebx
    movl $answer, %ecx
    movl $1, %edx
    int $0x80                                   # call sys_write

    movl $4, %eax
    movl $1, %ebx
    movl $lr, %ecx
    movl $2, %edx
    int $0x80                                   # call sys_write

exit:
    movl $1, %eax                               # sys_exit system call
    movl $0, %ebx                               # exit code 0 successful execution
    int $0x80                                   # call sys_exit

To assemble and link it, run the following two commands.

# assembler
as --32 -o sample.o sample.s
# linker
ld -m elf_i386 -o sample sample.o
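
The core idea of the find_smallest_value loop, followed by the conversion to an ASCII digit, can be sketched in C as below. The function names are hypothetical. `smallest` assumes at least one element, and the digit conversion only makes sense for values 0 to 9.

```c
#include <stddef.h>

/* Unsigned comparison, matching the cmovb used in the assembly. */
unsigned smallest(const unsigned *values, size_t count) {
    unsigned best = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] < best) best = values[i];
    }
    return best;
}

/* Adding 0x30 ('0') turns a value 0..9 into its ASCII digit. */
char to_ascii_digit(unsigned value) { return (char)(value + 0x30); }
```

For an array whose minimum is 5, `smallest` returns 5 and `to_ascii_digit(5)` returns the character '5'.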

Example (Intel syntax)

Intel syntax is often used on Windows, but this example uses NASM with Linux system calls. Create "hello_world.asm".

section .data
    msg db 'Hello, World!', 0xa ;string to be printed. db means the Define Byte.
    len equ $ - msg  ;length of the string. equ means 'equate'. '$' means the current address.

section .text
    global _start  ;linker (ld)

_start:
    mov edx,len  ;message length
    mov ecx,msg  ;message to write
    mov ebx,1    ;file descriptor (stdout)
    mov eax,4    ;system call number (sys_write)
    int 0x80     ;call kernel

    xor ebx,ebx  ;exit code 0
    mov eax,1    ;system call number (sys_exit)
    int 0x80     ;call kernel

To assemble and link the program, run the following commands. The assembler creates the object file and the linker creates the executable.

# 32-bit
nasm -f elf32 hello_world.asm -o hello_world.o
ld -m elf_i386 hello_world.o -o hello_world

# 64-bit
nasm -f elf64 hello_world.asm -o hello_world.o
ld hello_world.o -o hello_world
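
For comparison, here is a C sketch of the same write. It assumes a POSIX system and uses the `write()` wrapper from `<unistd.h>` rather than a raw `int 0x80`. `write_hello` and `MSG_LEN` are hypothetical names.

```c
#include <unistd.h>

static const char msg[] = "Hello, World!\n";

/* Length without the terminating NUL, playing the role of `len equ $ - msg`. */
#define MSG_LEN (sizeof msg - 1)

/* Write the message to file descriptor `fd`; returns what write() returns. */
ssize_t write_hello(int fd) {
    return write(fd, msg, MSG_LEN);
}
```

Calling `write_hello(1)` writes to stdout, mirroring `mov ebx,1` in the assembly version. The file descriptor, buffer and length arguments correspond to EBX, ECX and EDX.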
