x86 Assembly
Last modified: 2023-08-03
x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor.
Registers
Consists of 8 bytes.
Also, it can be broken down into small segments.
For example, RAX (64 bits) → EAX (32 bits) → AX (16 bits) → AH (high 8 bits), AL (low 8 bits).
General Purpose Registers
They are used for temporarily storing data.
-
EAX/RAX (Accumulator Register)
It is used to store values (especially, a return value). It's like a variable in high-level programming languages.
It’s usually used to pass the system call(e.g.sys_exit
,sys_write
) number.# AT&T syntax mov 4, %eax # Intel syntax mov eax, 4
-
EBX/RBX (Base Register)
-
ECX/RCX (Counter Register)
-
EDX/RDX (Data Register)
-
ESI/RSI (Source Index Register)
It is used as the source pointer. -
EDI/RDI (Destination Index Register)
It is used as the destination pointer. -
EBP/RBP (Base Pointer Register)
It holds the address of the base (bottom) of the stack. -
ESP/RSP (Stack Pointer Register)
It is also called as the frame pointer. It holds the address of the top of the stack. -
EIP/RIP (Instruction Pointer)
It is the most important register in reverse engineering. It keeps track of the next instruction code to execute. EIP points to the next instruction to execute. It holds the address of the next line of code which will be executed.
Segment Registers
They are used for referencing memory locations.
- CS (Code Segment Register)
It contains all the instructions to be executed. - DS (Data Segment Register)
It contains data, constants and work areas. - ES, FS, GS (Extra Segment Register)
- SS (Stack Segment Register)
It contains data and return addresses of procedures or subroutines.
Control Registers
A processor register which changes or controls the general behavior of a CPU or other digital device.
- CR0
Has various control flags that modify the basic operation of the processor. - CR1
Reserved. - CR2
Contains a value called Page Fault Linear Address (PFLA). - CR3
Used when virtual addressing is enabled. - CR4
Used in protected mode to control operations.
Status/Flags Registers
- AF (Adjust Flag)
It is also called as the Auxiliary flag and the Auxiliary Carry flag. The AF is set when a 1-bytes arithmetic operation causes a carry from bit 3 into bit 4. - AC (Alignment Check Flag)
- CF (Carry Flag)
It contains the carry of 0 and 1 from a high-order bit (leftmost) after an arithmetic operation. It also stores the contents of last bit of a shift or rotate operation. - DF (Direction Flag)
When the DF is 0, the string operation takes left-to-right direction and when the DF is 1, the string operation takes right-to-left direction. - ID (Identification Flag)
- IF (Interrupt Enable Flag)
When the IF is 0, it disables the external interrupt and when the IF is 1, it enables the interrupt. - IOPL (I/O Privilege Level Flag)
- NT (Nested Task Flag)
- OF (Overflow Flag)
It indicates the overflow of a high-order bit (leftmost bit) of data after a signed arithmetic operation. - PF (Parity Flag)
It indicates the total number of 1-bits in the result obtained from an arithmetic operation. An even number of 1-bits clears the parity flag to 0 and an odd number of 1-bits clears the parity flag to 1. - RF (Resume Flag)
- SF (Sign Flag)
It shows the sign of the result of an arithmetic operation. A positive result clears the value of SF to 0 and negative result sets it to 1. - TF (Trap Flag)
It allows setting the operation of the processor in the single-step mode for debugging purposes. - VM (Virtual-8086 Mode Flag)
- VIF (Virtual Interrupt Flag)
- VIP (Virtual Interrupt Pending Flag)
- ZF (Zero Flag)
It indicates the result of an arithmetic or comparison operation. A nonzero result clears the zero flag to 0, and a zero result sets it to 1.
Instructions
Basic Instructions
-
LEA
Load Effective Address. It loads addresses (not data). It is almost the same as the MOV instruction but does not dereference.# AT&T syntax - store [esp+0x18] in eax lea 0x18(%esp), %eax # Intel syntax - store [esp+0x18] in eax lea eax, [esp+0x18]
-
CALL
Call procedure -
MOV
Move# AT&T syntax - move esp into ebp movl %esp, %ebp # Intel syntax - move ebp into esp mov esp, ebp
-
MOV DWORD
Copy (double word) -
MOV QWORD
Copy (quad word) -
NOP
No operation -
PUSH
Push onto stack. It is used to push data on the top of the stack. The pushed data is often to be restored using the ‘POP’ instruction.push rax
-
POP
Pop stack. It is used to pop data from the top of the stack and store it to the destination address.pop rax
Arithmetic Instructions
-
INC
Increment data by one.
# eax will be 5. # AT&T syntax mov 4, %eax inc %eax # Intel syntax mov eax, 4 inc eax
-
DEC
Decrement data by one.
# eax will be 3. # AT&T syntax mov 4, %eax dec %eax # Intel syntax mov eax, 4 dec eax
-
ADD
Add data and restore the destination.
# eax will be 9. # AT&T syntax mov 4, %eax add 5, %eax # Intel syntax mov eax, 4 add eax, 5
-
SUB
Subtract data and restore the destination.
# eax will be 2. # AT&T syntax mov 4, %eax sub %eax, 2 # Intel syntax mov eax, 4 sub eax, 2
-
DIV
Divide EAX/RAX by the source and restore the result to the EDX/RDX (and EAX/RAX).
# edx and eax will be 5. # AT&T syntax mov 10, %eax mov 2, %ebx div %ebx # Intel syntax mov eax, 10 mov ebx, 2 div ebx
-
IDIV
Divide (signed)
-
MUL
Multiply (unsigned) EAX/RAX with the source and restore the result to the EDX/RDX (and EAX/RAX).
# edx and eax will be 50. # AT&T syntax mov 10, %eax mov 5, %ebx mul %ebx # Intel syntax mov eax, 10 mov ebx, 5 mul ebx
-
IMUL
Multiply (signed)
Conditional Instructions
-
CMOVA
Move if above (CF=0 and ZF=0)
-
CMOVAE
Move if above or equal (CF=0)
-
CMOVB
Move if below (CF=1)
-
CMOVBE
Move if below or equal (CF=1)
-
CMOVC
Move if carry (CF=1)
-
CMOVE
Move if equal (ZF=1)
-
CMOVG
Move if greater (ZF=0 and SF=OF)
-
CMOVGE
Move if greater or equal (SF=OF)
-
CMOVL
Move if less (SF≠OF)
-
CMOVLE
Move if less or equal (ZF=1 or SF≠OF)
-
CMOVO
Move if overflow (OF=1)
-
CMOVP
Move if parity (PF=1)
-
CMOVPE
Move if parity even (PF=1)
-
CMOVPO
Move if parity odd (PF=0)
-
CMOVS
Move if sign (SF=1)
-
CMOVZ
Move if zero (ZF=0)
-
CMOVNA
Move if not above (CF=1 or ZF=1)
-
CMOVNAE
Move if not above or equal (CF=1)
-
CMOVNB
Move if not below (CF=0)
-
CMOVNBE
Move if not below or equal (CF=0 and ZF=0)
-
CMOVNC
Move if not carry (CF=0)
-
CMOVNE
Move if not equal (ZF=0)
-
CMOVNG
Move if not greater (ZF=1 or SF≠OF)
-
CMOVNGE
Move if not greater or equal (SF≠OF)
-
CMOVNL
Move if not less (SF=OF)
-
CMOVNLE
Move if not less or equal (ZF=0 and SF=OF)
-
JMP
Jump
-
JA
Jump if above (CF = 0 and ZF = 0)
-
JAE
Jump if above or equal
-
JB
Jump if below
-
JBE
Jump if below or equal
-
JC
Jump if carry
-
JE
Jump if equal
-
JG
Jump if greater
-
JGE
Jump if greater or equal
-
JL
Jump if less
-
JLE
Jump if less or equal
-
JO
Jump if overflow
-
JP
Jump if parity
-
JPE
Jump if parity even
-
JPO
Jump if parity odd
-
JS
Jump if sign
-
JZ
Jump if zero
-
JNC
Jump if not carry
-
JNE
Jump if not equal
# AT&T syntax mov x, %eax cmp 4, %eax jne func_x_is_not_4 call func_x_is_4 # Intel syntax mov eax, x cmp eax, 4 jne func_x_is_not_4 call func_x_is_4
-
JNO
Jump if not overflow
-
JNP
Jump if not parity
-
JNS
Jump if not sign
-
TEST
Set ZF (Zero Flag) to 1 if a bitwise AND is 0.
test %eax,%eax ; set ZF to 1 if eax == 0 je 0xf7eb0f70 ; jump if ZF == 1
Other Instructions
-
CMP
Compare two operands. It subtracts the destination from the source internally.
# It compares if the eax is 4. # AT&T syntax mov 4, %eax cmp 4, %eax # Intel syntax mov eax, 4 cmp eax, 4
-
RET
Return from procedure.
-
UD2
Undefined instruction (invalid opcode). It is same as NOP instruction.
Create 32bit Program
To examine 32bit programs or assembly language, you need to prepare a 32bit executable program.
First off, create a sample program with C.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
printf("Hello world");
return 0;
}
Then run “gcc” to compile it to the executable file.
# If needed
sudo apt install libc6-dev-i386
# -m32: 32bit
# -ggdb: generate the debug information for GDB (GNU debugger)
gcc -m32 -gdb -o sample sample.c
You can use this for debugging.
chmod 700 sample
gdb sample
If you want to convert the C program to assembly, run this command.
# -S: AT&T syntax
# -O0: No optimization
gcc -m32 -S -O0 sample.c
The above command generates "sample.s".
Then compile it to binary object.
Finally you need to use a linker to create the actual binary executable file.
gcc -m32 sample.o -o sample
Information of Executable File
objdump -d <executable-file>
objdump -d sample
# -M: specific disassemble option
objdump -d -M att sample
objdump -d -M intel sample
Create Assembly Programs
Every assembly language program is divided into three sections.
- Data section is used for declaring initialized data or constants. The data does not change at runtime.
- BSS section is the block starting symbol. used for declaring uninitialized data or variables.
- Text section is used for the actual code sections as it begins with a global _start which tells the kernel where execution begins.
Example (AT&T syntax)
It is often used in Linux. First, create "sample.s"
.section .data
result:
.asciz "The smallest value is "
lr:
.ascii ".\n"
constant:
.int 10
constants:
.int 5, 8, 17, 44, 50, 52, 60, 65, 70, 77, 80 # array
.section .bss
.comm answer, 1
.lcomm buffer 1
.section .text
.globl _start
_start:
nop # used for debugging purposes
mov_data_to_registers:
movl $100, %eax # mov 100 into the EAX register
movl $0x50, buffer # mov 0x50 into buffer memory location
mov_data_between_memory_and_registers:
movl constant, %ecx
indirect_addressing:
movl constants, %eax # mov constants value into eax
movl constants, %edi # mov memory address into edi
movl $25, 4(%edi) # mov immediate val 4b after edi ptr
movl $1, %edi # load 2nd index constants label
movl constants(, %edi, 4), %ebx
find_smallest_value:
movl constants(, %edi, 4), %eax
cmp %ebx, %eax
cmovb %eax, %ebx
inc %edi
cmp $8, %edi
jne find_smallest_value
addl $0x30, %ebx
movl %ebx, answer
movl $4, %eax
movl $1, %ebx
movl $result, %ecx
movl $23, %edx
int $0x80
movl $4, %eax
movl $1, %ebx
movl $answer, %ecx
movl $1, %edx
int $0x80 # call sys_write
movl $4, %eax
movl $1, %ebx
movl $lr, %ecx
movl $2, %edx
int $0x80 # call sys_write
exit:
movl $1, %eax # sys_exit system call
movl $0, %ebx # exit code 0 successful execution
int $0x80 # call sys_exit
To compile it, run the following two commands.
# assembler
as -32 -o sample.o sample.s
# linker
ld -m elf_i386 -o sample sample.o
Example (Intel syntax)
It is often used in Windows. Create "hello_world.asm".
section .data
msg db 'Hello, World!', 0xa ;string to be printed. db means the Define Byte.
len equ $ - msg ;length of the string. equ means 'equate'. '$' means the current address.
section .text
global _start ;linker (ld)
_start:
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
To assemble the program, run the following command. Then the object file will be created.
# 32-bit
nasm -f elf32 hello_world.asm -o hello_world.o
ld -m elf_i386 hello_world.o -o hello_world
# 64-bit
nasm -f elf64 hello_world.asm -o hello_world.o
ld hello_world.o -o hello_world