When we need to quickly share a file or serve a local directory, we often rely on handy, one-line tools like python -m http.server. But what exactly happens under the hood of this “magical” command at the operating system level?
To answer this question, I decided to strip away all the abstractions provided by modern programming languages. Without using any external libraries—not even the standard C library (libc)—I built a fully functional HTTP server communicating directly with the Linux Kernel using pure x86_64 Assembly.
Here is the anatomy of a web server in the “bare-metal” world.
The Foundation: The “Syscall” Dance
While opening a socket takes just one line of code in modern languages, in the Assembly world, you have to invoke System Calls (Syscalls) directly and arrange the CPU registers (RAX, RDI, RSI, etc.) exactly as the Kernel expects them.
The core lifecycle of the server looks like this:
- sys_socket (41): Laying the foundation for our TCP (IPv4) bridge.
- sys_bind (49): Binding our server to the 0.0.0.0:8080 address. Here, we used the bswap instruction to convert the port number into network byte order (Big-Endian).
- sys_listen (50) & sys_accept (43): Listening for incoming browsers or curl requests, and generating a new dedicated File Descriptor (FD) for every connected guest.
- sys_read (0): Reading the raw GET /filename HTTP/1.1 request into memory and parsing it. At this stage, to ensure server security, I used manual string operations (SCASB/LODSB) to detect and block Path Traversal (../) attacks.
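For readers who think more easily in C than in registers, here is a minimal sketch of the same lifecycle using the libc wrappers around those syscalls. It is not the project's code. The function names (fill_listen_addr, open_listener, path_has_traversal) and the backlog of 16 are made up for illustration, htons() stands in for the byte swap, and strstr() stands in for the manual SCASB/LODSB scan. The traversal rule shown (reject any path containing "..") is an assumption that is probably stricter than what the server actually does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill a sockaddr_in for 0.0.0.0:<port>. htons() puts the port into
 * network byte order (big-endian), so 8080 (0x1F90) is stored as 1F 90. */
void fill_listen_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_ANY);
    addr->sin_port = htons(port);
}

/* socket -> bind -> listen. Returns the listening fd, or -1 on error. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    fill_listen_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {          /* backlog of 16 is arbitrary */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical, deliberately strict rule: refuse any request path
 * that contains "..", wherever it appears. */
int path_has_traversal(const char *path)
{
    return strstr(path, "..") != NULL;
}
```

A server loop would then call accept() on the returned descriptor, which hands back a fresh FD per client, and run path_has_traversal() on the path parsed out of the request line before opening anything.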
However, there are two specific areas where this project truly shines: the IP conversion algorithm and Zero-Copy file transfer.
Binary to ASCII: Speaking “Human” IP
When a client connects, sys_accept provides their IP address within a sockaddr_in structure. But this IP isn’t a friendly string like “192.168.1.5”; it’s a 4-byte binary value sitting sequentially in memory. Since Assembly doesn’t have a print() function or inet_ntoa(), printing these 4 bytes to the terminal in a readable format is a serious task.
I had to write a custom routine for this:
- We fetch each octet (byte) of the IP one by one.
- We repeatedly divide the number by 10 (using the div instruction) to extract its individual digits.
- We convert each digit into text by adding 0x30 (the ASCII value of the ‘0’ character).
- We use sys_write to print them to the terminal, manually inserting dots (.) in between to create that familiar IPv4 format.
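The same steps can be sketched in C to make the digit order explicit. This is an illustration rather than a translation of the assembly routine, and octet_to_ascii and ipv4_to_ascii are hypothetical names. One detail worth noticing: repeated division by 10 produces the digits least-significant first, so they have to be reversed (or written back to front) before printing.

```c
#include <stddef.h>

/* Write one octet (0-255) as decimal ASCII at out; returns chars written. */
size_t octet_to_ascii(unsigned char octet, char *out)
{
    char tmp[3];
    size_t n = 0, i;
    unsigned v = octet;

    /* Repeated division by 10 yields digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + v % 10);    /* digit + 0x30 */
        v /= 10;
    } while (v != 0);

    /* Reverse into the output so the most significant digit comes first. */
    for (i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    return n;
}

/* Format 4 bytes as dotted decimal. out needs at least 16 bytes
 * ("255.255.255.255" plus the terminating NUL). Returns the length. */
size_t ipv4_to_ascii(const unsigned char ip[4], char *out)
{
    size_t len = 0;
    int i;

    for (i = 0; i < 4; i++) {
        len += octet_to_ascii(ip[i], out + len);
        if (i < 3)
            out[len++] = '.';
    }
    out[len] = '\0';
    return len;
}
```

Because the address in sockaddr_in is stored in network byte order, the first byte in memory is the first octet of the printed address, so the bytes can be passed to ipv4_to_ascii() as they are.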
Peak Performance: sys_sendfile (40)
When sending a file to a client, simple servers usually follow this path: read a chunk of the file into a User-Space buffer (sys_read) -> write that buffer to the socket (sys_write) -> repeat until the file is done. Every byte is copied from Kernel-Space to User-Space and back again, which forces the CPU to waste cycles on copying.
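For contrast, that conventional path looks roughly like this in C. It is a sketch under assumptions: copy_via_buffer is a hypothetical name and the 4096-byte buffer size is arbitrary.

```c
#include <unistd.h>

/* Classic copy loop: read() pulls data into a user-space buffer,
 * write() pushes it back into the kernel. Returns bytes copied or -1. */
long copy_via_buffer(int out_fd, int in_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t got;

    while ((got = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < got) {                 /* write() may be partial */
            ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
            if (put < 0)
                return -1;
            off += put;
        }
        total += got;
    }
    return got < 0 ? -1 : total;
}
```

Each pass through the loop costs two syscalls and two copies of the same bytes, which is the overhead the next section avoids.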
In this project, we implemented the Zero-Copy architecture used by high-performance modern web servers like Nginx.
After opening the file from disk with sys_open (2) and retrieving its exact size with sys_fstat (5), the real star of the show steps in: sys_sendfile (40). This brilliant system call “teleports” the data from the file to the socket entirely inside Kernel-Space, so it never passes through our User-Space memory.
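; Zero-Copy File Transfer
mov rax, 40                ; sys_sendfile
mov rdi, [socketfd_no]     ; out_fd: Browser/Client Socket
mov rsi, [file_fdno]       ; in_fd: The file on the disk
xor rdx, rdx               ; Offset pointer: NULL, so the kernel reads from the file's current position (the start of a freshly opened file)
mov r10, [statbuff + 48]   ; Count: st_size, the exact file size returned from fstat
syscall                    ; RAX = bytes sent (can be fewer than requested) or a negative errno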
The result? The file's bytes never touch a User-Space buffer, which saves a copy per byte and a read/write syscall pair per chunk.
Conclusion
Hundreds of lines of Assembly code, hours wrestling with segmentation fault errors, and strictly adhering to the unforgiving rules of the HTTP protocol… Was it worth all this effort just to serve a file to a browser?
Absolutely.
Sometimes, “reinventing the wheel” is the only true way to understand the physics of how it turns.
Bare-Metal Assembly HTTP Server source code on GitHub: https://github.com/JM00NJ/http.server