Research Context

As development of my ICMP-based Network Communication Project continues at full throttle, today I want to talk about the most “diplomatic” part of the operation: the checksum. If you don’t stamp this seal correctly on the packet you’re sending, the target host’s operating system treats it as malformed data and dumps it in the trash before it even gets through the door.

So, how exactly is this “seal” calculated in a low-level language? Let’s examine it step-by-step through the very algorithm I wrote and currently use in my project.

🛠️ The Heart of the Algorithm: perform_checksum

The ICMP protocol uses a 16-bit one’s complement sum to ensure data integrity. This means you have to add up the entire ICMP message (header plus payload, with the checksum field set to zero while you calculate) in 16-bit (2-byte) chunks.

Here is what this mathematical operation looks like in the x64 Assembly realm:

Note: In this context, rdi represents the starting address of our data buffer, r14 is the starting offset, and r15 is the ending offset.


perform_checksum:
    ; RFC 1071 standard 16-bit one's complement sum algorithm
    xor eax, eax                    ; Clear eax (Accumulator for the sum)
    mov r10, r14                    ; r10 = Current offset
.loop:
    mov r11, r15                    ; r11 = End offset
    sub r11, r10                    ; Remaining bytes to process
    cmp r11, 1                      ; Check if only 1 byte is left (odd length)
    jle .last                       ; If <= 1 byte left, jump to final block
    movzx r12d, word [rdi + r10]    ; Read 2 bytes (1 word) zero-extended
    add eax, r12d                   ; Add to accumulator
    add r10, 2                      ; Move offset forward by 2 bytes
    jmp .loop                       ; Repeat

🧩 Part 1: Gathering the Pieces

We are essentially telling the CPU: “Fetch me a 16-bit (word) chunk from memory, add it to the eax register, and move to the next 2 bytes.” This loop runs smoothly until we hit the end of the packet.
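
To sanity-check the loop outside of the assembler, here is a minimal Python sketch of the same idea. The function name sum_words_le and the sample bytes are my own illustrations, not part of the project; the "<H" format reads each word little-endian, which is what movzx r12d, word [...] does on x64.

```python
import struct

def sum_words_le(buf, start, end):
    """Add up the 16-bit little-endian words in buf[start:end].

    Mirrors the .loop block: it stops as soon as fewer than 2 bytes
    remain and returns (partial_sum, next_offset).
    """
    total = 0
    offset = start
    while end - offset > 1:                              # cmp r11, 1 / jle .last
        (word,) = struct.unpack_from("<H", buf, offset)  # movzx r12d, word [rdi + r10]
        total += word                                    # add eax, r12d
        offset += 2                                      # add r10, 2
    return total, offset

# Hypothetical 8-byte sample, chosen only for illustration
data = bytes([0x08, 0x00, 0x00, 0x00, 0x12, 0x34, 0x00, 0x01])
total, offset = sum_words_le(data, 0, len(data))
print(hex(total), offset)  # prints: 0x351a 8
```

The words seen by the loop are 0x0008, 0x0000, 0x3412 and 0x0100, so the partial sum is 0x351A and the offset ends at 8, with no odd byte left over.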

⚖️ Part 2: The “Odd Byte” Paradox

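If the total length of the packet is an odd number (e.g., 11 bytes), the very last byte won’t have a pair to form a 16-bit word. In this scenario, the loop exits to .last, which reuses the flags left by the earlier cmp r11, 1 (jle does not modify them) and hands the leftover byte to the .final block: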

.last:
    je .final                   ; If exactly 1 byte left, handle it
    jmp .wrap                   ; If 0 bytes left, finalize calculation
.final:
    movzx r12d, byte [rdi + r10]; Read the last remaining single byte
    add eax, r12d               ; Add it to the accumulator
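
Why is adding a lone byte correct? A small Python sketch, under the assumption that words are read little-endian as in the loop above: the zero-extended last byte has the same value as the word you would get by padding the buffer with one zero byte, which is the padding RFC 1071 describes. The helper name tail_value and the 11-byte sample are hypothetical.

```python
import struct

def tail_value(buf, offset, end):
    """Value the .last/.final blocks add for whatever is left at `offset`."""
    if end - offset == 1:
        return buf[offset]  # movzx r12d, byte [...]: upper byte is implicitly zero
    return 0                # nothing left: jump straight to .wrap

packet = bytes(range(1, 12))          # 11 bytes: 0x01 .. 0x0B
last = tail_value(packet, 10, len(packet))

# Same buffer padded with a zero byte, last word read little-endian
padded = packet + b"\x00"
(word,) = struct.unpack_from("<H", padded, 10)

assert last == word == 0x0B
```

So the assembly never has to write a padding byte into the buffer; reading a single byte gives the identical contribution to the sum.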

🔄 Part 3: The Wrap and Carry

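Mathematically, this continuous addition might exceed a 16-bit boundary. Because we accumulate into the 32-bit eax, everything that overflows past bit 15 piles up in the upper half of the register. This is where the most critical aspect of RFC 1071 comes into play: adding the overflowing bits (the carry) back into the main sum.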

.wrap:
    mov r11d, eax               ; Copy sum to r11d
    shr r11d, 16                ; Shift right to isolate the carry bits
    and eax, 0xFFFF             ; Mask eax to keep only the lower 16 bits
    add ax, r11w                ; Add the carry bits back to the sum
    adc ax, 0                   ; Add any final carry (add with carry)
    not ax                      ; One's complement (invert bits) for final checksum
    ret
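
The fold can be traced step by step in Python. This is a sketch that assumes the running sum fits in 32 bits, as it does in eax; fold_and_invert is a name I made up for the illustration.

```python
def fold_and_invert(total):
    """Mirror the .wrap block for a sum that fits in 32 bits."""
    assert 0 <= total < 2**32
    carry = total >> 16                          # shr r11d, 16
    low = total & 0xFFFF                         # and eax, 0xFFFF
    folded = low + carry                         # add ax, r11w (may carry out of bit 15)
    folded = (folded & 0xFFFF) + (folded >> 16)  # adc ax, 0 (end-around carry)
    return ~folded & 0xFFFF                      # not ax

print(hex(fold_and_invert(0x351A)))   # no carry at all  -> 0xcae5
print(hex(fold_and_invert(0x1FFFF)))  # carry on the add -> 0xfffe
```

In the second case 0xFFFF + 1 wraps to 0x0000 with the carry flag set, and adc ax, 0 turns that into 0x0001 before the inversion. One adc is enough here: adding two 16-bit values can carry only once, and the wrapped result is at most 0xFFFE, so adding the carry back cannot overflow again.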

🎯 Why not ax?

The not instruction at the very end is the final requirement of the one’s complement logic. By inverting the bits (0 -> 1, 1 -> 0), we ensure that when the receiving end takes our packet and performs the same addition over the whole message, this time with the checksum field included, the folded sum will be 0xFFFF (or 0 once it is inverted). If it is, the data is clean, and our seal is valid!
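
Here is a hedged end-to-end check in Python of the receiver-side property. The echo request fields (identifier 0x1234, sequence 1, payload "ping!") are invented for the example, and the function names are mine. The checksum field sits at bytes 2-3 of the ICMP header; since the words were summed little-endian, the result is stored back little-endian as well, the way a 16-bit store of ax into the buffer would do it.

```python
import struct

def ones_complement_sum(buf):
    """Folded 16-bit one's complement sum, words read little-endian."""
    total = 0
    for i in range(0, len(buf) - 1, 2):
        total += buf[i] | (buf[i + 1] << 8)
    if len(buf) % 2:                 # odd length: add the lone last byte
        total += buf[-1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum(buf):
    return ~ones_complement_sum(buf) & 0xFFFF

# Echo request: type 8, code 0, checksum 0 (zeroed for the calculation)
header = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1)
payload = b"ping!"                   # odd length on purpose
csum = checksum(header + payload)

# Write the checksum into bytes 2-3 in the same byte order it was summed in
packet = header[:2] + struct.pack("<H", csum) + header[4:] + payload

assert ones_complement_sum(packet) == 0xFFFF  # what the receiver sees
assert checksum(packet) == 0                  # same fact, after the final not
```

The assertions hold for any payload, because the stored value is exactly the bitwise complement of the folded sum of everything else.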

Conclusion

Writing this algorithm in Assembly is a fantastic exercise to truly understand how data is laid out in memory and how the CPU crunches bytes. Thanks to this algorithm, our custom ICMP packets can bypass kernel-level drops and roam the network like “official documents”.

When I integrate dynamic targeting and fileless execution (memfd_create) into my distributed management architecture, this checksum engine will remain the most reliable gear in the machine.

Stay Coded!