Linux

From SizeCoding
Revision as of 10:53, 29 November 2021 by Byteobserver (talk | contribs) (Adding Sound)

Introduction

This section of the sizecoding.org wiki is about creating very small (<= 256 byte) 32-bit x86 Linux binaries in ELF format. For general x86 information, please check the main pages on this website, as many of the same tricks also work for x86 Linux sizecoding.

A huge thanks goes out to byteobserver (Xorchitecture (2021) - https://www.pouet.net/prod.php?which=88982) as well as frag/fsqrt for some early work (Lintro (2012) - https://www.pouet.net/prod.php?which=58560), for all their research and hard work in producing tiny ELF binaries for Linux.

Alternative methods and expectations

As the development of tiny ELF executables written in assembler is still in its early days on Linux, with only a handful of actual productions under 256 bytes, let's look at some of the other methods of getting tiny intros onto Linux.

1) Self-compilation tricks (using gcc or python): The executable runs a gcc (or python) compilation of the embedded code and then executes the result. This requires GCC and/or a specific version of Python, and potentially dynamically linked libraries, to be installed.

2) Linking a piece of compiled C code to a stripped ELF header + /dev/fb0 setup: This method has been used by The Orz to create several sizecoded procedural graphics entries. For more information about this, check out https://github.com/grz0zrg/tinycelfgraphics

So what can we realistically expect from a 256-byte intro on Linux?

Expect a cost of about 100 bytes for the ELF header, setting up fb0, some form of update loop, a frame counter and either an mmap setup or copying via pwrite64 to get you started. If you want audio as well, the available byte budget shrinks even more.

Additionally, since we're dealing with 32-bit code, expect some instructions (especially those with immediate values) to take up a bit more space.

Let's hope this wiki page will inspire and help people to get started and create newer, better Linux tiny intros ;-)

Setting up

Setting up your development platform for Linux development:

  • Suggested Distributions : Any X86-based Linux distribution that allows for execution of 32-bit executables.
  • Assembler: NASM (or any other linux compatible 32-bit X86 assembler)

Furthermore, it is important that the user has access to the /dev/fb0 framebuffer. This can be achieved by switching to a virtual (fullscreen) console using Ctrl+Alt+F3/F4 in most distributions, logging in, and making sure the user is a member of the video group. If this is not the case for some reason, you can add your user to the video group like so (log out and back in afterwards for the change to take effect):

sudo usermod -a -G video username

Note: Make sure your binary is executable after assembling it, for example with chmod +x (chmod 777 works too, but also makes the file writable by everyone) :D

System Calls

Interaction with the Linux OS is mostly done via int 0x80 system calls. This usually includes dealing with opening files/framebuffer/audio and handling timers.

A full list of system calls and their expected register arguments is available at: https://syscalls32.paolostivanin.com/
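
As a rough illustration of the calling convention: on 32-bit x86 Linux the syscall number goes in eax, up to six arguments go in ebx, ecx, edx, esi, edi and ebp, and the result comes back in eax. The sketch below is not size-optimised and is not taken from any of the intros mentioned here; the file name hello.asm, the labels and the message are made up for the example. It is built as a regular linked ELF (nasm + ld) instead of using a hand-made header, purely to keep it short.

```nasm
; hello.asm (hypothetical file name): write a string to stdout, then exit
; build: nasm -f elf32 hello.asm -o hello.o
;        ld -m elf_i386 -s hello.o -o hello
bits 32
global _start
section .text
_start:
    mov eax,4        ; 4 = write
    mov ebx,1        ; arg 1: fd 1 = stdout
    mov ecx,msg      ; arg 2: pointer to the data
    mov edx,msglen   ; arg 3: number of bytes
    int 0x80         ; eax <- bytes written (or a negative error code)

    mov eax,1        ; 1 = exit
    xor ebx,ebx      ; arg 1: exit status 0
    int 0x80         ; does not return

msg:    db "hello from int 0x80",10
msglen  equ $ - msg
```

In a real intro you would normally load only the low byte (mov al,4) whenever the upper bytes of eax are already known to be zero, as the examples further down do.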


ELF Header Information

Like a 32-bit Windows executable, a 32-bit binary for Linux comes with a pretty hefty header, in this case the ELF header.

  bits 32
  org     0x00010000
  ehdr:                                                 ; Elf32_Ehdr
                db      0x7F, "ELF", 1, 1, 1, 0         ;   e_ident
        times 8 db      0
                dw      2                               ;   e_type
                dw      3                               ;   e_machine
                dd      1                               ;   e_version
                dd      _start                          ;   e_entry
                dd      phdr - $$                       ;   e_phoff
                dd      0                               ;   e_shoff
                dd      0                               ;   e_flags
                dw      ehdrsize                        ;   e_ehsize
                dw      phdrsize                        ;   e_phentsize
                dw      1                               ;   e_phnum
                dw      0                               ;   e_shentsize
                dw      0                               ;   e_shnum
                dw      0                               ;   e_shstrndx
  
  ehdrsize      equ     $ - ehdr                        ; size of the ELF header (52 bytes)
  
  phdr:                                                 ; Elf32_Phdr
                dd      1                               ;   p_type
                dd      0                               ;   p_offset
                dd      $$                              ;   p_vaddr
                dd      $$                              ;   p_paddr
                dd      filesize                        ;   p_filesz
                dd      filesize                        ;   p_memsz
                dd      5                               ;   p_flags
                dd      0x1000                          ;   p_align
  
  phdrsize      equ     $ - phdr                        ; size of one program header (32 bytes)
  
  _start:
  
  ; your program here
  
  filesize      equ     $ - $$                          ; must stay at the very end of the file

Luckily, some parts of the ELF header can be repurposed to store data and code. For those who are interested, an extensive write-up of such header optimisations is available at http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.

After merging the ehdr and phdr parts and changing the entry point, we can get the header down to about 48 bytes, with a nifty /dev/fb0 string embedded that we can use later when setting up the framebuffer.

org $00010000
    db $7F,"ELF"    ; e_ident
    dd 1            ; p_type
    dd 0            ; p_offset
    dd $$           ; p_vaddr
    dw 2            ; e_type, p_paddr
    dw 3            ; e_machine
    dd entry        ; e_version, p_filesz
    dd entry        ; e_entry, p_memsz
    dd 4            ; e_phoff, p_flags
fname:
    db "/dev/fb0",0 ; e_shoff, p_align, e_flags, e_ehsize
entry:
    ; this next instruction overlaps with a critical part of the elf header
    ; it needs to look like XX YY YY YY YY where YYYYYYYY=fname
    ; so you can change the register to something else or use push
    ; but the four byte pointer to fname cannot be changed.
    mov ebx,fname   ; e_phentsize, e_phnum

    ; e_shentsize, e_shnum, e_shstrndx are below but we can put whatever code/bytes we want there
    mov cl,1 ; open flags: O_WRONLY (1, or just inc ecx) is sufficient for the pwrite64 method, O_RDWR (2) is needed for mmap
    mov al,5 ; 5 = open syscall
    int 0x80 ; open /dev/fb0, returns fd 3
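
To see why the pointer in that instruction cannot change, it may help to write the instruction out as raw bytes. The sketch below is the same merged header with file offsets added in the comments; the file name overlap.asm and the exit sequence at the end are additions for this illustration only. Since fname sits at file offset 32, its address is 0x00010020, and the four little-endian bytes of that address happen to be exactly the values the kernel wants to find in e_phentsize and e_phnum.

```nasm
; overlap.asm (hypothetical file name): "mov ebx,fname" spelled out as raw bytes
; assemble with: nasm -fbin overlap.asm -o overlap
bits 32
org 0x00010000
    db 0x7F,"ELF"       ; offset  0: e_ident
    dd 1                ; offset  4: p_type
    dd 0                ; offset  8: p_offset
    dd $$               ; offset 12: p_vaddr
    dw 2                ; offset 16: e_type, p_paddr
    dw 3                ; offset 18: e_machine
    dd entry            ; offset 20: e_version, p_filesz
    dd entry            ; offset 24: e_entry, p_memsz
    dd 4                ; offset 28: e_phoff, p_flags
fname:
    db "/dev/fb0",0     ; offsets 32..40, so fname = 0x00010020
entry:
    db 0xBB             ; offset 41: opcode of "mov ebx,imm32"
    dd fname            ; offsets 42..45: bytes 20 00 01 00
                        ;   e_phentsize = 0x0020 (32, the size of one program header)
                        ;   e_phnum     = 0x0001 (one program header)
    mov cl,1            ; from here on the bytes are free to use: O_WRONLY
    mov al,5            ; 5 = open syscall
    int 0x80
    xor eax,eax         ; added for this sketch: exit cleanly
    inc eax             ; 1 = exit syscall (ebx still holds fname, so the status is arbitrary)
    int 0x80
```

Swapping 0xBB for another one-byte opcode that takes a 32-bit immediate (for example 0x68, push imm32) keeps the header valid, which is what the comment in the listing above refers to.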

Displaying Graphics

Graphics can be produced by accessing the Linux /dev/fb0 framebuffer. The framebuffer first has to be opened when the intro initialises; after that you can either copy a piece of memory to it using the pwrite64 syscall (0xb5), or map a piece of memory directly onto the framebuffer using the mmap syscall.

Setting up the framebuffer

The /dev/fb0 framebuffer can best be accessed from a virtual console (Ctrl+Alt+F3/F4 in most distributions).

To make sure your /dev/fb0 framebuffer is set up properly, you can install the fbset tool (for example via apt-get) and use it to display and/or alter the framebuffer resolution, as most intros make an assumption about the resolution of your framebuffer.

To test access to the /dev/fb0 framebuffer, you can use the following cat command:

cat /dev/urandom > /dev/fb0

which should fill the screen with random noise (ignoring the write error that cat is expected to report once the framebuffer is full).

Alternatively, if you don't like to use the virtual console during the development of your intro, or the framebuffer setup is somehow giving you problems, there is a small fbe.c / fbe binary supplied with the xorchitecture intro by byteobserver that has an SDL window mmap'ed to /tmp/fb0, which you can launch alongside your intro (don't forget to change the /dev/fb0 path in your intro to /tmp/fb0).

Getting something on screen

First we need to fill our local memory buffer with pixel data, so let's start doing that using the old AND pattern:

    mov ecx,width*height
setpixels:
    mov ebx,width
    mov eax,ecx
    cdq
    div ebx               ; edx = x-coord , eax=y coord
    and eax,edx           ; and pattern

    mov [esp+ecx*4+0],al ; b
    mov [esp+ecx*4+1],al ; g
    mov [esp+ecx*4+2],al ; r
    loop setpixels

Once your buffer (in this case located at the esp stack pointer) is filled with pixel data, you can copy it to /dev/fb0 using the pwrite64 syscall like so:

   ; copy memorybuffer to screen (/dev/fb0) using the pwrite64 syscall
   mov ecx,esp  ; buffer ptr
   mov edx,ebp  ; screen size
   xor esi,esi  ; seek to beginning of screen
   xor edi,edi  
   mov ebx,3    ; fd of framebuffer
   mov eax,0xb5 ; pwrite64
   int 0x80     ; pwrite64 to framebuffer

As an alternative to using pwrite64 you can also use mmap to map a piece of memory to /dev/fb0 (check out intros by The Orz for an example with mmap). However, mmap has drawbacks: you can get tearing, and you can't realistically do feedback effects without implementing a second buffer, as reading from the mmap'ed memory is VERY slow.

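 ; mmap(NULL, buflen, PROT_WRITE, MAP_SHARED, fd, 0) via the old mmap syscall (90),
 ; which takes a pointer to a block holding the six arguments in ebx
 push edx             ; offset = 0 (assumes edx = 0)
 push eax             ; fd (assumes eax still holds the fd returned by open)
 push byte 1          ; flags = MAP_SHARED
 mov al, 90           ; 90 = old mmap syscall; eax is reused as the prot value below
 push eax             ; prot: we need the second bit set for PROT_WRITE, 90 = 01011010b,
                      ; and setting PROT_WRITE automatically sets PROT_READ
 push width*height*4  ; length = buffer size
 push edx             ; addr = NULL
 mov ebx, esp         ; ebx = pointer to the argument block
 int 80h              ; eax <- buffer pointer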

Example Framework

Munching squares

So when we put all the above together, we can get a minimal kind of framework running that will look something like this munching square example provided to us by byteobserver:

; byte.observer's munching square linux example
; assembles with nasm -fbin munch.asm -o munch
width equ 1024
height equ 768

bits 32
org $00010000
    db $7F,"ELF" ; e_ident
    dd 1         ; p_type
    dd 0         ; p_offset
    dd $$        ; p_vaddr
    dw 2         ; e_type, p_paddr
    dw 3         ; e_machine
    dd entry     ; e_version, p_filesz
    dd entry     ; e_entry, p_memsz
    dd 4         ; e_phoff, p_flags
fname:
    db "/dev/fb0",0 ; e_shoff, p_align, e_flags, e_ehsize
entry:
    mov ebx,fname     ; e_phentsize, e_phnum
    inc ecx           ; = 1 = O_WRONLY
    mov al,5          ; 5 = open syscall
    int 0x80          ; open /dev/fb0 = 3

    mov ebp,width*height*4  ; ebp = screen size
    sub esp,ebp             ; make room on the stack for the video memory

mainloop:
    mov ecx,ebp    ; init pixel index
    shr ecx,2      ; divide by bytes per pixel (4)
    inc edi        ; frame counter

setpixels:
    mov ebx,width
    mov eax,ecx
    cdq
    div ebx               ; edx = x-coord , eax=y coord
    xor eax,edx           ; xor pattern
    add eax,edi           ; make it munch
    mov [esp+ecx*4+0],al ; b
    mov [esp+ecx*4+1],al ; g
    mov [esp+ecx*4+2],al ; r
    mov [esp+ecx*4+3],al ; a
    loop setpixels

    ; dump the whole thing to the screen using pwrite64 syscall
    mov ecx,esp  ; buffer ptr
    mov edx,ebp  ; screen size
    push edi     ; save frame counter
    xor esi,esi  ; seek to beginning of screen
    xor edi,edi  
    mov ebx,3    ; fd of framebuffer
    mov eax,0xb5 ; pwrite64
    int 0x80     ; pwrite64 to framebuffer
    pop edi

    jmp mainloop

Adding Sound

It is possible to output digital audio by binding the aplay command into your intro. aplay is available on almost all Linux setups. You can test it by running the following, which should produce some white noise:

   $ aplay /dev/urandom

By default, aplay will play 8-bit mono audio at 8000Hz, but the format can be changed easily by specifying arguments. If no filename is passed to aplay, it will read audio data from standard input, which we will use to our advantage.

To use aplay in the context of an intro, there is a bit of setup work involved. One method uses 4 syscalls to start aplay as a child process, so that audio data can then be simply written to the appropriate file descriptor to send it to the speakers.

Method 1: pipe,fork,dup2,execve

This approach works as follows. First, we create a pipe using the pipe syscall (0x2a). This syscall takes a pointer to an array of 2 ints, which it fills with the file descriptors of the two ends of the pipe. In the following, we simply overwrite the top of the stack with the file descriptors. The first file descriptor is the read-only end (where data comes out of the pipe) and the second is the write-only end (where data goes into the pipe).

    mov ebx,esp
    xor eax,eax
    mov al,0x2a ; pipe
    int 0x80

Next, we fork the process (syscall 0x2). The child process will be used to exec aplay. If you do this right after creating the pipe, you don't need to zero eax before setting it to 2, because eax should already be zero (indicating that the pipe was created successfully).

    mov al,2 ; fork
    int 0x80 ; returns eax=0 in child process and eax=child pid in parent process
    dec eax
    js child
    
parent:
    ; code for the rest of your intro goes here

Now, we bind the standard input of the child (which aplay receives audio data from) to the output of the pipe, using the dup2 syscall (0x3f).

child:
    xor eax,eax
    mov al,0x3f ; dup2
    pop ebx ; get file descriptor of output side of pipe
    xor ecx,ecx ; stdin is file descriptor 0
    int 0x80

The following is optional. aplay will usually print a message saying some parameters of the stream that it is playing. If this interferes with your intro, you can close stderr to stop it from printing, with the close syscall (0x6).

    xor eax,eax
    mov al,6 ; close
    mov bl,2 ; stderr
    int 0x80

Finally, we just have to execute aplay with the execve syscall (0xb). Constructing the arguments to this syscall takes a bit of work. Here we are doing it in a simple way which is a bit wasteful. You can save some bytes by constructing the arguments array on the stack.

    xor eax,eax ; shouldn't be necessary given the above
    mov al,0xb ; execve
    mov ebx,aplay ; pointer to aplay filename
    mov ecx,args ; pointer to null terminated array of arguments
    lea edx,[esp+12] ; get pointer to environ. the offset 12 assumes esp still has its
                     ; startup value (nothing pushed/popped yet; after the single pop ebx
                     ; in the dup2 step above the same address is [esp+8]), and that no
                     ; args were passed to your program.
                     ; see here: http://www.mindfruit.co.uk/2012/01/initial-stack-reading-process-arguments.html
                     ; (we are trying to get the beginning of "Environment pointers")
    int 0x80 ; nothing after this point will be executed

args:
    dd aplay+5
    dd 0
aplay:
    db "/bin/aplay", 0
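
The same args array can carry extra options when a different audio format is wanted. A minimal sketch, assuming 44100 Hz playback is desired: the rate label and the chosen sample rate are illustrative and not part of the original example. aplay takes the sample rate through its -r option, and the value can be attached directly to the option letter.

```nasm
; data-only fragment, meant as a drop-in replacement for the args/aplay block above
bits 32
args:
    dd aplay+5          ; argv[0] = "aplay"
    dd rate             ; argv[1] = "-r44100" (hypothetical extra argument)
    dd 0                ; end of the argument list
aplay:
    db "/bin/aplay",0
rate:
    db "-r44100",0
```

This costs 12 extra bytes (a 4-byte pointer plus the 8-byte string), so it is only worth it if the default 8000 Hz does not fit your sound.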

Now everything should be set up, and we can start writing audio data with the write syscall (0x4). The following will produce a buzzing sound.

parent:

audioloop:
    xor eax,eax
    mov al,4 ; write
    mov ebx,[esp+4] ; input side of pipe created earlier
    mov ecx,esp ; pointer to audio data
    mov edx,1 ; length of audio data (in bytes)
    int 0x80
    inc byte [esp] ; increment the sample
    jmp audioloop

Putting it all together

Combining the above snippets and optimizing a bit, we can arrive at the following 118 byte program which plays a familiar bytebeat track.

bits 32
org $00010000
    db $7F,"ELF" ; e_ident
    dd 1 ; p_type
    dd 0 ; p_offset
    dd $$ ; p_vaddr
    dw 2 ; e_type, p_paddr
    dw 3 ; e_machine
    dd entry ; e_version, p_filesz
    dd entry ; e_entry, p_memsz
    dd 4 ; e_phoff, p_flags
entry:
    mov al,0x2a ; pipe
    mov ebx,esp ; store output of pipe on stack
    int 0x80
    lea edx,[ebx+12] ; environ pointer, to be used later
    mov ebp, entry ; e_phentsize, e_phnum: the pointer bytes must sit exactly here for the ELF header
    mov al,2 ; fork
    int 0x80 ; returns eax=0 in child process and eax=childpid in parent process
    dec eax
    js child

audioloop:
    pusha
    xor eax,eax
    mov al,4 ; write
    mov ebx,eax ; ebx = 4, assumed to be the input side of the pipe created earlier (pipe returned fds 3 and 4)
    lea ecx,[edx-12] ; pointer to audio data
    xor edx,edx
    inc edx ; set size to one byte
    int 0x80
    popa

    ; some bytebeat
    inc esi
    mov eax,esi
    pop ebx
    mov ebx,eax
    shr eax,5
    or ebx,eax
    shr eax,5
    and ebx,eax
    push ebx

    jmp audioloop

child:
    inc eax
    mov al,0x3f ; dup2
    pop ebx ; get file descriptor of output side of pipe
    ; ecx is already zero
    int 0x80

    mov al,0xb ; execve
    lea ebx,[ebp+((aplay+5-entry)&0xff)] ; pointer to "aplay"
    push 0 ; null terminator for args list
    push ebx ; pointer to "aplay" aka argv[0]
    mov ecx,esp ; pointer to null terminated array of arguments
    mov bl,(aplay-$$)&0xff ; pointer to "/bin/aplay"
    ; edx is already set up as the environ pointer
    int 0x80 ; nothing after this point will be executed

aplay:
    db "/bin/aplay"
    ; no null terminator is necessary because memory past the end of the file is always zero

Can you make this smaller? Feel free to edit it!

Additional Resources

Larger productions (1k and 4k intros)

Creating 1k and 4k intros on Linux usually requires a different setup. For more information on this, check out the following links: