MUZAK58

From SizeCoding
Jump to: navigation, search

MUZAK58 was created by wiRe/Napalm and is 58 bytes in size. You can watch the video here. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed.

Before you continue, make sure to read Bytebeat and Output#Producing_sound for the basics.

COVOX LPT-DAC

The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:

            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device

HINT: It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.

HINT: It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: Output#PC_Speaker_variant

But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.

Time Source

To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer:

  • Timer Interrupt
  • Poll BIOS Counter
  • Poll PIT Counter
  • Alternative Options

Timer Interrupt

As described here: Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt. While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.

Poll BIOS Counter

The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:

            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
            out     40h, al                   ;E640       ;
            salc                              ;D6         ;  AL := 0 (if CF=0)
            out     40h, al                   ;E640       ;

            ; ...

  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0
  _tcmp:    cmp     al, 0x55                  ;2C??       ;did timer value changed?
            jz      ntick                     ;74xx       ;  no -> skip audio
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)

            inc     ebp                       ;6645       ;increment 32-bit timer counter

            ; ... set AL to next audio sample based on EBP ...

            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device

  ntick:
            ; ...

            jmp     short suplp

HINT: Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.

Poll PIT Counter

(...more soon...)

Programmable Interval Timer


This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.

Alternative Options

Another way to get an accurate time is to read the processor's timestamp counter using the RDTSC instruction.

The Bytebeat

Bytebeat is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function f(t), where t is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for t=0 and will play the sample f(8000) after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.

In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. MUZAK58 is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:

  seqtbl:   dw      0x1413
            dw      0x6C66
            dw      0x2242
            dw      0x6495
            dw      0x4484

The method used to achieve a size-optimized, yet flexible bytebeat is described next.

Music Sequencer

As you can read in many bytebeat tutorials, like Steady_On_Tim by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function o(t), like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter t by a scale factor p: f(t) = o(t*p). This multiplication factor modulates the pitch. If we then use a sequence table s(t) to replace p, which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: f(t) = o(t*s(t)).

Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255

(listen to this bytebeat here)

To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where S=8 specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.

Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128

(listen to this bytebeat here)

Other waveforms are also possible. Here we use the sine function:

  sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127

(listen to this bytebeat here)

Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64

(listen to this bytebeat here)

HINT: Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2

So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps S as we need for our composition, or at least as many as we can handle due to size constraints. The more steps S we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by log2(M) bits per step, the larger the range of notes we can end up using. Both parameters S and M will define the final byte size of our sequence table.

Cascaded Sequences

The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example MUZAK58 uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (S = 40), as long as the limited range per step given by 4 bits (M = 2^4 = 16) is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?

The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: s(t) = (s0(t) * s1(t) * s2(t) * ...) / C

But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.

Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. O is the time divider to derive the step count stepcnt = t / O with O = log2(ticks_per_step) to avoid any integer division.

  +--------+--------------+---------------------------------------------+
  | stpcnt | (t>>O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |
  +--------+--------------+---------------------------------------------+
  | seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |
  | seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |
  | seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |
  | seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |
  | seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |
  +--------+--------------+---------------------------------------------+
                                                               with S=4

In combination with our oscillator function, the whole bytebeat will look like this: f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )

Final Bytebeat Implementation

The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as o(t) = t. As we found out, the function f(t) is then only comprised of N+1 terms, all multiplied together like this: f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C. On each loop iteration of the final bytebeat player, the current sequencer sN(t) is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of f(t). If we keep M low, then even a 16-bit multiplication is sufficient. The final scaling factor C depends on the range of the values derived from the sequencer functions sN(t). Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8.

The reference implementation uses a total of 5 cascaded sequencers: N=5. The table of each sequencer was chosen to store 4 sequence steps: S=4. Which sequencer step to index is then based on 2 bits of the parameter t. The shortest sequencer step time for this song was chosen to be 2^10 samples, which gives us O=10. This means that the lookup index for each sequencer i with 0 <= i < N is derived by (t>>(O+i))%S. Each step value is limited by M=16. Putting all this together, we can now start composing a song in this way:

  static constexpr unsigned O = 10;
  static constexpr unsigned N = 5;
  static constexpr unsigned S = 4;
  static constexpr unsigned C = 256;

  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };

  uint8_t get_next_sample( uint16_t t ) {
    for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
    return static_cast<uint8_t>(t / C);
  }

(listen to this bytebeat here)

The Sourcecode

With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of MUZAK58:

           ;-----------------------------------
           ; MUZAK58 by wiRe/NpM
           ;-----------------------------------
            section .text
            org     100h

           ;--------------------------------- ;---------- ;muzak sequence table
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&t>>10]       ;! 1314       adc dx,[si]
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&t>>11]      ;! 666C       o32 insb
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&t>>12]       ;! 42         inc dx
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&t>>13]       ;! 22956484   and dl,[di-0x7b9c]
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&t>>14] >> 8  ;! 44         inc sp

            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary

           ;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)
            out     40h, al                   ;E640       ;  rearm timer

            inc     bx                        ;43         ;  increment 16-bit timer counter

            pusha                             ;60         ;  store all registers
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)
            mov     dx, bx                    ;89DA       ;  load start value into DX
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)
  bbeat_lp: push    cx                        ;51         ;  store CX counter
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &= 15)
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample
            pop     cx                        ;59         ;  restore CX counter
            shr     bx, 1                     ;D1ED       ;  get next bit sequence from time
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out

            mov     al, dh                    ;88F0       ;  get sample data into AL
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)

  main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
            in      al, 40h                   ;E440       ;  read low-byte
            cmp     al, 148                   ;3Cxx       ;  did timer counter overflowed to 149..0FFh?
            jo      short bbeat               ;71xx       ;    yes -> play

           ;... the place for your intro

            jmp     short main_lp             ;75xx       ;  loop forever