<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://www.sizecoding.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=WiRe</id>
		<title>SizeCoding - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://www.sizecoding.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=WiRe"/>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/wiki/Special:Contributions/WiRe"/>
		<updated>2026-05-03T05:37:23Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.27.0</generator>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1581</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1581"/>
				<updated>2024-06-12T08:06:19Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by its inventor. This page describes how the player works and how it can be adapted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please credit wiRe when you do. Commercial use is not allowed.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, so building such a device yourself was very cheap at the time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but the result will sound very lo-fi. There are several ways to get such a timer:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (even when skipping the backup and restore of the old handler), the overhead of the handler itself, and the exchange of data between the handler and the non-interrupt code all cost bytes. In most cases, polling the timer takes fewer bytes, as in the other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;has the timer value changed?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
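&lt;br /&gt;
The divisor 149 used above can be cross-checked with a few lines of JavaScript. This is only a sketch: the helper name ''pitDivisor'' is made up for this illustration, and 1193182 Hz is assumed as the rounded PIT input clock.&lt;br /&gt;

```javascript
// Nominal PIT input clock in Hz (assumed, rounded from 1.19318181818 MHz).
const PIT_HZ = 1193182;

// Hypothetical helper: nearest integer divisor for a wanted sample rate.
function pitDivisor(sampleRate) {
  return Math.round(PIT_HZ / sampleRate);
}

const divisor = pitDivisor(8000);     // low byte written to port 40h
const actualRate = PIT_HZ / divisor;  // rate that the PIT really produces
console.log(divisor, actualRate.toFixed(2));
```

Because the divisor is rounded down to the nearest integer, the resulting rate ends up slightly above 8 kHz.&lt;br /&gt;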
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX on LPT1. But in terms of size, a bytebeat algorithm is better suited if its formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, yet flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the sawtooth waveform to a square wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are also possible. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated oscillator or one sequence, two or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps ''S'' as we need for our composition, or at least as many as we can handle due to size constraints. The more steps ''S'' we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by ''log2(M)'' bits per step, the larger the range of notes we can end up using. Both parameters ''S'' and ''M'' will define the final byte size of our sequence table.&lt;br /&gt;
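&lt;br /&gt;
The relation between ''S'', ''M'' and the table size can be sketched as follows. The helper name ''seqTableBytes'' is hypothetical, and the sketch assumes that the steps are packed bit-tight with ''log2(M)'' bits each.&lt;br /&gt;

```javascript
// Hypothetical helper: bytes needed for S steps with M possible values each,
// assuming every step is stored in ceil(log2(M)) bits without padding.
function seqTableBytes(S, M) {
  const bitsPerStep = Math.ceil(Math.log2(M));
  return Math.ceil((S * bitsPerStep) / 8);
}

console.log(seqTableBytes(8, 16));   // the 8-step example with 4 bits per step
console.log(seqTableBytes(20, 16));  // 20 steps with 4 bits per step
```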
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The problem we will face with this approach in sizecoding productions is that such a sequence table grows quickly and ends up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all its song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 20 steps (''S = 20''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 20 steps is not that few, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the pitch range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?&lt;br /&gt;
&lt;br /&gt;
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the time divider exponent used to derive the step count ''stepcnt = t&amp;gt;&amp;gt;O = t / 2^O'', with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
                                                               with S=4&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
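&lt;br /&gt;
The index rows of the table above can be reproduced with a short JavaScript sketch. The function name ''seqIndex'' is made up for this illustration, and the step counter is passed in directly instead of being derived from ''t''.&lt;br /&gt;

```javascript
// Table index of cascaded sequencer i at a given step count,
// i.e. (t >> (O + i)) % S with stepcnt = t >> O.
function seqIndex(stepcnt, i, S) {
  return (stepcnt >> i) % S;
}

const S = 4;
const rows = [];
for (let i = 0; i !== 5; i++) {
  const row = [];
  for (let step = 0; step !== 20; step++) row.push(seqIndex(step, i, S));
  rows.push(row.join(" "));
}
console.log(rows.join("\n"));
```

Sequencer ''i'' only wraps around after ''S * 2^i'' steps, so with ''S=4'' and 5 sequencers the combined index pattern repeats after 64 steps.&lt;br /&gt;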
&lt;br /&gt;
In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game, since our oscillator function is as simple as ''o(t) = t''. As we found out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current index and looking it up in the sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values derived from the sequencer functions ''si(t)''. Scaling is done as a shift-right operation in the last step. With some tweaking of the sequencer step values, this can even be forced to be a shift-right by 8.&lt;br /&gt;
&lt;br /&gt;
The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer was chosen to store 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is derived by ''(t&amp;gt;&amp;gt;(O+i))%S''. Each step value is limited by ''M=16''. Putting all this together, we can now start composing a song in this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
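  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the shortest step length in samples&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scaling (shift-right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running product, wraps at 16 bits&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];  // index is taken from the unmodified t&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / C);&lt;br /&gt;
  }&lt;br /&gt;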
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Or the same thing written in JavaScript:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
t&lt;br /&gt;
* [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]&lt;br /&gt;
* [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]&lt;br /&gt;
* [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]&lt;br /&gt;
* [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]&lt;br /&gt;
* [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
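&lt;br /&gt;
To see how the five packed words of the sequence table relate to the per-sequencer arrays above, the following sketch unpacks them nibble by nibble and evaluates samples with 16-bit wraparound. The function names ''nibble'' and ''sample'' are made up for this illustration; this is a JavaScript model under those assumptions, not the original player code.&lt;br /&gt;

```javascript
// Packed sequence words, one 16-bit word per sequencer.
const words = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];

// Nibble k of a 16-bit word (k = 0 is the lowest nibble).
function nibble(word, k) {
  return (word >> (4 * k)) % 16;
}

// Hypothetical model of the player loop: the running product wraps at 16 bits,
// the step index of sequencer i comes from two bits of the unmodified t.
function sample(t) {
  let acc = t % 65536;
  for (let i = 0; i !== words.length; i++) {
    const step = (t >> (10 + i)) % 4;
    acc = (acc * nibble(words[i], step)) % 65536;
  }
  return acc >> 8;  // high byte of the 16-bit product
}

const tables = words.map((w) => [0, 1, 2, 3].map((k) => nibble(w, k)));
console.log(JSON.stringify(tables));
console.log(sample(1), sample(1024));
```

Storing one sequencer per word means that a single 16-bit load plus a rotate by 0, 4, 8 or 12 bits is enough to fetch any step value.&lt;br /&gt;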
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode 0 (interrupt on terminal count), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did timer counter overflow to 149..0FFh?&lt;br /&gt;
            jo      short bbeat               ;70xx       ;    yes -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
           ;... the place for your intro&lt;br /&gt;
&lt;br /&gt;
            jmp     short main_lp             ;EBxx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1336</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1336"/>
				<updated>2024-02-16T13:38:34Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* The Sourcecode */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by its inventor. This page describes how the player works and how it can be adapted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please credit wiRe when you do. Commercial use is not allowed.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, so building such a device yourself was very cheap at the time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but the result will sound very lo-fi. There are several ways to get such a timer:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (even when skipping the backup and restore of the old handler), the overhead of the handler itself, and the exchange of data between the handler and the non-interrupt code all cost bytes. In most cases, polling the timer takes fewer bytes, as in the other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;has the timer value changed?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX on LPT1. But in terms of size, a bytebeat algorithm is better suited if its formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music itself comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex and does not repeat itself as quickly. The sequence table of this reference example is 10 bytes long and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, yet flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the sawtooth waveform to a square wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are also possible. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated oscillator or one sequence, two or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps ''S'' as we need for our composition, or at least as many as we can afford within our size constraints. The more steps ''S'' we use, the longer the song will last before it repeats. The more distinct values ''M'' each step can take, at a cost of ''log2(M)'' bits per step, the larger the range of notes we can end up using. Together, the parameters ''S'' and ''M'' define the final size of our sequence table: ''S * log2(M)'' bits.&lt;br /&gt;
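&lt;br /&gt;
To make the size trade-off between ''S'' and ''M'' concrete, here is a small JavaScript sketch. It assumes that every step is stored with the same number of bits and that the table is padded to whole bytes; the helper names ''bitsPerStep'' and ''tableBytes'' are hypothetical:&lt;br /&gt;

```javascript
// Bits needed to store one step when each step takes one of M distinct values.
function bitsPerStep(M) {
  return Math.ceil(Math.log2(M));
}

// Bytes needed for a table of S steps, rounded up to whole bytes (an assumption).
function tableBytes(S, M) {
  return Math.ceil((S * bitsPerStep(M)) / 8);
}

// The 8-step example sequence above uses values up to 16, so it needs 5 bits per step.
console.log(tableBytes(8, 32));
// A 4-step table with 4-bit entries fits exactly into one 16-bit word.
console.log(tableBytes(4, 16));
```

The second case is the layout of one ''dw'' entry in the ''seqtbl'' listing shown earlier.&lt;br /&gt;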
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The problem we will face with this approach in sizecoding productions is that such a sequence table grows quickly and ends up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all its song data. Using our knowledge at this point, we would be able to divide these 10 bytes (80 bits) into a sequence of 20 steps (''S = 20''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 20 steps is not that few, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And its pitch range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?&lt;br /&gt;
&lt;br /&gt;
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the shift count used to derive the step count ''stepcnt = t &amp;gt;&amp;gt; O'', with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
                                                               with S=4&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
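&lt;br /&gt;
The index rows of the visualization above can be reproduced with a few lines of JavaScript. This is only a sketch under the parameters of the table (''S=4'', five sequencers); it works on the step count directly instead of ''t'', and the helper names ''seqIndex'' and ''indexRow'' are hypothetical:&lt;br /&gt;

```javascript
// Parameters taken from the table above: S steps per sequencer, N cascaded sequencers.
const S = 4;
const N = 5;

// Index into the table of sequencer i at a given step count (step = t >> O).
function seqIndex(step, i) {
  return (step >> i) % S;
}

// One row of the visualization: indices of sequencer i for the first `count` steps.
function indexRow(i, count) {
  return Array.from({ length: count }, (_, step) => seqIndex(step, i));
}

// The index pattern repeats once the slowest sequencer has walked its whole table.
const periodInSteps = S * 2 ** (N - 1);

for (let i = 0; i !== N; i++) {
  console.log('seq' + i + 'ix', indexRow(i, 20).join(' '));
}
console.log('period in steps:', periodInSteps);
```

With these parameters the combined index pattern only repeats after 64 steps, although each individual table holds just 4 entries.&lt;br /&gt;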
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, things are easy, because the oscillator function is as simple as ''o(t) = t''. As we found out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration ''i'' of the final bytebeat player, the sequencer ''si(t)'' is evaluated by calculating its current index and looking it up in the sequence table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values produced by the sequencer functions ''si(t)''. Scaling is done as a shift-right operation in the last step, and with some tweaking of the step values it can even be forced to be a shift-right by 8.&lt;br /&gt;
&lt;br /&gt;
The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer was chosen to store 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is derived by ''(t&amp;gt;&amp;gt;(O+i))%S''. Each step value is limited by ''M=16''. Putting all this together, we can now start composing a song in this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
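  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the shortest step length in samples&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scaling, i.e. a shift-right by 8&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t acc = t;  // running 16-bit product, starts with the sawtooth o(t) = t&lt;br /&gt;
    // index every sequencer with the unmodified time t, not with the running product&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) acc *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(acc / C);&lt;br /&gt;
  }&lt;br /&gt;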
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
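&lt;br /&gt;
The step values used here are the same data as the packed ''seqtbl'' words shown earlier, with four 4-bit steps per 16-bit word and the first step in the lowest nibble. The following JavaScript sketch unpacks the words and evaluates one sample. The helper names ''unpackSteps'' and ''sample'' are hypothetical, and the wrap to 16 bits after every multiplication is an assumption based on the 16-bit time value described above:&lt;br /&gt;

```javascript
// Packed sequence words as they appear in the seqtbl listing.
const words = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];

// Hypothetical helper: split a 16-bit word into four 4-bit steps, lowest nibble first.
function unpackSteps(word) {
  return [0, 1, 2, 3].map((k) => (word >> (4 * k)) % 16);
}

const tables = words.map(unpackSteps);
console.log(JSON.stringify(tables));

// One sample for a non-negative time t, with O = 10 and S = 4 as in the text.
function sample(t) {
  let acc = t % 65536; // sawtooth oscillator o(t) = t, kept at 16 bits
  for (let i = 0; i !== tables.length; i++) {
    const step = tables[i][(t >> (10 + i)) % 4]; // sequencer i, indexed by time bits
    acc = (acc * step) % 65536; // keep only the low 16 bits of the product
  }
  return acc >> 8; // scale by C = 256
}

console.log(sample(1), sample(1024));
```

Changing a single nibble in one of the words changes one step of one sequencer, which then affects every position of the song where that step is active.&lt;br /&gt;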
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode 0 (interrupt on terminal count), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did timer counter wrap around to 149..0FFh?&lt;br /&gt;
            jo      short bbeat               ;71xx       ;    yes -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
           ;... the place for your intro&lt;br /&gt;
&lt;br /&gt;
            jmp     short main_lp             ;EBxx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1335</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1335"/>
				<updated>2024-02-15T20:32:26Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Cascaded Sequences */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by its inventor. This page describes how this player works and how it can be adapted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please credit wiRe if you do. Commercial use is not allowed.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but the result will sound very lo-fi. There are several ways to get such a timer:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (even when ignoring the backup and restore of the old handler), the overhead of the handler itself, and the exchange of data between the handler and the non-interrupt code all cost bytes. In most cases, polling the timer takes less space, as in all the other variants described next. But it must also be clear that polling has to run at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT 8 fires at about 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counting logic. So a simple way to get a sample counter is to reprogram the PIT to a rate of about 8 kHz, which triggers the default INT 8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. As soon as its least significant byte changes, the next sample must be generated and sent to LPT1; the counter value itself can serve as the sample counter. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Use whatever suits you better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX on LPT1. But in terms of size, a bytebeat algorithm is better suited if its formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music itself comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex and does not repeat itself as quickly. The sequence table of this reference example is 10 bytes long and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, yet flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the sawtooth waveform to a square wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are also possible. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated oscillator or one sequence, two or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps ''S'' as we need for our composition, or at least as many as we can afford within our size constraints. The more steps ''S'' we use, the longer the song will last before it repeats. The more distinct values ''M'' each step can take, at a cost of ''log2(M)'' bits per step, the larger the range of notes we can end up using. Together, the parameters ''S'' and ''M'' define the final size of our sequence table: ''S * log2(M)'' bits.&lt;br /&gt;
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The problem we will face with this approach in sizecoding productions is that such a sequence table grows quickly and ends up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all its song data. Using our knowledge at this point, we would be able to divide these 10 bytes (80 bits) into a sequence of 20 steps (''S = 20''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 20 steps is not that few, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And its pitch range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?&lt;br /&gt;
&lt;br /&gt;
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this limits the composer's freedom, you might think. This is true! But you will see that the results are not as bad as you might expect at first; in fact, the resulting limitation can even give new impulses to creativity, something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the shift amount used to derive the step counter, ''stepcnt = t&amp;gt;&amp;gt;O'' with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
                                                               with S=4&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
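&lt;br /&gt;
The index rows of the table above can be reproduced with a few lines of JavaScript. This is only an illustrative sketch: ''cascadeIndices'' is a made-up helper name, and it works on the step counter, i.e. on ''t'' already shifted right by ''O''.&lt;br /&gt;

```javascript
// Index into each of the N cascaded sequence tables for a given step counter.
// Sequencer i advances at half the speed of sequencer i-1.
const cascadeIndices = (stepcnt, N, S) =>
  Array.from({ length: N }, (unused, i) => (stepcnt >> i) % S);

// Build the columns for steps 0..19 with N = 5 and S = 4, then print one row per sequencer.
const columns = Array.from({ length: 20 }, (unused, step) => cascadeIndices(step, 5, 4));
const rows = [0, 1, 2, 3, 4].map(
  i => "seq" + i + "ix " + columns.map(c => c[i]).join(" ")
);
console.log(rows.join("\n"));
```

With ''N=5'' and ''S=4'' the slowest index, ''(stepcnt&amp;gt;&amp;gt;4)%4'', wraps after 64 steps, so the combined index pattern repeats only every 64 steps even though each table holds just 4 entries.&lt;br /&gt;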
&lt;br /&gt;
In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration, to achieve the smallest possible size. If we settle for a simple sawtooth oscillator, things get easy, because the oscillator function is as simple as ''o(t) = t''. As we found out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current index and looking it up in the sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values delivered by the sequencer functions ''si(t)''. Scaling is done as a shift-right operation in the last step, and with some tweaking of the sequencer step values it can even be forced to be a shift-right by 8, i.e. ''C = 256''.&lt;br /&gt;
&lt;br /&gt;
The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer stores 4 sequence steps: ''S=4''. The step to look up is therefore selected by 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is given by ''(t&amp;gt;&amp;gt;(O+i))%S''. Each step value is limited to 4 bits (''M=16''). Putting all this together, we can now start composing a song in this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
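  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the shortest step length in samples&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scaling factor (shift-right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running 16-bit product; t itself stays unchanged for the index lookup&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / C);&lt;br /&gt;
  }&lt;br /&gt;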
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
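&lt;br /&gt;
For comparison, here is a rough JavaScript model of the same calculation that works directly on the five packed 16-bit words of the assembly listing in the next section instead of a two-dimensional table. The function name ''nextSample'' and the variable names are illustrative assumptions; the arithmetic (4-bit step values, 16-bit wrap-around, high byte as output) follows the comments in that listing.&lt;br /&gt;

```javascript
// Packed sequence table: four 4-bit step values per 16-bit word, lowest nibble first.
const seqtbl = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];

// Hypothetical model of one player iteration for sample number t.
function nextSample(t) {
  const time = t % 65536;  // 16-bit sample counter
  let acc = time;          // running 16-bit product
  seqtbl.forEach((word, i) => {
    const bitOffset = ((time >> (10 + i)) % 4) * 4; // 0, 4, 8 or 12
    const step = (word >> bitOffset) % 16;          // extract one 4-bit step value
    acc = (acc * step) % 65536;                     // keep the low 16 bits only
  });
  return acc >> 8;         // high byte = scaled 8-bit sample
}

console.log(nextSample(1)); // 1*3*6*2*5*4 = 720, and 720 >> 8 = 2
```

Keeping the time counter and the running product in separate variables matters here: every sequencer must be indexed by the unmodified sample number.&lt;br /&gt;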
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (interrupt on terminal count), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  has the timer counter wrapped around to 149..0FFh?&lt;br /&gt;
            ja      bbeat                     ;77xx       ;    yes (AL &amp;gt; 148, unsigned) -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;EBxx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1334</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1334"/>
				<updated>2024-02-15T20:23:00Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: text has been revised&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by its inventor. This page describes how this player works and how it can be adapted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please credit wiRe if you do. Commercial use is not allowed.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was built as a simple R-2R resistor ladder that performs the conversion to an analog signal, so such a device was very cheap to build yourself at the time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should match the sample period quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but the result will sound very lo-fi. There are several ways to get such a timer:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (even when skipping the backup and restore of the old handler), the overhead of the handler itself, and exchanging data between the handler and the non-interrupt code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the other variants described next. But it must also be clear that polling has to be performed at a higher frequency than the actual sampling rate, e.g. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (self-modifying code)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.&lt;br /&gt;
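&lt;br /&gt;
The divisor 149 in the listing above follows from the PIT input clock of roughly 1.19318 MHz. A small sketch of that arithmetic, with made-up helper names, in case a different sample rate is wanted:&lt;br /&gt;

```javascript
// Input clock of the PC's 8253/8254 PIT in Hz (about 1.19318 MHz).
const PIT_HZ = 1193181.81818;

// Nearest integer divisor for a wanted sample rate, and the rate it really yields.
const pitDivisor = rate => Math.round(PIT_HZ / rate);
const actualRate = divisor => PIT_HZ / divisor;

console.log(pitDivisor(8000));           // 149
console.log(actualRate(149).toFixed(2)); // 8007.93
```

The rate actually obtained is slightly off the nominal 8 kHz, which only shifts the pitch of the whole song by a negligible amount.&lt;br /&gt;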
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
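&lt;br /&gt;
A minimal sketch of this definition, assuming a hypothetical helper named ''renderBytebeat'' (it is not part of the player described on this page): it evaluates a formula ''f(t)'' for consecutive sample numbers and stores each result as one byte, so the array index is both the sample number and the byte offset.&lt;br /&gt;

```javascript
// Hypothetical helper: render `count` samples of a bytebeat formula f(t).
// A Uint8Array keeps only the low 8 bits of every value stored in it,
// which is the implicit "modulo 256" of an 8-bit bytebeat stream.
function renderBytebeat(f, count) {
  const out = new Uint8Array(count);
  out.forEach((unused, t) => { out[t] = f(t); });
  return out;
}

// Plain sawtooth f(t) = t, rendered for a little over one second at 8 kHz.
const stream = renderBytebeat(t => t, 8001);
console.log(stream[8000]); // sample played exactly 1 second after start: 8000 % 256 = 64
```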
&lt;br /&gt;
In general, any bytebeat algorithm can be used to generate the next sample to be written to the COVOX on LPT1. But in terms of size, a bytebeat algorithm is better suited if its formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music itself comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex and does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
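&lt;br /&gt;
Judging from the comments in the full source listing at the end of this page, each of these words holds four 4-bit step values, with the first step in the lowest nibble. Under that assumption, new table words could be produced with a small helper like the following sketch (''packSteps'' and ''toHex'' are made-up names):&lt;br /&gt;

```javascript
// Pack four 4-bit step values into one 16-bit word, first step in the lowest nibble.
const packSteps = steps =>
  steps.reduce((word, step, k) => word + (step % 16) * 16 ** k, 0);

// Format a word the way it appears in the table above.
const toHex = word => "0x" + word.toString(16).toUpperCase().padStart(4, "0");

console.log(toHex(packSteps([3, 1, 4, 1])));  // 0x1413
console.log(toHex(packSteps([6, 6, 12, 6]))); // 0x6C66
```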
&lt;br /&gt;
The method used to achieve a size-optimized, yet flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the sawtooth waveform to a square wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are also possible. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of a single modulated oscillator or sequence, two or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already implement a bytebeat player with a sequence table containing as many steps ''S'' as our composition needs, or at least as many as the size constraints allow. The more steps ''S'' we use, the longer the song lasts before it repeats. The larger the number ''M'' of distinct values a step can take, which costs ''log2(M)'' bits per step, the wider the range of notes we can use. Together, ''S'' and ''M'' define the size of our sequence table: ''S * log2(M)'' bits.&lt;br /&gt;
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The problem with this approach in sizecoding productions is that such a sequence table grows quickly and ends up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all its song data. With what we know so far, these 10 bytes (80 bits) could hold a sequence of 20 steps (''S = 20''), as long as the limited range of 4 bits per step (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 20 steps is not that few, but the severely limited range of less than 1 octave would restrict us to what we would most likely end up calling a children's song. Yet the reference muzak sounds as if it runs for at least a multiple of 32 steps before it starts to repeat, and its pitch range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?&lt;br /&gt;
&lt;br /&gt;
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this limits the composer's freedom, you might think. This is true! But you will see that the results are not as bad as you might expect at first; in fact, the resulting limitation can even give new impulses to creativity, something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the shift amount used to derive the step counter, ''stepcnt = t&amp;gt;&amp;gt;O'' with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
                                                               with S=4&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration, to achieve the smallest possible size. If we settle for a simple sawtooth oscillator, things get easy, because the oscillator function is as simple as ''o(t) = t''. As we found out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current index and looking it up in the sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values delivered by the sequencer functions ''si(t)''. Scaling is done as a shift-right operation in the last step, and with some tweaking of the sequencer step values it can even be forced to be a shift-right by 8, i.e. ''C = 256''.&lt;br /&gt;
&lt;br /&gt;
The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer stores 4 sequence steps: ''S=4''. The step to look up is therefore selected by 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is given by ''(t&amp;gt;&amp;gt;(O+i))%S''. Each step value is limited to 4 bits (''M=16''). Putting all this together, we can now start composing a song in this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
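  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the shortest step length in samples&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scaling factor (shift-right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running 16-bit product; t itself stays unchanged for the index lookup&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / C);&lt;br /&gt;
  }&lt;br /&gt;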
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (interrupt on terminal count), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did the timer counter overflow to 149..0FFh?&lt;br /&gt;
            jo      bbeat                     ;71xx       ;    yes -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;75xx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1333</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1333"/>
				<updated>2024-02-15T19:16:25Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. Watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It took 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how the player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own sizecoding productions, but please credit wiRe if you do. Commercial use is not permitted.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, which made it very cheap to build such a device yourself in those days. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also needs a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may work in some special cases, but the result is then very lo-fi. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the exchange of data between the handler and non-interrupt code all cost bytes. In most cases it takes less space to poll the timer instead, as in all the variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update, which may cost quite some performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a timer counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, the next sample must be generated (the counter value can also serve as the sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
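&lt;br /&gt;
The divisor 149 in the PIT setup above follows from the PIT input clock of about 1.19318 MHz. As a rough sketch only (written in JavaScript like the bytebeat formulas on this page; the helper names ''pitDivisor'' and ''actualRate'' are made up for illustration and are not part of MUZAK58), the divisor for a wanted sampling rate and the rate it really produces can be worked out like this:&lt;br /&gt;

```javascript
// Input clock of the 8253/8254 PIT in Hz (1.19318181818 MHz, as quoted in the listing).
const PIT_HZ = 1193181.81818;

// Hypothetical helper: nearest integer divisor for a wanted sampling rate.
const pitDivisor = (rateHz) => Math.round(PIT_HZ / rateHz);

// Hypothetical helper: the rate the PIT really delivers for a given divisor.
const actualRate = (divisor) => PIT_HZ / divisor;

const divisor = pitDivisor(8000);
console.log(divisor, actualRate(divisor).toFixed(2)); // prints "149 8007.93"
```

Any divisor up to 255 still fits into the single low byte written to port 40h; with the high byte left at zero this covers sampling rates from roughly 4.7 kHz upwards.&lt;br /&gt;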
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. But one drawback is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, a [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time given as the sample number, which also equals the byte offset inside the stream. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, the sample ''f(8000)'' is played after exactly 1 second. Since this is actually a softsynth (music synthesis done in software), in theory any sound or music can be approximated this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be used to produce the next sample to be written to the COVOX on LPT1. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to some degree a generic, reusable background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music being played comes from sequence tables stored in memory. Changing those words results in entirely new music. Spending more words on this table adds more variation to the song, so that it does not repeat as quickly. The sequence table of this reference example is 10 bytes in total and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, but still flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or the paper published by viznut, the basic idea for generating a melody with a bytebeat is to modify a basic waveform oscillator function ''o(t)'', such as a saw-tooth, square, triangle or sine waveform, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then replace ''p'' by a sequence table ''s(t)'' that changes the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as the melody is to be defined by a sequence table. This example demonstrates a sequence of 8 steps, with ''S=8'' specifying the number of steps. Each step changes the pitch of the resulting saw-tooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the saw-tooth waveform into a square-wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are also possible. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Distortion-like effects can also be applied, as demonstrated here by using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated oscillator or one sequence, 2 or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with one sequence table that holds as many steps ''S'' as we need for our composition, or at least as many as we can afford under our size constraints. The more steps ''S'' we spend, the longer the song lasts before it repeats. The larger the value of each sequence step can be, with ''M'' possible values taking ''log2(M)'' bits per step, the larger the range of notes we can use in the end. Both parameters ''S'' and ''M'' together define the final byte size of our sequence table.&lt;br /&gt;
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The issue we face with this approach in sizecoding is that such a sequence table quickly grows and consumes quite a lot of bytes in the end. Our reference example, [https://www.pouet.net/prod.php?which=96071 MUZAK58], spends 10 bytes on all its song data. With what we know at this point, we could divide those 10 bytes (80 bits) into a sequence of 20 steps (''S = 20''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is enough for the music composition we have in mind. 20 steps is not that few, but the heavily limited range of less than 1 octave would restrict us to something we would very likely call a children's song in the end. The reference muzak instead sounds as if it were built out of at least a multiple of 32 steps before it starts to repeat. And its pitch range does not appear to be limited to a single octave either. What the hell is going on here? How is it possible to compress the sequence table this way?&lt;br /&gt;
&lt;br /&gt;
The trick discovered by wiRe here is to cascade multiple sequencers and combine their outputs into a single sequence with a much longer duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this will limit the freedom of the composer, you might think now. True! But you will see that the results you can achieve this way are not as bad as you might first expect. Indeed, the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence develops over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequencer step count. ''O'' is the shift used to derive the step count, ''stpcnt = t &amp;gt;&amp;gt; O'' with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
                                                               with S=4&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In combination with our oscillator function, the entire bytebeat will finally look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
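&lt;br /&gt;
The index rows of the table above can be reproduced with a few lines of JavaScript. This is only an illustrative sketch: the names ''STEPS'', ''SEQUENCERS'' and ''rows'' are assumptions made for this example, and the step count stands in for ''t &amp;gt;&amp;gt; O''.&lt;br /&gt;

```javascript
const S = 4;          // steps per sequencer, as in the table
const STEPS = 20;     // number of columns shown in the table
const SEQUENCERS = 5; // seq0ix .. seq4ix

// Row i holds the table index of sequencer i for every step count.
// Sequencer i advances at half the speed of sequencer i-1: (stpcnt >> i) % S.
const rows = Array.from({ length: SEQUENCERS }, (_, i) =>
  Array.from({ length: STEPS }, (_, stpcnt) => (stpcnt >> i) % S)
);

rows.forEach((row, i) => console.log("seq" + i + "ix", row.join(" ")));
```

Because the slowest of the 5 sequencers only changes its index every 16 steps, the combined pattern repeats after 4 * 16 = 64 steps, although each sequencer stores just 4 values.&lt;br /&gt;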
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration, to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy job, with our oscillator function being as simple as ''o(t) = t''. As we figured out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current sequencer index and looking that up in the sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, even a 16-bit multiplication is enough here. The final scale factor ''C'' depends on the range of the values produced by the sequencer functions ''si(t)''. Scaling happens as a shift-right operation in the last step, and with some tweaking of the sequencer step values it can even be forced to be a shift by 8.&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, a total of 5 cascaded sequencers is used: ''N=5''. Each sequencer's table stores 4 sequence steps: ''S=4''. Which sequencer step to index is therefore selected by 2 bits of parameter ''t''. The fastest sequencer step time for this song was chosen to be ''2^10'' timer ticks or samples, giving us ''O=10''. This means the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is derived as ''(t&amp;gt;&amp;gt;(O+i))%S''. Each step value is below ''M=16''. Putting this all together, we can now start composing a song this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
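  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the samples per step of the fastest sequencer&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scale factor (shift right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running 16-bit product; t itself stays untouched for the index lookups&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / C);&lt;br /&gt;
  }&lt;br /&gt;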
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
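&lt;br /&gt;
The five words of the sequence table shown earlier on this page pack exactly these step values, four bits per step with the lowest nibble first. The following JavaScript sketch checks that and evaluates the formula with 16-bit wrap-around. It is an illustration under stated assumptions, not the original player: the helper names ''unpack'' and ''sample'' are invented here, and ''% 16'' is used in place of a bit mask to keep the low four bits.&lt;br /&gt;

```javascript
// The five words from the seqtbl listing on this page.
const words = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];

// Hypothetical helper: split one word into its four 4-bit steps, lowest nibble first.
const unpack = (w) => [0, 4, 8, 12].map((shift) => (w >> shift) % 16);

const seqtbl = words.map(unpack);
console.log(JSON.stringify(seqtbl)); // [[3,1,4,1],[6,6,12,6],[2,4,2,2],[5,9,4,6],[4,8,4,4]]

// Hypothetical helper mirroring the 16-bit arithmetic: the product wraps at 65536
// after every multiplication, and the high byte of the final product is the sample.
const sample = (t) =>
  seqtbl.reduce((v, steps, i) => (v * steps[(t >> (10 + i)) % 4]) % 65536, t % 65536) >> 8;

console.log(sample(1), sample(1024)); // 2 192
```

For ''t = 1'' all sequencers are at step 0, so the product is 1 * 3 * 6 * 2 * 5 * 4 = 720 and the output sample is 720 / 256 = 2 (rounded down).&lt;br /&gt;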
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters chosen carefully, the final bytebeat implementation and its sequence table become very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did the timer counter overflow to 149..0FFh?&lt;br /&gt;
            jo      bbeat                     ;71xx       ;    yes -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;75xx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1332</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1332"/>
				<updated>2024-02-15T19:08:41Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. Watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It took 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how the player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own sizecoding productions, but please credit wiRe if you do. Commercial use is not permitted.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, which made it very cheap to build such a device yourself in those days. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also needs a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may work in some special cases, but the result is then very lo-fi. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the exchange of data between the handler and non-interrupt code all cost bytes. In most cases it takes less space to poll the timer instead, as in all the variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update, which may cost quite some performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a timer counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, the next sample must be generated (the counter value can also serve as the sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [cs:_tcmp+1], al          ;2EA2xxxx   ;remember last BIOS timer value (self-modifying code; CS override needed because DS=0)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you can also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Use whatever fits better.&lt;br /&gt;
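&lt;br /&gt;
The reload value 149 used above follows directly from the PIT input clock. A minimal JavaScript sketch of that arithmetic, where the names ''PIT_HZ'', ''pitDivisor'' and ''actualRate'' are made up for this illustration and are not part of any release:&lt;br /&gt;

```javascript
// PIT input clock in Hz, as quoted in the listing above (1.19318181818 MHz).
const PIT_HZ = 1193181.81818;

// Nearest integer reload value for a desired sampling rate.
function pitDivisor(targetHz) {
  return Math.round(PIT_HZ / targetHz);
}

// Sampling rate that a given reload value really produces.
function actualRate(divisor) {
  return PIT_HZ / divisor;
}

console.log(pitDivisor(8000));             // 149
console.log(actualRate(149).toFixed(2));   // 8007.93
```

The deviation from 8000 Hz is about 0.1 percent, so the music is only detuned very slightly.&lt;br /&gt;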
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code, but one drawback is the very slow access to the PIT registers. On modern chipsets the 8254 PIT is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, a [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the sample number, which is also the byte offset inside the stream. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, the sample ''f(8000)'' is played after exactly 1 second. Since this is actually a softsynth (music synthesis done in software), in theory any sound or music can be approximated this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
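&lt;br /&gt;
To make the idea of a stream defined by ''f(t)'' concrete, here is a small JavaScript sketch that renders such a function into a byte buffer. The helper name ''renderBytebeat'' is an assumption made for this example only:&lt;br /&gt;

```javascript
// Render the first sampleCount samples of a bytebeat function f(t).
function renderBytebeat(f, sampleCount) {
  const samples = new Uint8Array(sampleCount);
  for (let t = 0; t !== sampleCount; t++) {
    samples[t] = f(t);   // storing into a Uint8Array keeps only the low 8 bits
  }
  return samples;
}

// One second of a plain saw-tooth at 8 kHz: f(t) = t.
const oneSecond = renderBytebeat((t) => t, 8000);
console.log(oneSecond.length, oneSecond[300]);   // 8000 44
```

The byte at offset 300 is 300 modulo 256, which shows the implicit 8-bit truncation every bytebeat relies on.&lt;br /&gt;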
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to produce the next sample to be written to the COVOX on LPT1. But with respect to size, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to some degree a generic, reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music that is played comes from sequence tables stored in memory. Changing those words results in entirely new music being played. Spending more words on this table also adds more variation to the song, so that it does not repeat as quickly. The sequence table of this reference example is 10 bytes in total and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, but still flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many Bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or the paper published by viznut, the basic idea for generating a melody with a bytebeat is to take some basic waveform oscillator function ''o(t)'', like a saw-tooth, square, triangle or sine waveform, and multiply its time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then replace ''p'' by a sequence table ''s(t)'' that changes the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as the melody is to be defined by a sequence table. This example demonstrates a sequence of 8 steps, with ''S=8'' specifying the number of steps. Each step changes the pitch of the resulting saw-tooth waveform.&lt;br /&gt;
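&lt;br /&gt;
The one-liner can be unrolled into named steps to show where the step index and the timing come from. This is only an explanatory sketch; ''RATE'', ''pitchTable'' and ''sawMelody'' are names chosen for this example:&lt;br /&gt;

```javascript
const RATE = 8000;                              // samples per second
const pitchTable = [1, 2, 4, 8, 16, 8, 4, 2];   // s(t), one entry per step

function sawMelody(t) {
  const step = (t >> 11) % pitchTable.length;   // advance one step every 2^11 = 2048 samples
  return (t * pitchTable[step]) % 256;          // saw-tooth o(x) = x, reduced to 8 bits
}

console.log(2048 / RATE);                       // 0.256 seconds per step
console.log(pitchTable.length * 2048 / RATE);   // 2.048 seconds until the melody repeats
console.log(sawMelody(2049));                   // 2
```

Taking the remainder modulo 256 gives the same result as masking with 255 for the non-negative values of ''t'' used here.&lt;br /&gt;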
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the saw-tooth waveform into a square-wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are possible as well. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Distortion-like effects can be applied too, as demonstrated here by using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated oscillator or one sequence, two or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with one sequence table that holds as many steps ''S'' as we need for our composition, or at least as many as we can afford due to size constraints. The more steps ''S'' we spend, the longer the song lasts before it repeats. The larger the value range ''M'' of each sequence step, which takes ''log2(M)'' bits per step, the larger the range of notes we can use in the end. Together, the parameters ''S'' and ''M'' define the final byte size of our sequence table.&lt;br /&gt;
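&lt;br /&gt;
How ''S'' and ''M'' translate into bytes can be sketched in a few lines of JavaScript, assuming the steps are packed tightly without padding. The function names are hypothetical and serve this example only:&lt;br /&gt;

```javascript
// Bits needed to store one step whose value lies in 0 .. valuesPerStep-1.
function bitsPerStep(valuesPerStep) {
  return 32 - Math.clz32(valuesPerStep - 1);
}

// Size in bytes of a tightly packed table with the given number of steps.
function seqTableBytes(steps, valuesPerStep) {
  return Math.ceil(steps * bitsPerStep(valuesPerStep) / 8);
}

console.log(seqTableBytes(8, 32));    // 5  (8 steps, values up to 16 need 5 bits each)
console.log(seqTableBytes(20, 16));   // 10 (20 steps of 4 bits each)
```

The first call corresponds to the 8-step table of the examples above, the second to a table of 4-bit steps that fills 10 bytes.&lt;br /&gt;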
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The issue we face with this approach in size-coding is that such a sequence table quickly grows and consumes quite a lot of bytes in the end. Our reference example, [https://www.pouet.net/prod.php?which=96071 MUZAK58], spends 10 bytes on all its song data. With what we know at this point, we could divide those 10 bytes into a sequence of 20 steps (''S = 20''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is enough for the music composition we have in mind. 20 steps is not that few, but the heavily limited range of less than 1 octave would limit us to something we would very likely call a children's song in the end. The reference muzak instead sounds like it is built out of at least a multiple of 32 steps before it starts to repeat. And the pitch range also does not appear to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table in this way?&lt;br /&gt;
&lt;br /&gt;
The trick discovered by wiRe here is to cascade multiple sequencers and combine all their output into a single sequence of much longer sequence duration (before repetition) and of wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this will limit the freedom of the composer, you might think now. True! But you will see that the results you can achieve this way are not as bad as you might first expect. Indeed, the resulting limitation can even turn out to give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence develops over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequencer step count. ''O'' is the time divider used to derive the step count ''stpcnt = t &amp;gt;&amp;gt; O'', with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
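&lt;br /&gt;
The columns of this table can be reproduced with a short JavaScript sketch. The helper names ''cascadeIndices'' and ''patternSteps'' are assumptions for this illustration; the parameters match the table (5 sequencers with 4 entries each) and a step length of 1024 samples:&lt;br /&gt;

```javascript
// Table index of each of the N cascaded sequencers at sample time t.
function cascadeIndices(t, O, N, S) {
  return Array.from({ length: N }, (_, i) => (t >> (O + i)) % S);
}

// Number of steps after which the combined pattern repeats:
// the slowest sequencer advances once every 2^(N-1) steps and has S entries.
function patternSteps(N, S) {
  return S * 2 ** (N - 1);
}

console.log(cascadeIndices(5 * 1024, 10, 5, 4));    // [ 1, 2, 1, 0, 0 ]
console.log(cascadeIndices(18 * 1024, 10, 5, 4));   // [ 2, 1, 0, 2, 1 ]
console.log(patternSteps(5, 4));                    // 64
```

The two index lists correspond to the columns for step 5 and step 18 of the table. With these parameters the combined pattern repeats only after 64 steps, although each individual table has just 4 entries.&lt;br /&gt;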
&lt;br /&gt;
In combination with our oscillator function, the entire bytebeat will finally look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy game, with our oscillator function being as simple as ''o(t) = t''. As we figured out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration ''i'' of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current sequencer index and looking that up inside the sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is enough here. The final scale factor ''C'' depends on the range of the values delivered by the sequencer functions ''si(t)''. Scaling happens as a shift-right operation in the last step, and with some tweaking of the sequencer step values it can even be forced to be a shift by 8.&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, a total of 5 cascaded sequencers is used: ''N=5''. Each sequencer's table stores 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2 bits of parameter ''t''. The fastest sequencer step time for this song was chosen to be ''2^10'' timer ticks or samples, giving us ''O=10''. This means the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is derived by ''(t&amp;gt;&amp;gt;(O+i))%S''. Each step value is limited by ''M=16''. Putting this all together, we can now start composing a song this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
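  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the samples per step of the fastest sequencer&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scale factor (shift right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running product, truncated to 16 bits after every step&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];  // index with the unmodified time t&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / C);&lt;br /&gt;
  }&lt;br /&gt;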
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
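&lt;br /&gt;
The step values above are the 4-bit nibbles of the five packed table words shown at the start of this chapter. The following JavaScript sketch models how a player could unpack them and form the 16-bit product. It is an illustration under that assumption, not the released code, and the names ''seqWords'', ''stepValue'' and ''muzakSample'' are made up for it:&lt;br /&gt;

```javascript
// The five table words of the reference player, one per sequencer.
const seqWords = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];

// Extract the 4-bit value of step 0..3 from a packed word.
function stepValue(word, step) {
  return (word >> (4 * step)) % 16;
}

// Next 8-bit sample for a 16-bit sample counter t.
function muzakSample(t) {
  const time = t % 65536;
  let product = time;
  seqWords.forEach((word, i) => {
    const step = (time >> (10 + i)) % 4;                   // always derived from the unmodified time
    product = (product * stepValue(word, step)) % 65536;   // keep the low 16 bits only
  });
  return product >> 8;                                     // scale by C = 256
}

// Unpack all five step tables, then compute a few samples.
console.log(seqWords.map((w) => [0, 1, 2, 3].map((s) => stepValue(w, s))));
console.log(muzakSample(1), muzakSample(100), muzakSample(1024));   // 2 25 192
```

For example, the word 0x1413 unpacks to the steps 3, 1, 4, 1, starting with the lowest nibble.&lt;br /&gt;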
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters being chosen carefully, the final bytebeat implementation and sequence tables will become very small. Here is the documented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode 0 (interrupt on terminal count), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did timer counter overflow to 149..0FFh?&lt;br /&gt;
            ja      bbeat                     ;77xx       ;    yes (unsigned AL &amp;gt; 148) -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;EBxx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1331</id>
		<title>General Coding Tricks</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1331"/>
				<updated>2024-02-15T18:44:18Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Need a constant? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Data is code, code is data ==&lt;br /&gt;
&lt;br /&gt;
Code is nothing more than data that the CPU interprets.  For example, consider this multi-byte instruction:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This assembles to &amp;lt;code&amp;gt;B4 37&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;B4&amp;lt;/code&amp;gt; by itself isn't interesting, but &amp;lt;code&amp;gt;37&amp;lt;/code&amp;gt; happens to be the opcode for &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt;.  Let's say you had this code before a loop, and you needed to perform &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; at the top of a loop.  Rather than put AAS at the top of the loop, you can reuse the opcode that will already be there as part of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt; that comes before it.  Just jump directly into the middle of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt;, which will get interpreted and executed as &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
&lt;br /&gt;
label:&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        loop label+1&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;+1&amp;lt;/code&amp;gt; specifies the jump should go to 1 byte past the actual location.&lt;br /&gt;
&lt;br /&gt;
=== Reuse ===&lt;br /&gt;
&lt;br /&gt;
You can use opcodes hidden in your existing data.  For example, .COM files can end with &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, which is opcode &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt;.  If you already have a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; somewhere else in your code, even as part of data, just &amp;lt;code&amp;gt;JMP&amp;lt;/code&amp;gt; to that pre-existing &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; instead of adding a &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== If your environment holds you back, change it ==&lt;br /&gt;
&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it:  While setting a new palette costs bytes, the new palette arrangement could save you headaches down the road.  For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want.  (This assumes you have very small ways of redefining the palette, of course.)&lt;br /&gt;
&lt;br /&gt;
The above is maybe not the best example.  Rewrites to this section are welcome.&lt;br /&gt;
&lt;br /&gt;
== Need a constant? ==&lt;br /&gt;
&lt;br /&gt;
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.&lt;br /&gt;
&lt;br /&gt;
If you need more than a byte, then another method could be to create a '''literal pool''' in memory which can be addressed for constants. This technique was first used in [https://www.pouet.net/prod.php?which=94080 TERRA256] (please update if there are earlier examples) and is especially useful if [[Floating-point_Opcodes]] are used, since those cannot address byte integer constants in memory but only word and dword. Unlike byte constants, word or dword constants whose hi- or lo-byte must be zero are less likely to be found inside the opcodes. Chances are higher if the lower byte of the searched constant word does not matter that much and can be ignored. Otherwise, the literal pool is a very nice technique and simplifies the search for constants a lot.&lt;br /&gt;
&lt;br /&gt;
The construction of the literal pool can look like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov cx, 255 ;very likely this can be replaced by a shorter version or even skipped; value of CX can also be larger than 255&lt;br /&gt;
initlp: ;very likely there are other things to initialize, like i.e. setting the DAC color palette&lt;br /&gt;
        push cx ;push 16-bit word to the stack with the current counter value&lt;br /&gt;
        loop initlp&lt;br /&gt;
        push cx ;optionally push final zero constant, if required&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This example constructs a literal pool on the stack: the loop pushes 255 words, from 0x00FF down to 0x0001, and the additional &amp;quot;push cx&amp;quot; after the loop adds a final 0x0000, for 256 words in total. This final zero is optional, but it also allows exiting from a COM executable with the &amp;quot;ret&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
Practically any existing initialization loop can be used to set up such a pool, so in the best case this construction costs only a single additional byte for the &amp;quot;push cx&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
To use the literal pool, some index register like SI, DI, BX or BP should be initialized to point to the literal pool. The init value can be the content of the stack pointer (SP) but also a fixed offset, since the position of the literal pool and the current stack position is typically well known. Also the initial value of DI (0xFFFE) or a zero offset may work. Accessing the literal pool can look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov bx, sp ;point index register to literal pool&lt;br /&gt;
        fild word [bx+0x13*2] ;load integer value 0x0013 into FPU register&lt;br /&gt;
        fild word [bx+0x31*2] ;load integer value 0x0031 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To address larger values, some further tricks can be used to keep size low:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        fild word [bx+0x087*2]  ;take care that this instruction is 1 byte larger than the others&lt;br /&gt;
        fild word [bx+si+0x087*2-0x100] ;but this trick could help: load integer value 0x0087 into FPU register, assuming SI is 0x0100&lt;br /&gt;
        fild word [bx+0x03*2-1] ;load integer value 0x0300 into FPU register&lt;br /&gt;
        fidiv dword [bx+0x05*2-3] ;divide by integer value 0x05000400&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
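&lt;br /&gt;
Why the unaligned offsets yield 0x0300 and 0x05000400 becomes clear when the pool is modelled as raw little-endian bytes. The following JavaScript sketch does just that. It assumes the pool was built with the final zero, so that the word ''k'' lies at byte offset ''2*k'' from BX, and the names ''pool'' and ''hex'' belong to this example only:&lt;br /&gt;

```javascript
// Model the literal pool as it lies in memory: 256 little-endian words 0x0000 .. 0x00FF,
// with the word k stored at byte offset 2*k (offset 0 is where BX points).
const pool = new DataView(new ArrayBuffer(256 * 2));
for (let k = 0; k !== 256; k++) {
  pool.setUint16(2 * k, k, true);   // true = little-endian, as on x86
}

const hex = (value) => '0x' + value.toString(16);

console.log(hex(pool.getUint16(0x13 * 2, true)));       // 0x13
console.log(hex(pool.getUint16(0x03 * 2 - 1, true)));   // 0x300
console.log(hex(pool.getUint32(0x05 * 2 - 3, true)));   // 0x5000400
```

Reading one byte early picks up the zero high byte of the previous word as the new low byte, which shifts the constant up by 8 bits.&lt;br /&gt;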
&lt;br /&gt;
== A smaller way to point to Mode 13's screen segment ==&lt;br /&gt;
&lt;br /&gt;
Rather than &amp;lt;code&amp;gt;mov ah,0a0h; mov es,ax&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;push word 0a000h; pop es&amp;lt;/code&amp;gt;, try this 2-byte wonder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
les bp,[bx]&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This sets ES=9FFF, only one away from A000.  You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.&lt;br /&gt;
&lt;br /&gt;
How does this work?   At start of execution of a .COM file, BX=0, and DS=CS. The contents of the COM file get loaded to offset 0x100 in that segment, but loaded before that is the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with &amp;lt;code&amp;gt;CD 20&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;INT 20h&amp;lt;/code&amp;gt;, which exits the program), so that's what gets loaded into BP.  The next word is the number of the last free conventional memory segment, typically 0x9fff (but can be something different if parts of the upper memory range are either not installed or allocated).&lt;br /&gt;
&lt;br /&gt;
'''Warning:''' This trick doesn't always work.  On FreeDOS, this can set ES=9FE0 and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.&lt;br /&gt;
&lt;br /&gt;
== Accessing the timer tick for free ==&lt;br /&gt;
&lt;br /&gt;
If using a 386+, FS=0 at .COM start.  So, &amp;lt;code&amp;gt;FS:[046C]&amp;lt;/code&amp;gt; gets you the BIOS timer tick variable, which you could use for timing/pacing, or a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; right after the start and accessing &amp;lt;code&amp;gt;[046C]&amp;lt;/code&amp;gt; does the trick then with equal size.&lt;br /&gt;
&lt;br /&gt;
== Looping twice ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear, you can loop once in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        cmc&lt;br /&gt;
        jc      looping&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Looping three times ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can loop twice in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        inc     bx ;when decrementing from -1 instead, the same check works: 0FEh and 0FDh have odd parity, 0FCh has even parity&lt;br /&gt;
        jpo     looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Obtaining X and Y without DIV (The Rrrola Trick) ==&lt;br /&gt;
&lt;br /&gt;
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimation by multiplying the screen pointer by 0xCCCD and reading Y and X from the 8-bit registers DH (+DL as a 16-bit value) and DL (+AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], from start to end. Multiplying this number in [0,1] by 65536 / 320 = 204.8 yields the row before the decimal point and, again as a kind of fixed-point number, the column after the decimal point. The constant 0xCCCD is the nearest rounding of 204.8 * 256 ( = 52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
&lt;br /&gt;
This is adapted from [http://www.pouet.net/prod.php?which=53816 &amp;quot;Puls&amp;quot; by Rrrola] where X and Y are directly modified on the stack by performing &amp;lt;code&amp;gt;add dword[di],0000CCCDh&amp;lt;/code&amp;gt; on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value on said DWORD on the stack before each frame, which requires 2 additional bytes. Both approaches are too different to directly compare, but share the core idea of [http://www.pouet.net/topic.php?which=8791&amp;amp;page=8#c411796 multiplying with &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt;], so &amp;quot;Rrrola's trick&amp;quot; is an appropriate term to use.&lt;br /&gt;
&lt;br /&gt;
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===&lt;br /&gt;
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]&lt;br /&gt;
More clearly: DI = (y * 320) + x&lt;br /&gt;
&lt;br /&gt;
Multiply by 0xCCCD =&amp;gt; (y * 0x1000040) + (x * 0xcccd)&lt;br /&gt;
&lt;br /&gt;
Taking the top byte is equivalent to dividing by 0x1000000, so that gives you Y.&lt;br /&gt;
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).&lt;br /&gt;
And the lower two bytes are noise.&lt;br /&gt;
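&lt;br /&gt;
The arithmetic can be checked with a few lines of JavaScript. This sketch only models the 32-bit product and the two bytes read from it; the function name ''rrrola'' is an assumption made for this example:&lt;br /&gt;

```javascript
// Approximate (x, y) from a mode 13h screen offset di = y * 320 + x.
function rrrola(di) {
  const product = di * 0xCCCD;                     // 32-bit result, as left in DX:AX by MUL
  const y = Math.floor(product / 0x1000000);       // top byte (DH)
  const x = Math.floor(product / 0x10000) % 256;   // next byte (DL), roughly x * 256 / 320
  return { x, y };
}

console.log(rrrola(100 * 320 + 160));   // { x: 128, y: 100 }
```

For every pixel of the 320x200 screen the top byte equals the exact row, while the X value is scaled from the range 0..319 down to 0..255.&lt;br /&gt;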
&lt;br /&gt;
== Use the entire register for a smaller opcode form ==&lt;br /&gt;
&lt;br /&gt;
As you know e.g. &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; produces 3 Bytes of code while &amp;lt;code&amp;gt;inc cl&amp;lt;/code&amp;gt; compiles to 2 Bytes. If ch does not matter (or you know that it won't be affected) use &amp;lt;code&amp;gt;inc cx&amp;lt;/code&amp;gt; instead and get the most out of that 1 Byte. This is no real trick but sometimes such things can be overlooked - while the 2 saved Bytes could be invested wisely.&lt;br /&gt;
&lt;br /&gt;
== Use the carry flag in your calculations ==&lt;br /&gt;
&lt;br /&gt;
Let's say you have to &amp;lt;code&amp;gt;add si,128&amp;lt;/code&amp;gt;. Unfortunately this takes 1 Byte more than &amp;lt;code&amp;gt;add si,127&amp;lt;/code&amp;gt;. But you can add 128 without that extra Byte. If your previous code sets the carry flag simply include it into your calculation and &amp;lt;code&amp;gt;adc si,127&amp;lt;/code&amp;gt;. The same goes for &amp;lt;code&amp;gt;sub si,128&amp;lt;/code&amp;gt; vs &amp;lt;code&amp;gt;sbb si,127&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1330</id>
		<title>General Coding Tricks</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1330"/>
				<updated>2024-02-15T18:32:34Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Need a constant? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Data is code, code is data ==&lt;br /&gt;
&lt;br /&gt;
Code is nothing more than data that the CPU interprets.  For example, consider this multi-byte instruction:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This assembles to &amp;lt;code&amp;gt;B4 37&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;B4&amp;lt;/code&amp;gt; by itself isn't interesting, but &amp;lt;code&amp;gt;37&amp;lt;/code&amp;gt; happens to be the opcode for &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt;.  Let's say you had this code before a loop, and you needed to perform &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; at the top of the loop.  Rather than put &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; at the top of the loop, you can reuse the opcode that will already be there as part of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt; that comes before it.  Just jump directly into the middle of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt;, which will get interpreted and executed as &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
&lt;br /&gt;
label:&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        loop label+1&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;+1&amp;lt;/code&amp;gt; specifies the jump should go to 1 byte past the actual location.&lt;br /&gt;
&lt;br /&gt;
=== Reuse ===&lt;br /&gt;
&lt;br /&gt;
You can use opcodes hidden in your existing data.  For example, .COM files can end with &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, which is opcode &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt;.  If you already have a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; somewhere else in your code, even as part of data, just &amp;lt;code&amp;gt;JMP&amp;lt;/code&amp;gt; to that pre-existing &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; instead of adding a &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;.&lt;br /&gt;
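&lt;br /&gt;
A minimal sketch of this reuse, assuming a hypothetical data table whose first byte happens to be &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; and a stack that still holds the return address DOS placed there (the labels and data bytes are made up for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
org 100h&lt;br /&gt;
main:&lt;br /&gt;
        ;per-frame work goes here&lt;br /&gt;
        in      al,60h          ;read the last keyboard scancode&lt;br /&gt;
        dec     al              ;ESC has scancode 1, so ZF is set when ESC is pressed&lt;br /&gt;
        jz      table           ;exit through the C3 byte stored in the data&lt;br /&gt;
        ;more per-frame work goes here&lt;br /&gt;
        jmp     short main&lt;br /&gt;
table:&lt;br /&gt;
        db      0C3h,12h,34h    ;hypothetical data; the first byte doubles as RET&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Compared to a separate exit label with its own &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, this saves one byte.&lt;br /&gt;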
&lt;br /&gt;
== If your environment holds you back, change it ==&lt;br /&gt;
&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it:  while setting a new palette costs bytes, the new palette arrangement could save you headaches down the road.  For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want.  (This assumes you have very small ways of redefining the palette, of course.)&lt;br /&gt;
&lt;br /&gt;
The above is maybe not the best example.  Rewrites to this section are welcome.&lt;br /&gt;
&lt;br /&gt;
== Need a constant? ==&lt;br /&gt;
&lt;br /&gt;
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.&lt;br /&gt;
&lt;br /&gt;
If you need more than a byte, another method is to create a '''literal pool''' in memory which can be addressed for constants. This technique was first used in [https://www.pouet.net/prod.php?which=94080 TERRA256] (please update if there are earlier examples) and is especially useful if [[Floating-point_Opcodes]] are used, since those cannot address byte integer constants in memory, only word and dword. Unlike byte constants, word or dword constants are very unlikely to be found inside the opcodes.&lt;br /&gt;
&lt;br /&gt;
The construction of the literal pool can look like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov cx, 255 ;very likely this can be replaced by a shorter version or even skipped; value of CX can also be larger than 255&lt;br /&gt;
initlp: ;very likely there are other things to initialize here, e.g. setting the DAC color palette&lt;br /&gt;
        push cx ;push 16-bit word to the stack with the current counter value&lt;br /&gt;
        loop initlp&lt;br /&gt;
        push cx ;optionally push final zero constant, if required&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This example constructs a literal pool of 256 words on the stack, starting with 0x00FF and ending with 0x0000. The additional &amp;quot;push cx&amp;quot; after the loop adds the final zero constant, if required. The final zero also allows exiting a COM executable using the &amp;quot;ret&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
Almost any existing initialization loop can be used to set up such a pool, so in the best case this construction costs only a single additional byte for the &amp;quot;push cx&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
To use the literal pool, some index register like SI, DI, BX or BP should be initialized to point to the literal pool. The init value can be the content of the stack pointer (SP) but also a fixed offset, since the position of the literal pool and the current stack position is typically well known. Also the initial value of DI (0xFFFE) or a zero offset may work. Accessing the literal pool can look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov bx, sp ;point index register to literal pool&lt;br /&gt;
        fild word [bx+0x13*2] ;load integer value 0x0013 into FPU register&lt;br /&gt;
        fild word [bx+0x31*2] ;load integer value 0x0031 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To address larger values, some further tricks can be used to keep size low:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        fild word [bx+0x087*2]  ;take care that this instruction is 1 byte larger than the others&lt;br /&gt;
        fild word [bx+si+0x087*2-0x100] ;but this trick could help: load integer value 0x0087 into FPU register, assuming SI is 0x0100&lt;br /&gt;
        fild word [bx+0x03*2-1] ;load integer value 0x0300 into FPU register&lt;br /&gt;
        fidiv dword [bx+0x05*2-3] ;divide by integer value 0x05000400&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== A smaller way to point to Mode 13's screen segment ==&lt;br /&gt;
&lt;br /&gt;
Rather than &amp;lt;code&amp;gt;mov ah,0a0h; mov es,ax&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;push word 0a000h; pop es&amp;lt;/code&amp;gt;, try this 2-byte wonder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
les bp,[bx]&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This sets ES=9FFF, only one away from A000.  You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.&lt;br /&gt;
&lt;br /&gt;
How does this work?   At start of execution of a .COM file, BX=0, and DS=CS. The contents of the COM file get loaded to offset 0x100 in that segment, but loaded before that is the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with &amp;lt;code&amp;gt;CD 20&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;INT 20&amp;lt;/code&amp;gt;, which exits the program), so that's what gets loaded into BP.  The next word is the number of the last free conventional memory segment, typically 0x9fff (but can be something different if parts of the upper memory range are either not installed or allocated).&lt;br /&gt;
&lt;br /&gt;
'''Warning:''' This trick doesn't always work.  On FreeDos, this can set ES=9FE0 and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.&lt;br /&gt;
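&lt;br /&gt;
As a rough sketch of how the offset correction looks in practice, the following fragment plots one pixel in the top-left corner. It assumes that BX=0 and AH=0 at start and that the PSP word really is 0x9FFF; none of this is guaranteed on every system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
org 100h&lt;br /&gt;
        les     bp,[bx]         ;C4 2F: BP=20CDh, ES=word at PSP offset 2 (assumed 9FFFh)&lt;br /&gt;
        mov     al,13h&lt;br /&gt;
        int     10h             ;set mode 13h (assumes AH=0)&lt;br /&gt;
        mov     di,16           ;9FFF:0010 is the same address as A000:0000&lt;br /&gt;
        mov     al,15           ;color 15 is white in the default palette&lt;br /&gt;
        stosb                   ;write the pixel to ES:DI&lt;br /&gt;
        ret&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;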
&lt;br /&gt;
== Accessing the timer tick for free ==&lt;br /&gt;
&lt;br /&gt;
If using a 386+, FS=0 at .COM start.  So, &amp;lt;code&amp;gt;FS:[046C]&amp;lt;/code&amp;gt; gets you the BIOS timer tick variable, which you could use for timing/pacing, or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, a &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; right after the start (the word on top of the stack is 0000h when a .COM file starts) followed by an access to &amp;lt;code&amp;gt;[046C]&amp;lt;/code&amp;gt; does the trick at equal size.&lt;br /&gt;
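&lt;br /&gt;
Both variants side by side, as a sketch (the byte encodings are given for 16-bit code; only one of the two would be used in a real program):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        ;variant 1: 386+ only, assumes FS=0000h&lt;br /&gt;
        mov     al,[fs:046Ch]   ;64 A0 6C 04 (4 bytes)&lt;br /&gt;
&lt;br /&gt;
        ;variant 2: must run before anything else touches the stack&lt;br /&gt;
        pop     ds              ;1F: DS=0000h, taken from the top of the stack&lt;br /&gt;
        mov     al,[046Ch]      ;A0 6C 04 (1+3 = 4 bytes)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that after variant 2, DS no longer points to the program's own segment, and a plain &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; no longer exits the program, because the zero word it relies on has already been popped.&lt;br /&gt;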
&lt;br /&gt;
== Looping twice ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear, you can run it a second time (loop back once) in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        cmc&lt;br /&gt;
        jc      looping&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Looping three times ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can run it three times (loop back twice) in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        inc     bx ;dec bx from -1 works with the same jpo: FEh and FDh have odd parity, FCh has even parity&lt;br /&gt;
        jpo     looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Obtaining X and Y without DIV (The Rrrola Trick) ==&lt;br /&gt;
&lt;br /&gt;
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading Y and X from the 8-bit registers DH (or DX as a 16-bit value) and DL (or DL:AH as a 16-bit value). The idea is to interpret DI as a 16-bit fixed-point number in the range [0,1), from the start to the end of the screen. Multiplying this number by 65536 / 320 = 204.8 yields the row in the integer part and, again as a fixed-point fraction, the column in the fractional part. The constant 0xCCCD is 204.8 * 256 rounded to the nearest integer (52428.8 ≈ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
&lt;br /&gt;
This is adapted from [http://www.pouet.net/prod.php?which=53816 &amp;quot;Puls&amp;quot; by Rrrola] where X and Y are directly modified on the stack by performing &amp;lt;code&amp;gt;add dword[di],0000CCCDh&amp;lt;/code&amp;gt; on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value on said DWORD on the stack before each frame, which requires 2 additional bytes. Both approaches are too different to directly compare, but share the core idea of [http://www.pouet.net/topic.php?which=8791&amp;amp;page=8#c411796 multiplying with &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt;], so &amp;quot;Rrrolas trick&amp;quot; is an appropriate term to use.&lt;br /&gt;
&lt;br /&gt;
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===&lt;br /&gt;
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]&lt;br /&gt;
More clearly: DI = (y * 320) + x&lt;br /&gt;
&lt;br /&gt;
Multiply by 0xCCCD =&amp;gt; (y * 0x1000040) + (x * 0xcccd)&lt;br /&gt;
&lt;br /&gt;
Take top byte is equivalent to divide by 0x1000000. So that gives you Y.&lt;br /&gt;
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).&lt;br /&gt;
And the lower two bytes are noise.&lt;br /&gt;
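&lt;br /&gt;
Put into code, the trick might be used as follows. This sketch is a complete XOR pattern program; it assumes AH=0 at start and runs until the machine is reset:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
org 100h&lt;br /&gt;
        mov     al,13h&lt;br /&gt;
        int     10h             ;set mode 13h (assumes AH=0)&lt;br /&gt;
        push    0A000h&lt;br /&gt;
        pop     es              ;ES = video segment&lt;br /&gt;
pixel:&lt;br /&gt;
        mov     ax,0CCCDh       ;B8 CD CC&lt;br /&gt;
        mul     di              ;F7 E7: DX:AX = DI * 0CCCDh&lt;br /&gt;
        mov     al,dh           ;DH = Y (0..199)&lt;br /&gt;
        xor     al,dl           ;DL = X scaled to roughly 0..255&lt;br /&gt;
        stosb                   ;plot the pixel and advance DI&lt;br /&gt;
        jmp     short pixel&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The multiplication takes 5 bytes, compared to 8 bytes for a &amp;lt;code&amp;gt;cwd&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;mov ax,di&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;mov bx,320&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;div bx&amp;lt;/code&amp;gt; sequence.&lt;br /&gt;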
&lt;br /&gt;
== Use the entire register for a smaller opcode form ==&lt;br /&gt;
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; assembles to 3 bytes, while &amp;lt;code&amp;gt;inc cl&amp;lt;/code&amp;gt; takes 2 bytes. If CH does not matter (or you know that it won't be affected), use &amp;lt;code&amp;gt;inc cx&amp;lt;/code&amp;gt; instead, which is a single byte in 16-bit code. This is not really a trick, but it is easy to overlook, and the 2 bytes saved compared to &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; can be invested elsewhere.&lt;br /&gt;
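&lt;br /&gt;
For reference, these are the encodings of the three forms in 16-bit code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        add     cl,1            ;80 C1 01 (3 bytes)&lt;br /&gt;
        inc     cl              ;FE C1    (2 bytes)&lt;br /&gt;
        inc     cx              ;41       (1 byte)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;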
&lt;br /&gt;
== Use the carry flag in your calculations ==&lt;br /&gt;
&lt;br /&gt;
Let's say you have to &amp;lt;code&amp;gt;add si,128&amp;lt;/code&amp;gt;. Unfortunately, this takes 1 byte more than &amp;lt;code&amp;gt;add si,127&amp;lt;/code&amp;gt;, because 128 no longer fits into a sign-extended 8-bit immediate. But you can add 128 without that extra byte: if your previous code is known to leave the carry flag set, simply include it in your calculation and use &amp;lt;code&amp;gt;adc si,127&amp;lt;/code&amp;gt;. The same goes for &amp;lt;code&amp;gt;sub si,128&amp;lt;/code&amp;gt; vs &amp;lt;code&amp;gt;sbb si,127&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Output&amp;diff=1329</id>
		<title>Output</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Output&amp;diff=1329"/>
				<updated>2024-02-15T17:43:25Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* COVOX output (aka LPT DAC) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Outputting to the screen ==&lt;br /&gt;
&lt;br /&gt;
First, be aware of the [http://img.tfd.com/cde/MEMMAP.GIF MSDOS memory layout]&lt;br /&gt;
&lt;br /&gt;
=== Outputting in Textmode (80x25) ===&lt;br /&gt;
&lt;br /&gt;
==== Hello World / High Level function ====&lt;br /&gt;
&lt;br /&gt;
Here's an obligatory &amp;quot;Hello World&amp;quot; program in text mode, using a [http://www.ctyme.com/intr/rb-2562.htm &amp;quot;high level&amp;quot; MS-DOS function]. With a small optimization already included (using &amp;lt;code&amp;gt;XCHG BP,AX&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;MOV AH,09h&amp;lt;/code&amp;gt;), this snippet is 20 bytes in size.&lt;br /&gt;
&lt;br /&gt;
[[File:Hello world.png|thumb|Hello World!]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt; &lt;br /&gt;
org 100h			; we start at CS:100h&lt;br /&gt;
xchg 	bp,ax		; already a trick, puts 09h into AH&lt;br /&gt;
mov		dx,text		; DX expects the address of a $ terminated string&lt;br /&gt;
int 	21h			; call the DOS function (AH = 09h)&lt;br /&gt;
ret					; quit&lt;br /&gt;
text:&lt;br /&gt;
db 'Hello World!$'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Of course, this gets shorter with each byte you remove from the text itself. Now let's look into arbitrary screen access. Right after the start of your program you are in mode 3, that is 80x25 in 16 colors. See the [http://www.columbia.edu/~em36/wpdos/videomodes.txt Video Modes List] [[File:Drawchar example.png|thumb|draw char example]] So, to show something on the screen, you would need to set a segment register to 0xB800, then write values into this segment. &lt;br /&gt;
&lt;br /&gt;
==== Low level access ====&lt;br /&gt;
&lt;br /&gt;
The following three snippets showcase how to draw a red smiley in three different ways. All example snippets are meant to be standalone programs, starting with the first instruction and nothing before it. The target coordinate (40,12) is about the middle of the screen. We need a multiplier 2 since one char needs two bytes in memory (char and color is a byte each). The high byte 0x04 means red (4) on black (0) while the 0x01 is the first ASCII char - a smiley.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xb800&lt;br /&gt;
pop ds&lt;br /&gt;
mov bx,(80*12+40)*2&lt;br /&gt;
mov ax, 0x0401&lt;br /&gt;
mov [bx],ax&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xb800&lt;br /&gt;
pop es&lt;br /&gt;
mov di,(80*12+40)*2&lt;br /&gt;
mov ax, 0x0401&lt;br /&gt;
stosw&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push ss&lt;br /&gt;
push 0xb800&lt;br /&gt;
pop ss&lt;br /&gt;
mov sp,(80*12+40)*2&lt;br /&gt;
mov ax, 0x0401&lt;br /&gt;
push ax&lt;br /&gt;
pop ss&lt;br /&gt;
int 0x20&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You might notice that the ''push &amp;lt;word&amp;gt;'' + ''pop seg_reg'' combination is always the same and occupies four bytes altogether. If correct alignment is not important to you and you really just want ''any'' pointer to the screen, there is another way to get a valid one:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt; &lt;br /&gt;
les bx,[si]&lt;br /&gt;
nop&lt;br /&gt;
stosb&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That's also four bytes, but it already has the &amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt; opcode (for putting something onto the screen) integrated and even one slot free for another one-byte-instruction. It works because &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt; initially points to the start of our code, and &amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt; has the hexadecimal representation of &amp;lt;code&amp;gt;0AAh&amp;lt;/code&amp;gt;. After the first command, the segment register &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt; contains the value &amp;lt;code&amp;gt;0AA90h&amp;lt;/code&amp;gt;. If you repeatedly write something to the screen with &amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt; you will eventually reach the &amp;lt;code&amp;gt;0B800h&amp;lt;/code&amp;gt; segment and chars will appear on the screen. With a careful selection of the free one-byte-opcode you can also reintroduce some alignment. This works also with the &amp;lt;code&amp;gt;stosw&amp;lt;/code&amp;gt; opcode &amp;lt;code&amp;gt;0ABh&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Alternative high level functions  ====&lt;br /&gt;
&lt;br /&gt;
Besides accessing memory directly, there are also other ways of putting characters on the screen, for example:&lt;br /&gt;
* [http://www.ctyme.com/intr/rb-4124.htm INT 29h]&lt;br /&gt;
* [http://www.ctyme.com/intr/rb-2558.htm INT 21h AH=6]&lt;br /&gt;
* [http://www.ctyme.com/intr/rb-2562.htm INT 21h AH=9]&lt;br /&gt;
&lt;br /&gt;
=== Outputting in mode 13h (320x200) ===&lt;br /&gt;
&lt;br /&gt;
==== Basic pixel output ====&lt;br /&gt;
&lt;br /&gt;
The videomemory for mode 13h is located at segment 0xA000, so you need to assign this value to a segment register. Also, after the start of your program you are normally still in textmode, so you need to switch to the videomode. The following snippet does both:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,0x13 &lt;br /&gt;
int 0x10     ; AH = 0 means : set video mode to AL = 0x13 (320 x 200 pixels in 256 colors)&lt;br /&gt;
push 0xA000  ; put value on the stack&lt;br /&gt;
pop es       ; pop the top stack value into segment register ES&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You're free to use any of the segment register / opcode combinations to write to the screen&lt;br /&gt;
* &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt;)&lt;br /&gt;
* &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;mov&amp;lt;/code&amp;gt;)&lt;br /&gt;
* &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;push&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Let's add some code that actually draws something on the screen, the following program occupies 23 bytes and draws a fullscreen XOR texture&lt;br /&gt;
[[File:Mode13h-example-xor.png|left|bottom|thumb|mode13h-example-xor]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,0x13&lt;br /&gt;
int 0x10&lt;br /&gt;
push 0xa000&lt;br /&gt;
pop es&lt;br /&gt;
X: cwd			; &amp;quot;clear&amp;quot; DX (if AH &amp;lt; 0x80)&lt;br /&gt;
mov ax,di		; get screen position into AX&lt;br /&gt;
mov bx,320		; get screen width into BX&lt;br /&gt;
div bx			; divide, to get row and column&lt;br /&gt;
xor ax,dx		; the famous XOR pattern&lt;br /&gt;
and al,32+8		; a more interesting variation of it&lt;br /&gt;
stosb			; finally, draw to the screen&lt;br /&gt;
jmp short X		; rinse and repeat&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Note that there is a different way of preparing the segment register, instead of :&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xa000&lt;br /&gt;
pop es&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
you can also do :&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov ah,0xA0&lt;br /&gt;
mov es,ax&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
both variations occupy 4 bytes, but the latter is executable on processor architectures where ''push &amp;lt;word&amp;gt;'' is not available.&lt;br /&gt;
&lt;br /&gt;
==== Alternative way of pixel plotting and optimization ====&lt;br /&gt;
&lt;br /&gt;
Now let's optimize on the snippet. First, we can adapt the &amp;quot;LES&amp;quot; trick from the textmode section. We just exchange &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xa000&lt;br /&gt;
pop es&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
with:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;les bx,[bx]&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
to save two bytes. This works because BX is 0x0000 at start and thus, accesses the region ''before'' our code, which is called [https://en.wikipedia.org/wiki/Program_Segment_Prefix Program Segment Prefix]. The two bytes that are put into the segment register ES are bytes 2 and 3  = ''&amp;quot;Segment of the first byte beyond the memory allocated to the program&amp;quot;'' which is usually 0x9FFF. That is just off by one to our desired 0xA000. Unfortunately that means a 16 pixel offset, so if screen alignment means something to you, you can't use this optimization. Also, said two bytes are not always 0x9FFF; for example, if resident programs are above the ''&amp;quot;memory allocated to the program&amp;quot;'' (FreeDos), their content is overwritten if we take their base as our video memory base.&lt;br /&gt;
&lt;br /&gt;
Second, we can use an alternative way of putting pixels to the screen, subfunction AH = 0x0C of int 0x10. Also, instead of constructing row and column from the screen pointer, we can use some interesting properties of the screenwidth regarding logical operations. This results in the following 16 byte program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             ; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov al,0x13&lt;br /&gt;
X: int 0x10		; set video mode AND draw pixel&lt;br /&gt;
inc cx			; increment column&lt;br /&gt;
mov ax,cx		; get column in AH&lt;br /&gt;
xor al,ah		; the famous XOR pattern&lt;br /&gt;
mov ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
and al,32+8		; a more interesting variation of it&lt;br /&gt;
jmp short X		; rinse and repeat&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first optimization is the double usage of the same &amp;quot;int 0x10&amp;quot; for both setting the video mode and drawing the pixel. The subfunction AH = 0x0C expects row and column in DX and CX. Since the screen width is 320, which is 5 * 64, we can ignore the row and just work with the column, if we use logical operations and only use bits 0-6 of the result. The subfunction AH = 0x0C allows for unbounded column values in CX (up to 65535) and correctly &amp;quot;wraps&amp;quot; them internally without an error.&lt;br /&gt;
&lt;br /&gt;
The major drawback of the &amp;quot;subfunction AH = 0x0C&amp;quot; approach is performance loss. While DosBox and many emulators perform just fine, real hardware will draw much much slower based on the Video BIOS.&lt;br /&gt;
&lt;br /&gt;
==== Basic animation and user interaction ====&lt;br /&gt;
&lt;br /&gt;
Now let's add the convenient check for the ESC key and also add a simple animation. The &amp;lt;code&amp;gt;DI&amp;lt;/code&amp;gt; register is used as frame counter and incremented after the pixel counter &amp;lt;code&amp;gt;CX&amp;lt;/code&amp;gt; ran through all 65536 values via &amp;lt;code&amp;gt;LOOP&amp;lt;/code&amp;gt;. This frame counter is then added to the column. The resulting program is now 25 bytes in size :&lt;br /&gt;
&lt;br /&gt;
[[File:Xor anim example.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             	; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov 	al,0x13&lt;br /&gt;
X: 		int 0x10	; set video mode AND draw pixel&lt;br /&gt;
mov 	ax,cx		; get column in AH&lt;br /&gt;
add		ax,di		; offset by framecounter&lt;br /&gt;
xor 	al,ah		; the famous XOR pattern&lt;br /&gt;
and 	al,32+8		; a more interesting variation of it&lt;br /&gt;
mov 	ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
loop 	X			; loop 65536 times&lt;br /&gt;
inc 	di			; increment framecounter&lt;br /&gt;
in 		al,0x60		; check keyboard ...&lt;br /&gt;
dec 	al			; ... for ESC&lt;br /&gt;
jnz 	X			; rinse and repeat&lt;br /&gt;
ret					; quit program&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
( ↑ This example is the blueprint in the [[Floating-point_Opcodes#FPU_Basics| FPU Basics Section]].)&lt;br /&gt;
&lt;br /&gt;
=== Using Custom Colors ===&lt;br /&gt;
&lt;br /&gt;
==== Shades of Hue within the Default VGA palette ====&lt;br /&gt;
&lt;br /&gt;
You might have noticed there is a bit of structure to [https://i.stack.imgur.com/OSZ6D.png the default VGA Palette], which you can exploit for some interesting results. Looking at the palette, there is a rainbow of different hue values starting at index 32, which is repeated with a slightly different luma every 72 indices. If you are okay with limiting the number of shades you need, you can get a small color ramp for all kinds of hue values by simply calculating your color index like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;color=((shade%3)*72)+32+huevalue&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
For an example of how this looks for all kinds of hue values, see [https://www.pouet.net/prod.php?which=63520 Popcast] by Hellmood/Desire.&lt;br /&gt;
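&lt;br /&gt;
A possible way to compute this index in code is sketched below. It assumes the shade has already been reduced to 0..2 and sits in AL, and that the hue value (assumed here to be 0..23) is in BL; this register assignment is only an example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
mov ah,72    ; distance between two shade blocks&lt;br /&gt;
mul ah       ; AX = AL * AH = shade * 72&lt;br /&gt;
add al,32    ; skip the 16 EGA colors and the 16 gray levels&lt;br /&gt;
add al,bl    ; add the hue value, AL is now the color index&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;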
&lt;br /&gt;
==== Setting a Custom Palette ====&lt;br /&gt;
&lt;br /&gt;
Sometimes, when the [https://i.stack.imgur.com/OSZ6D.png Default VGA Palette] doesn't quite match the look you are looking for, it can be useful to set your own palette using the VGA registers, the basic setup loop looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
palloop:&lt;br /&gt;
mov ax,cx&lt;br /&gt;
mov dx,0x3c8&lt;br /&gt;
out dx,al    ; select palette color&lt;br /&gt;
inc dx&lt;br /&gt;
out dx,al    ; write red value (0..63)&lt;br /&gt;
out dx,al    ; write green value (0..63)&lt;br /&gt;
out dx,al    ; write blue value (0..63)&lt;br /&gt;
loop palloop&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The above code sets a simple grayscale palette (assuming the CX register is 0 on entry) and is compatible with all DOS platforms.&lt;br /&gt;
In some cases you can omit the &amp;lt;code&amp;gt;mov dx,0x3c8, out dx,al, inc dx&amp;lt;/code&amp;gt; and directly access the data register by just using &amp;lt;code&amp;gt;mov dx,0x3c9&amp;lt;/code&amp;gt; instead.&lt;br /&gt;
&lt;br /&gt;
To get more interesting colors than just grayscale, you can alter the value of the AL register in between setting the red, green and blue values, for example by shifting, adding, subtracting or performing logical operations. Just get creative and check if the result is sufficient for your use case.&lt;br /&gt;
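&lt;br /&gt;
For instance, shifting AL between the three writes gives a warm color ramp instead of grayscale, at least for indices 0..63, where all values stay within the 6-bit range. This is only one of many possible variations and, like the basic loop, assumes CX=0 on entry:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
palloop:&lt;br /&gt;
mov ax,cx&lt;br /&gt;
mov dx,0x3c8&lt;br /&gt;
out dx,al    ; select palette color&lt;br /&gt;
inc dx&lt;br /&gt;
out dx,al    ; red = index&lt;br /&gt;
shr al,1&lt;br /&gt;
out dx,al    ; green = index / 2&lt;br /&gt;
shr al,1&lt;br /&gt;
out dx,al    ; blue = index / 4&lt;br /&gt;
loop palloop&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;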
&lt;br /&gt;
TomCat will show the most common color palettes grouped by functionality. Check his article: [https://abaddon.hu/256b/colors.html Colors (in tiny intros)]&lt;br /&gt;
&lt;br /&gt;
== Producing sound ==&lt;br /&gt;
&lt;br /&gt;
=== MIDI notes ===&lt;br /&gt;
&lt;br /&gt;
Creating sounds with MIDI requires a bit more preparation, but once you're familiar with it, it's even simpler than PC Speaker sound, because you basically don't have to ''create'' the sound, you just have to ''trigger'' it. To start, you should know that there are many [https://en.wikipedia.org/wiki/General_MIDI different instruments] and a [https://www.midi.org/specifications/item/table-1-summary-of-midi-message defined way of communication]. Think of the MIDI interface as a keyboard: you tell it which key you want to press, which knob to twist, and sometimes how hard. By default, the active instrument is the [https://en.wikipedia.org/wiki/Piano Acoustic Grand Piano].&lt;br /&gt;
&lt;br /&gt;
==== Single piano note ====&lt;br /&gt;
&lt;br /&gt;
Let's start off with a simple example, playing a single note on the piano:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov 	al,	3Fh		;	set UART mode - command&lt;br /&gt;
mov 	dx,	331h	;	MIDI Control Port&lt;br /&gt;
out 	dx,	al 		;	send !&lt;br /&gt;
dec 	dx			;	MIDI Data Port ( = 330h )&lt;br /&gt;
mov 	al,	90h		;	send note on channel ZERO - command&lt;br /&gt;
out 	dx,	al		;	send !&lt;br /&gt;
mov 	al,	56h		;	data byte 1 : KEY = 56h&lt;br /&gt;
out 	dx,	al		; 	send !&lt;br /&gt;
mov 	al,	67h		;	data byte 2 : VOLUME = 67h&lt;br /&gt;
out 	dx,	al		;	send !&lt;br /&gt;
ret					; 	quit&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In short: you turn your keyboard on (switching to UART mode), then press a KEY with a certain VOLUME on channel ZERO, then exit. Besides switching to UART mode, all this communication uses the port &amp;lt;code&amp;gt;330h&amp;lt;/code&amp;gt;. This example will work on DosBox but not on Windows XP NTVDM: for [http://www.pouet.net/topic.php?which=10720&amp;amp;page=1 still unclear reasons], the NTVDM emulation delays the note until it receives a second one. The simplest way of at least hearing something is to repeatedly play notes, like in the following example :&lt;br /&gt;
&lt;br /&gt;
==== Repeated piano notes ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov 	al,	3Fh			;	set UART mode - command&lt;br /&gt;
mov 	dx,	331h		;	MIDI Control Port&lt;br /&gt;
out 	dx,	al 			;	send !&lt;br /&gt;
dec 	dx				;	MIDI Data Port ( = 330h )&lt;br /&gt;
main:&lt;br /&gt;
mov 	al,	90h			;	send note on channel 0 - command&lt;br /&gt;
out 	dx,	al			;	send !&lt;br /&gt;
mov 	al,	56h			;	data byte 1 : KEY = 56h&lt;br /&gt;
out 	dx,	al			; 	send !&lt;br /&gt;
mov 	al,	67h			;	data byte 2 : VOLUME = 67h&lt;br /&gt;
out 	dx,	al			;	send !&lt;br /&gt;
mov 	cl,5&lt;br /&gt;
_wait:&lt;br /&gt;
hlt&lt;br /&gt;
loop _wait&lt;br /&gt;
in 		al,	0x60		;	check for ESC&lt;br /&gt;
dec 	al				;&lt;br /&gt;
jnz 	main			;	no? repeat&lt;br /&gt;
ret						;	quit&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
↑ This is the previous example, enriched with synchronizing against the timer and checking for the ESC key. It works on both DosBox and Windows XP NTVDM and plays a note on the Piano repeatedly.&lt;br /&gt;
&lt;br /&gt;
==== Repeated notes of other instruments ====&lt;br /&gt;
&lt;br /&gt;
While hitting one key repeatedly is not really interesting in general, it can produce decent results when doing it with the right instrument activated, like it was done with the &amp;quot;French Horn&amp;quot; in [https://www.youtube.com/watch?v=tsbxdjP9l50 Timelord (by Baudsurfer)]. Apart from just changing the instrument, let's also optimize a little bit on the size: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
start:	&lt;br /&gt;
mov 	si,data			; init pointer for outsb&lt;br /&gt;
mov 	dx,330h			; change to data port&lt;br /&gt;
mov 	cl,5			; play our music data&lt;br /&gt;
rep 	outsb			; (see below at &amp;quot;data&amp;quot; label)&lt;br /&gt;
inc 	dx				; switch to control port&lt;br /&gt;
outsb					; change to mode &amp;quot;UART&amp;quot;&lt;br /&gt;
_wait:&lt;br /&gt;
mov 	al,[fs:0x46c]	; read timer value&lt;br /&gt;
cmp	 	al,bl			; wait until...&lt;br /&gt;
jz 		_wait			; ...timer value changed&lt;br /&gt;
xchg 	bx,ax			; save old timer value&lt;br /&gt;
in 		al,0x60			; check for ...&lt;br /&gt;
dec 	al				; ... ESC key&lt;br /&gt;
jnz 	start			; otherwise : repeat &lt;br /&gt;
dec 	dx				; switch to data port again&lt;br /&gt;
outsb					; stop all ... &lt;br /&gt;
outsb					; ... notes played ...&lt;br /&gt;
outsb					; ... on channel 3&lt;br /&gt;
data: 	&lt;br /&gt;
db		0c3h			; change instrument on channel 3&lt;br /&gt;
						; (is also &amp;quot;RET&amp;quot; for program quit)&lt;br /&gt;
db		60				; to &amp;quot;French Horn&amp;quot;&lt;br /&gt;
db		93h				; play note on channel 3&lt;br /&gt;
db		35				; deep &amp;quot;b&amp;quot; = note number 35&lt;br /&gt;
db		127				; play with volume = 127&lt;br /&gt;
db		3fh				; change mode to &amp;quot;UART&amp;quot;&lt;br /&gt;
db		0b3h			; control change on channel 3&lt;br /&gt;
db		123				; Channel Mode Message &amp;quot;All Notes Off&amp;quot;&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
↑ This is the previous example with a changed instrument. The MIDI data is now structured into a data section, the output is optimized by using &amp;lt;code&amp;gt;outsb&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;out dx,al&amp;lt;/code&amp;gt;, and the program finishes with a special command to turn ''All Notes Off''. This is necessary for all instruments which don't stop by themselves: in all the previous examples, we sent the &amp;quot;NOTE ON&amp;quot; command (&amp;lt;code&amp;gt;9Xh&amp;lt;/code&amp;gt;), but not the corresponding &amp;quot;NOTE OFF&amp;quot; command (&amp;lt;code&amp;gt;8Xh&amp;lt;/code&amp;gt;). Also, the note is now played on channel &amp;lt;code&amp;gt;03h&amp;lt;/code&amp;gt;, since the command byte for changing an instrument on channel 3 is &amp;lt;code&amp;gt;0C3h&amp;lt;/code&amp;gt;, which is also the opcode of &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; and can be reused. If this looks complicated at first, always remember, it's just sending [https://www.midi.org/specifications/item/table-1-summary-of-midi-message defined commands] to a single port.&lt;br /&gt;
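&lt;br /&gt;
For comparison, an explicit &amp;quot;NOTE OFF&amp;quot; for a single key could look like the following minimal sketch. It assumes that &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; still holds the data port &amp;lt;code&amp;gt;330h&amp;lt;/code&amp;gt; and that UART mode is already active. The key number 35 is the one played above, and the release velocity of 0 is an arbitrary choice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;; assumption: dx = 330h (MIDI data port), UART mode already set&lt;br /&gt;
mov 	al,83h			; note off on channel 3 - command&lt;br /&gt;
out 	dx,al			; send !&lt;br /&gt;
mov 	al,35			; data byte 1 : KEY = the note to stop&lt;br /&gt;
out 	dx,al			; send !&lt;br /&gt;
mov 	al,0			; data byte 2 : release velocity (arbitrary)&lt;br /&gt;
out 	dx,al			; send !&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Written this way the message costs 9 bytes per key, which is why the shorter ''All Notes Off'' data sent with &amp;lt;code&amp;gt;outsb&amp;lt;/code&amp;gt; is attractive for tiny programs.&lt;br /&gt;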
&lt;br /&gt;
==== The drum channel ====&lt;br /&gt;
&lt;br /&gt;
Now that you're aware that there are different channels (16 overall) to play notes on, how would you like a channel &amp;lt;code&amp;gt;09h&amp;lt;/code&amp;gt; specifically for 'Drums'? [http://www.voidaudio.net/percussion.html Ten different drumsets] with dozens of samples are available out of the box. By default, the &amp;quot;Standard Kit&amp;quot; is active. The following example plays a track of drum notes repeatedly, while further optimizing for size:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
aas						; 3fh = &amp;quot;set UART mode&amp;quot;&lt;br /&gt;
cwd						; 99h = &amp;quot;play note on drum channel&amp;quot; command&lt;br /&gt;
db 		42,38,42,35		; the drum notes (hihat, snare, hihat, kick)&lt;br /&gt;
mov 	dx,0x331		; MIDI Control Port&lt;br /&gt;
outsb					; send &amp;quot;set UART mode&amp;quot;&lt;br /&gt;
dec 	dx				; switch to MIDI data port&lt;br /&gt;
outsb					; send &amp;quot;play note on drum channel&amp;quot; command&lt;br /&gt;
&lt;br /&gt;
main:&lt;br /&gt;
mov		al,[fs:0x46c]	; read timer&lt;br /&gt;
test	al,3&lt;br /&gt;
jnz 	main			; skip 3 values&lt;br /&gt;
inc 	byte [fs:0x46c] ; inc manually to prevent retrigger&lt;br /&gt;
&lt;br /&gt;
inc 	bx				; increment note counter&lt;br /&gt;
and 	bl,3			; truncate to 4 notes&lt;br /&gt;
mov 	al,[bx+si]		; read the drumnote (see above)&lt;br /&gt;
&lt;br /&gt;
out 	dx,al			; send the drum&lt;br /&gt;
mov 	al,127			; set volume to maximum&lt;br /&gt;
out 	dx,al			; send volume&lt;br /&gt;
&lt;br /&gt;
in 		al,0x60			; check for ESC&lt;br /&gt;
dec 	al				;&lt;br /&gt;
jnz 	main			; no? repeat&lt;br /&gt;
ret						; otherwise quit&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In contrast to the previous example, the data section is now at the start. That means it's executed as code! This is dangerous of course, but it also saves the bytes for assigning the &amp;lt;code&amp;gt;DATA&amp;lt;/code&amp;gt; offset to &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt;. After the two initial &amp;lt;code&amp;gt;outsb&amp;lt;/code&amp;gt; instructions have incremented &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt; twice, it stays fixed and the drum data is read with &amp;lt;code&amp;gt;[BX+SI]&amp;lt;/code&amp;gt;. Unless you know exactly what you are doing, don't use that kind of &amp;quot;executing data&amp;quot; optimization! In this special case &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;CWD&amp;lt;/code&amp;gt; do no harm, and the drum notes &amp;lt;code&amp;gt;42,38,42,35&amp;lt;/code&amp;gt; are carefully crafted and arranged to form the instruction &amp;lt;code&amp;gt;SUB AH,[232Ah]&amp;lt;/code&amp;gt;, which does no harm either.&lt;br /&gt;
&lt;br /&gt;
==== Further Midi instrument tuning by controllers and pitch ====&lt;br /&gt;
&lt;br /&gt;
If you are familiar with hardware synthesizers you'll definitely remember the typical pitch bend or modulation wheels beside the keys, usually two of them. Those are commonly assigned to a vibrato/tremolo effect and a +/-pitch to tune the played note. You can also use those functions in your intro code to affect the currently played midi instrument note.&lt;br /&gt;
&lt;br /&gt;
These parameters are set with ordinary MIDI messages, as you can see here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,10110000b	;Controller command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,00000001b	;0...127 data byte 1 =&amp;gt; '1' is the code for the modulation wheel typically assigned to vibrato/tremolo&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,01111111b	;0...127 data byte 2 =&amp;gt; e.g. '01111111' =&amp;gt; Maximum vibrato level&lt;br /&gt;
out dx,al&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In that example the maximum vibrato level is assigned to any instrument played on midi channel 0. This effect was used in the [http://www.mikusite.de/x86/cryscom.zip Crystal Comet 128 Byte intro by Kuemmel].&lt;br /&gt;
&lt;br /&gt;
For pitch bend the code would be like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,11100000b	;Pitch bend command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0lllllllb	;0...127 data byte 1 =&amp;gt; LSB value for pitch&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0mmmmmmmb	;0...127 data byte 2 =&amp;gt; MSB value for pitch&lt;br /&gt;
out dx,al	&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pitch bend uses a 14 bit value. The center is at 0x2000 (meaning no pitch change). Values from 0x2000 up to 0x3fff raise the pitch, and values from 0x2000 down to 0x0000 lower it. The full deflection of 0x2000 should correspond to 2 semitones, so you can bend +/- 2 semitones. Please be aware that the value must be split into two 7 bit values: the lower 7 bits form the LSB and the upper 7 bits form the MSB. For example, 0x3000 becomes 0x60 (MSB) and 0x00 (LSB). &lt;br /&gt;
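&lt;br /&gt;
The split into two 7 bit values can be done at runtime, as in the following minimal sketch. It assumes that &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; holds the data port &amp;lt;code&amp;gt;330h&amp;lt;/code&amp;gt;, that UART mode is already active, and that &amp;lt;code&amp;gt;BX&amp;lt;/code&amp;gt; holds the 14 bit bend value. The register choice is only for illustration and the code is not optimized for size:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;; assumptions: dx = 330h, UART mode set, bx = bend value (0000h..3fffh)&lt;br /&gt;
mov al,11100000b	;Pitch bend command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov ax,bx&lt;br /&gt;
and al,7fh			;data byte 1 =&amp;gt; LSB = bits 0..6 of the value&lt;br /&gt;
out dx,al&lt;br /&gt;
mov ax,bx&lt;br /&gt;
shl ax,1			;bits 7..13 of the value are now in ah&lt;br /&gt;
mov al,ah&lt;br /&gt;
and al,7fh			;data byte 2 =&amp;gt; MSB = bits 7..13 of the value&lt;br /&gt;
out dx,al&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With &amp;lt;code&amp;gt;BX = 3000h&amp;lt;/code&amp;gt; this sends the data bytes 0x00 (LSB) and 0x60 (MSB).&lt;br /&gt;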
&lt;br /&gt;
Of course there are more midi controller options, e.g. you could change the stereo pan level. As a reference and for more detailed information please have a look at this [http://www.music-software-development.com/midi-tutorial.html Midi tutorial page].&lt;br /&gt;
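&lt;br /&gt;
As a sketch of the stereo pan case, the following uses the same controller command with controller number 10 (pan). It assumes that &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; holds the data port &amp;lt;code&amp;gt;330h&amp;lt;/code&amp;gt; and that UART mode is already active. The value 0 is just an example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,10110000b	;Controller command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,10			;data byte 1 =&amp;gt; controller number 10 = pan&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0			;data byte 2 =&amp;gt; 0 = hard left, 64 = center, 127 = hard right&lt;br /&gt;
out dx,al&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;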
==== Creating basic sound effects in 16 bytes ====&lt;br /&gt;
&lt;br /&gt;
In the [https://en.wikipedia.org/wiki/General_MIDI#Sound_Effects MIDI repertoire], there are already some sound effects available. With the &amp;quot;data execution&amp;quot; optimization above, let's fire a gunshot in 16 bytes :&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;aas&lt;br /&gt;
les 	di,[bx-0x6C]&lt;br /&gt;
xor 	al,127&lt;br /&gt;
mov 	dx,0x331&lt;br /&gt;
outsb&lt;br /&gt;
dec 	dx&lt;br /&gt;
mov 	cl,5&lt;br /&gt;
rep 	outsb&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first three instructions don't do anything (they do, but we don't care), it's just MIDI data.&lt;br /&gt;
* &amp;lt;code&amp;gt;aas&amp;lt;/code&amp;gt; assembles to &amp;lt;code&amp;gt;0x3F&amp;lt;/code&amp;gt;, the command for switching to &amp;quot;UART&amp;quot; mode, for sending to port &amp;lt;code&amp;gt;0x331&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;les di,[bx-0x6C]&amp;lt;/code&amp;gt; assembles to &amp;lt;code&amp;gt;0xc4&amp;lt;/code&amp;gt; (change instrument on channel 4), &amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; (change it to &amp;quot;Gunshot&amp;quot;), &amp;lt;code&amp;gt;0x94&amp;lt;/code&amp;gt; (play note on channel 4)&lt;br /&gt;
* &amp;lt;code&amp;gt;xor al,127&amp;lt;/code&amp;gt; assembles to &amp;lt;code&amp;gt;0x34&amp;lt;/code&amp;gt; (play THIS note), &amp;lt;code&amp;gt;0x7f&amp;lt;/code&amp;gt; (play it THAT loud, 127 is also the allowed maximum)&lt;br /&gt;
&lt;br /&gt;
The rest of the code basically just sends the MIDI data to the interface and exits. You can change the kind of sound effect by modifying the ModR/M byte of the second instruction (change BX to BP or SI etc.). Changing the volume is simpler: change the byte value of &amp;lt;code&amp;gt;xor al,127&amp;lt;/code&amp;gt; to any value between 0 and 127.&lt;br /&gt;
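&lt;br /&gt;
As an illustration of both changes, here is a sketch of a variant that uses BP instead of BX and a lower volume. With BP the ModR/M byte becomes &amp;lt;code&amp;gt;0x7E&amp;lt;/code&amp;gt;, which should select &amp;quot;Applause&amp;quot; according to the General MIDI list linked above. Like the original, it relies on the usual register values at program start:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;aas						; 0x3F&lt;br /&gt;
les 	di,[bp-0x6C]	; 0xC4, 0x7E, 0x94&lt;br /&gt;
xor 	al,64			; 0x34, 0x40 = half volume&lt;br /&gt;
mov 	dx,0x331&lt;br /&gt;
outsb&lt;br /&gt;
dec 	dx&lt;br /&gt;
mov 	cl,5&lt;br /&gt;
rep 	outsb&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With an 8 bit displacement, SI gives &amp;lt;code&amp;gt;0x7C&amp;lt;/code&amp;gt;, DI gives &amp;lt;code&amp;gt;0x7D&amp;lt;/code&amp;gt;, BP gives &amp;lt;code&amp;gt;0x7E&amp;lt;/code&amp;gt; and BX gives &amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; as the instrument byte.&lt;br /&gt;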
&lt;br /&gt;
==== Procedural MIDI music generation in 64 bytes ====&lt;br /&gt;
&lt;br /&gt;
With all the above you should now be able to follow the next snippet [http://www.pouet.net/prod.php?which=66313 Descent OST], a small framework for procedural MIDI sound generation in 64 bytes :&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;; &amp;quot;Descent OST&amp;quot;, a 62 byte MIDI music player for MSDOS&lt;br /&gt;
; created by HellMood/DESiRE (C)2015&lt;br /&gt;
; this is the extracted music routine used in &amp;quot;Descent&amp;quot; &lt;br /&gt;
; it is a procedural MIDI algorithm which sticks a&lt;br /&gt;
; subroutine to the DOS timer (interrupt 0x1C)&lt;br /&gt;
; the registered routine is called ~18.2 times per second&lt;br /&gt;
&lt;br /&gt;
; developed for use with &amp;quot;NASM&amp;quot;, &lt;br /&gt;
; see http://sourceforge.net/projects/nasm/files/&lt;br /&gt;
&lt;br /&gt;
%define		 	rhythmPattern			0b11&lt;br /&gt;
; with &amp;quot;rhythmPattern&amp;quot;, you define how often a note is played&lt;br /&gt;
; generally, higher values and values containing many &amp;quot;ones&amp;quot;&lt;br /&gt;
; in binary representation, will result in faster play&lt;br /&gt;
; for example &amp;quot;0b11&amp;quot; will play every 4th note&lt;br /&gt;
%define 		baseInstrument			9&lt;br /&gt;
; defines the number of the first instrument used.&lt;br /&gt;
; see http://www.midi.org/techspecs/gm1sound.php for a full list&lt;br /&gt;
; keep in mind, that there are only a few instrument blocks&lt;br /&gt;
; whose sounds stop after a while. You won't get good results&lt;br /&gt;
; from strings etc. just a mess of overlaid sounds&lt;br /&gt;
%define 		numInstruments			7&lt;br /&gt;
; defines how many instruments are used. keep in mind, that &amp;quot;rhythm-&lt;br /&gt;
; Pattern&amp;quot; has influence on the picked instrument. the instruments&lt;br /&gt;
; from 9 to 9+7 are called &amp;quot;chromatic percussion&amp;quot;&lt;br /&gt;
%define 		noteStep 				5&lt;br /&gt;
; defines the basic difference from one note to the next. recommended&lt;br /&gt;
; values here are (mainly) 3,4 and 5 for music theoretic reasons&lt;br /&gt;
; but feel free to play around =)&lt;br /&gt;
%define 		noteRange				12&lt;br /&gt;
; after adding the noteStep, the note value is &amp;quot;mod&amp;quot;ded with&lt;br /&gt;
; the &amp;quot;noteRange&amp;quot;. 12 means octave, which results in very harmonic&lt;br /&gt;
; scales&lt;br /&gt;
%define 		noteSpread				3&lt;br /&gt;
; the third step spreads the notes over the tonal spectrum, you may&lt;br /&gt;
; want to keep &amp;quot;noteSpread&amp;quot; * &amp;quot;noteRange&amp;quot; round about 30-60.&lt;br /&gt;
%define 		baseNote				40&lt;br /&gt;
; the general tone height of everything. some instruments don't play&lt;br /&gt;
; arbitrary deep notes correctly, and too high notes cause ear bleeding&lt;br /&gt;
; adjust with care ;)&lt;br /&gt;
&lt;br /&gt;
; WARNING : after exiting the program, the timer interrupt is still active&lt;br /&gt;
; i strongly recommend to reboot or restart DOSBOX!&lt;br /&gt;
&lt;br /&gt;
; ADVICE : Yes, there are music- and math-related things going on here&lt;br /&gt;
; if you're not into music theory, circle of fifths, and the like, it may be&lt;br /&gt;
; better to just play around with the parameters, rather than understanding them&lt;br /&gt;
; just change stuff slowly, and eventually you will get &amp;quot;there&amp;quot;&lt;br /&gt;
; wherever that is ;)&lt;br /&gt;
&lt;br /&gt;
org 0x100&lt;br /&gt;
xchg cx,ax					; set our second counter to zero&lt;br /&gt;
mov dx,music			&lt;br /&gt;
mov ax,0x251C				; mode &amp;quot;0x25&amp;quot; , &amp;quot;0x1C&amp;quot; = change address of timer interrupt&lt;br /&gt;
int 0x21					; see http://mprolab.teipir.gr/vivlio80X86/dosints.pdf&lt;br /&gt;
S:&lt;br /&gt;
in ax,0x60					; wait for &amp;quot;ESC&amp;quot; press, then exit&lt;br /&gt;
dec al						; music plays on anyway, this is just for&lt;br /&gt;
jnz S						; keeping the music exactly as in &amp;quot;Descent&amp;quot;&lt;br /&gt;
ret							; return to prompt&lt;br /&gt;
music:&lt;br /&gt;
inc bx						; increment our first counter (starts at zero)&lt;br /&gt;
test bl,byte rhythmPattern	; play a note every 4th time tick&lt;br /&gt;
jnz nomusic					; otherwise do nothing&lt;br /&gt;
mov dx,0x331&lt;br /&gt;
mov al,0x3F&lt;br /&gt;
out dx,al&lt;br /&gt;
dec dx&lt;br /&gt;
mov al,0xC0					; change instrument on channel 0...&lt;br /&gt;
out dx,al&lt;br /&gt;
mov ax,bx					&lt;br /&gt;
aam byte numInstruments&lt;br /&gt;
add al,byte baseInstrument	; ...to this instrument&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0x90					; play note on channel 0 ...&lt;br /&gt;
out dx,al&lt;br /&gt;
add cl,byte noteStep			 &lt;br /&gt;
mov al,cl&lt;br /&gt;
aam byte noteRange&lt;br /&gt;
imul ax,noteSpread&lt;br /&gt;
add al,baseNote				; ... play THIS note&lt;br /&gt;
out dx,al&lt;br /&gt;
neg al						; (play deeper notes louder = add bass)&lt;br /&gt;
add al,127+39				; ... play it THAT loud&lt;br /&gt;
out dx,al&lt;br /&gt;
nomusic:&lt;br /&gt;
iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
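&lt;br /&gt;
The warning about the still active timer interrupt can be addressed by saving the old handler and restoring it before quitting. The following is only a sketch of that pattern, not part of &amp;quot;Descent OST&amp;quot;. It costs additional bytes, and the first call changes &amp;lt;code&amp;gt;BX&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov ax,0x351C				; mode 0x35 = get address of interrupt 0x1C&lt;br /&gt;
int 0x21					; returns the old handler in es:bx&lt;br /&gt;
push es&lt;br /&gt;
push bx						; keep the old handler on the stack&lt;br /&gt;
;...snip... (install the own handler, wait for ESC)&lt;br /&gt;
pop dx&lt;br /&gt;
pop ds						; ds:dx = old handler&lt;br /&gt;
mov ax,0x251C				; mode 0x25 = set address of interrupt 0x1C&lt;br /&gt;
int 0x21					; the own routine is no longer called&lt;br /&gt;
ret							; return to prompt&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;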
&lt;br /&gt;
=== PC Speaker ===&lt;br /&gt;
&lt;br /&gt;
Producing sound with the PC speaker is incredibly easy. Basically, you set a system timer to a desired frequency, then connect this timer to the speaker. [http://wiki.osdev.org/PC_Speaker The PC Speaker Article] from OSDEV Wiki has the details about it. An example for a tiny intro that uses PC speaker music is [http://www.pouet.net/prod.php?which=67833 SpeaCore].&lt;br /&gt;
&lt;br /&gt;
==== Basic example with melody pattern ====&lt;br /&gt;
&lt;br /&gt;
A very optimized and dirty variant of producing sound with the speaker is this 12 byte snippet (sound routine from [http://www.pouet.net/prod.php?which=67829 the tiny intro &amp;quot;darkweb&amp;quot;]):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;hlt				; sync to the timer interrupt (IRQ 0)&lt;br /&gt;
inc bx			; increment our counter&lt;br /&gt;
mov ax,bx		; work with a copy&lt;br /&gt;
or al,0x4B      ; melody pattern + 2 LSB for speaker link&lt;br /&gt;
out 0x42,al		; set new countdown for timer2 (two passes)&lt;br /&gt;
out 0x61,al		; link timer2 to PC speaker (2 LSBs are 1)&lt;br /&gt;
jmp si			; rinse and repeat&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Instead of sending the low and high byte of our divisor directly in succession, we do it the &amp;quot;two pass&amp;quot; way: every pass through the loop writes only one byte. That reduces the number of possible frequencies to 255, which is still good enough for some rough sounds. Linking the timer to the PC speaker might not be obvious: normally you would read the value of port 0x61, set the two least significant bits to 1 and write the value back. You can save all of this if you just send the &amp;quot;two pass&amp;quot; value which you just used for the timer, provided that value already has the two least significant bits set (''or al,0x4B'' does this). Be aware that port 0x61 does many things apart from just connecting the timer to the speaker. A useful resource for ports in general is the [http://bochs.sourceforge.net/techspec/PORTS.LST Bochs Ports List], for port 0x61 it displays:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
''0061	w	KB controller port B (ISA, EISA)   (PS/2 port A is at 0092)&lt;br /&gt;
&lt;br /&gt;
		system control port for compatibility with 8255&lt;br /&gt;
&lt;br /&gt;
		bit 7	(1= IRQ 0 reset )&lt;br /&gt;
&lt;br /&gt;
		bit 6-4    reserved&lt;br /&gt;
&lt;br /&gt;
		bit 3 = 1  channel check enable&lt;br /&gt;
&lt;br /&gt;
		bit 2 = 1  parity check enable&lt;br /&gt;
&lt;br /&gt;
		'''bit 1 = 1  speaker data enable'''&lt;br /&gt;
&lt;br /&gt;
		'''bit 0 = 1  timer 2 gate to speaker enable''' ''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So if you experience strange things with highly optimized PC speaker output, revert to the safe way. The described way works with real hardware and DosBox. Unfortunately, both Oracle VirtualBox with MS-DOS 6.22 and Windows XP NTVDM seem not to emulate the PC speaker properly (Investigation and citation needed here!)&lt;br /&gt;
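&lt;br /&gt;
For reference, the safe way could look like the following sketch, which is not optimized for size. The divisor 2712 is an example value that gives roughly 440 Hz (1193182 / 440):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,0b6h		; timer2, low then high byte, mode 3 (square wave)&lt;br /&gt;
out 0x43,al		; send to the timer control port&lt;br /&gt;
mov ax,2712		; example divisor, roughly 440 Hz&lt;br /&gt;
out 0x42,al		; low byte of the divisor&lt;br /&gt;
mov al,ah&lt;br /&gt;
out 0x42,al		; high byte of the divisor&lt;br /&gt;
in al,0x61		; read the current value of port 0x61&lt;br /&gt;
or al,3			; set only the 2 LSBs&lt;br /&gt;
out 0x61,al		; link timer2 to PC speaker&lt;br /&gt;
;...snip... (let the tone play)&lt;br /&gt;
in al,0x61&lt;br /&gt;
and al,0xfc		; clear only the 2 LSBs&lt;br /&gt;
out 0x61,al		; speaker off again&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;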
&lt;br /&gt;
==== Simple deep sound in 8 bytes ====&lt;br /&gt;
&lt;br /&gt;
One of the smallest possible PC speaker sound generators might be this 8 byte snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;dec ax			; AX initially 0000h -&amp;gt; AL = 0xFF&lt;br /&gt;
out 42h,al		; change divisor of timer2 to 0xFFFF&lt;br /&gt;
out 42h,al		; resulting in a very low frequency&lt;br /&gt;
out 61h,al		; 2 LSBs are set, connect timer to speaker&lt;br /&gt;
ret				; quit&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Note: This may fail on actual hardware, as there might not be time for the bus to settle between the consecutive &amp;lt;code&amp;gt;out 42h,al&amp;lt;/code&amp;gt; statements.)&lt;br /&gt;
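&lt;br /&gt;
A common remedy on old hardware is a short jump between the two port writes as a tiny delay. The following 10 byte variant is a sketch under that assumption. Whether the delay is needed or sufficient on a particular machine is untested here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;dec ax			; AX initially 0000h -&amp;gt; AL = 0xFF&lt;br /&gt;
out 42h,al		; low byte of the divisor&lt;br /&gt;
jmp short $+2	; jump to the next instruction as a tiny delay&lt;br /&gt;
out 42h,al		; high byte of the divisor&lt;br /&gt;
out 61h,al		; 2 LSBs are set, connect timer to speaker&lt;br /&gt;
ret				; quit&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;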
&lt;br /&gt;
=== COVOX output (aka LPT DAC) ===&lt;br /&gt;
&lt;br /&gt;
It is possible to output to an LPT-connected DAC (&amp;quot;[https://en.wikipedia.org/wiki/Covox_Speech_Thing COVOX]&amp;quot;) in a tinyprog.  A proof-of-concept example is [http://www.pouet.net/prod.php?which=57991 Express Train 125] which uses COVOX for sound generation.  &lt;br /&gt;
&lt;br /&gt;
This method follows the [http://countercomplex.blogspot.com/2011/10/algorithmic-symphonies-from-one-line-of.html &amp;quot;audio from one line of C code&amp;quot;] style of sound generation.  [http://www.pouet.net/topic.php?which=8357&amp;amp;page=1 A pouet discussion] exists for more background information.&lt;br /&gt;
&lt;br /&gt;
Here are some detailed instructions on how a very small bytebeat player can be realized: [[MUZAK58]]&lt;br /&gt;
&lt;br /&gt;
=== Advanced PC Speaker and COVOX sound via interrupt ===&lt;br /&gt;
&lt;br /&gt;
For a more advanced use of PC Speaker or COVOX sound output in tiny intros,&lt;br /&gt;
especially when playback has to be timed to a desired sample frequency, the use of a timer&lt;br /&gt;
interrupt is recommended. To illustrate this, we take a so-called bytebeat and turn it into a working&lt;br /&gt;
code example for PC Speaker and COVOX.&lt;br /&gt;
&lt;br /&gt;
The major difference between the two is that COVOX has the benefit of a precision of 8 bits and PC Speaker usually only 6 bits.&lt;br /&gt;
Furthermore the setup/access is different as shown in the sections before. Regarding size of the code and quality of the&lt;br /&gt;
sound COVOX is preferable.&lt;br /&gt;
&lt;br /&gt;
The bytebeat we are using as an example can be seen and heard [http://wurstcaptures.untergrund.net/music/?oneliner=((t%264096)%3F((t*(t%5Et%25255)%7C(t%3E%3E4))%3E%3E1)%3A(t%3E%3E3)%7C((t%268192)%3Ft%20%3C%3C2%3At))&amp;amp;rate=11025 here.] It's based on JavaScript syntax: ((t&amp;amp;4096)?((t*(t^t%255)|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;3)|((t&amp;amp;8192)?t &amp;lt;&amp;lt;2:t))&lt;br /&gt;
&lt;br /&gt;
Bytebeat code like this can be ported directly to assembler by evaluating the single expressions step by step, as you can&lt;br /&gt;
see in the implementations here. These examples work within DOSBox and should also run on real hardware with FreeDOS. They don't&lt;br /&gt;
show any graphical output, they just play the bytebeat until a key is pressed. Your graphics routine should be placed right after&lt;br /&gt;
the 'main' label. &lt;br /&gt;
&lt;br /&gt;
==== PC Speaker variant ====&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
&lt;br /&gt;
 mov    ax,3508h			;21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)&lt;br /&gt;
 int    21h					;return: es:bx&lt;br /&gt;
 push   es&lt;br /&gt;
 push   bx					;backup current interrupt handler&lt;br /&gt;
 mov    cx,63 + 108*256  	;PIT counter divisor = 108 and speaker enable for init   &lt;br /&gt;
 mov    bl,90h				;10010000b =&amp;gt; on &amp;quot;init&amp;quot;  	&lt;br /&gt;
							;Bit0   =   0 counter 16 Bits set&lt;br /&gt;
							;Bit3-1 = 000 mode 0 select&lt;br /&gt;
							;Bit5-4 =  01 read/write counter bits 0-7 only&lt;br /&gt;
							;Bit7-6 =  10 counter 2 select&lt;br /&gt;
&lt;br /&gt;
 mov    dx,irq				;new handler address&lt;br /&gt;
 call   init&lt;br /&gt;
&lt;br /&gt;
main:&lt;br /&gt;
 mov    ah,0&lt;br /&gt;
 int    16h					;ah = 0, int16h =&amp;gt; read keypress&lt;br /&gt;
&lt;br /&gt;
 pop    dx&lt;br /&gt;
 pop    ds					;restore handler address at exit&lt;br /&gt;
 xor    cx,cx				;PIT counter divisor = 0 and speaker disable for exit &lt;br /&gt;
 mov    bl,0b6h				;bl = 10110110b =&amp;gt; at exit&lt;br /&gt;
init:&lt;br /&gt;
 xchg   ax,cx &lt;br /&gt;
 out    61h,al				;al = 0 or 63 =&amp;gt; Bit0 = 1 timer 2 gate to speaker enable,&lt;br /&gt;
 mov    al,ah				;Bit1 = 1 speaker data enable ...or disable both at al = 0&lt;br /&gt;
 out    40h,al				;al = 0 or 108 =&amp;gt; write PIT counter 0 divisor&lt;br /&gt;
 salc&lt;br /&gt;
 out    40h,al				;al = 0 =&amp;gt; write PIT counter 0 divisor again = 0 high byte&lt;br /&gt;
							;=&amp;gt; this results in a frequency for the interrupt call of about 11048 Hz, close to the 11025 Hz target,&lt;br /&gt;
							;as clock is 1.19318181818 MHz =&amp;gt; 1.19318181818 MHz / 108 = 11048 Hz&lt;br /&gt;
 xchg   ax,bx				;al = bl = 90h (init) or 0b6h (exit), for 0b6h:&lt;br /&gt;
 out    43h,al				;Bit0   =   0 counter 16 Bits set&lt;br /&gt;
							;Bit3-1 = 011 mode 3 select, square wave generator&lt;br /&gt;
							;Bit5-4 =  11 read/write counter bits 0-7 first, then 8-15&lt;br /&gt;
							;Bit7-6 =  10 counter 2 select&lt;br /&gt;
 mov    ax,2508h			;21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)  &lt;br /&gt;
 int    21h&lt;br /&gt;
retn&lt;br /&gt;
&lt;br /&gt;
;bytebeat: ((t&amp;amp;4096)?((t*(t^t%255)|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;3)|((t&amp;amp;8192)?t&amp;lt;&amp;lt;2:t))&lt;br /&gt;
irq:&lt;br /&gt;
 pusha&lt;br /&gt;
 mov    bp,255&lt;br /&gt;
 mov    ax,0            	;ax: t&lt;br /&gt;
.counter:&lt;br /&gt;
 mov    cx,ax&lt;br /&gt;
 shr    cx,3            	;cx: (t&amp;gt;&amp;gt;3)&lt;br /&gt;
 test   ax,4096         	;(t&amp;amp;4096)?&lt;br /&gt;
 jz     .1&lt;br /&gt;
 mov    bx,ax           	;bx: t&lt;br /&gt;
 sub    dx,dx           	;dx:ax t&lt;br /&gt;
 div    bp              	;dx: (t%255)&lt;br /&gt;
 xor    dx,bx           	;dx: (t^(t%255))&lt;br /&gt;
 shr    cx,1            	;cx: (t&amp;gt;&amp;gt;4)&lt;br /&gt;
 xchg   ax,bx           	;ax: t&lt;br /&gt;
 mul    dx              	;ax: t*(t^(t%255))&lt;br /&gt;
 or     ax,cx           	;ax: t*(t^(t%255))|(t&amp;gt;&amp;gt;4)&lt;br /&gt;
 shr    ax,1            	;ax: (t*(t^(t%255))|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1&lt;br /&gt;
 jmp    .3&lt;br /&gt;
.1:&lt;br /&gt;
 test   ax,8192         	;(t&amp;amp;8192)?&lt;br /&gt;
 jz     .2&lt;br /&gt;
 shl    ax,2            	;ax: (t&amp;lt;&amp;lt;2)&lt;br /&gt;
.2:&lt;br /&gt;
 or     ax,cx           	;ax: ax|(t&amp;gt;&amp;gt;3)&lt;br /&gt;
.3:&lt;br /&gt;
 shr    al,2            	;downscale to 6 bits&lt;br /&gt;
 jz     .4&lt;br /&gt;
 out    42h,al				;write 6 Bit data to speaker (PIT counter 2)&lt;br /&gt;
.4:&lt;br /&gt;
&lt;br /&gt;
 inc    word [bp-255+irq.counter-2]&lt;br /&gt;
 mov    al,20h				;00100000b&lt;br /&gt;
 out    20h,al				;Bit 5 = 1 send End Of Interrupt (EOI) signal&lt;br /&gt;
 popa&lt;br /&gt;
iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== COVOX variant ====&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
&lt;br /&gt;
 mov    ax,3508h			;21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)&lt;br /&gt;
 int    21h					;return: es:bx&lt;br /&gt;
 push   es&lt;br /&gt;
 push   bx					;backup current interrupt handler&lt;br /&gt;
 mov    al,108				;PIT counter divisor&lt;br /&gt;
 mov    dx,irq				;new handler address&lt;br /&gt;
 call   init&lt;br /&gt;
&lt;br /&gt;
main:&lt;br /&gt;
 mov    ah,0&lt;br /&gt;
 int    16h					;ah = 0, int16h =&amp;gt; read keypress&lt;br /&gt;
 pop    dx&lt;br /&gt;
 pop    ds					;restore handler address at exit&lt;br /&gt;
 salc						;al = 0 at exit&lt;br /&gt;
&lt;br /&gt;
init:&lt;br /&gt;
 out    40h,al				;al = 0 or 108 =&amp;gt; write PIT counter 0 divisor = 108 low byte&lt;br /&gt;
 salc&lt;br /&gt;
 out    40h,al				;al = 0 =&amp;gt; write PIT counter 0 divisor again = 0 high byte&lt;br /&gt;
							;=&amp;gt; this results in a frequency for the interrupt call of about 11048 Hz, close to the 11025 Hz target,&lt;br /&gt;
							;as clock is 1.19318181818 MHz =&amp;gt; 1.19318181818 MHz / 108 = 11048 Hz&lt;br /&gt;
 mov    ax,2508h			;21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)&lt;br /&gt;
 int    21h                                                             &lt;br /&gt;
 retn&lt;br /&gt;
&lt;br /&gt;
;bytebeat: ((t&amp;amp;4096)?((t*(t^t%255)|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;3)|((t&amp;amp;8192)?t&amp;lt;&amp;lt;2:t))&lt;br /&gt;
irq:&lt;br /&gt;
 pusha&lt;br /&gt;
 mov    bp,255&lt;br /&gt;
 mov    ax,0				;ax: t&lt;br /&gt;
.counter:&lt;br /&gt;
 mov    cx,ax&lt;br /&gt;
 shr    cx,3				;cx: (t&amp;gt;&amp;gt;3)&lt;br /&gt;
 test   ax,4096				;(t&amp;amp;4096)?&lt;br /&gt;
 jz     .1&lt;br /&gt;
 mov    bx,ax				;bx: t&lt;br /&gt;
 sub    dx,dx				;dx:ax t&lt;br /&gt;
 div    bp					;dx: (t%255)&lt;br /&gt;
 xor    dx,bx				;dx: (t^(t%255))&lt;br /&gt;
 shr    cx,1				;cx: (t&amp;gt;&amp;gt;4)&lt;br /&gt;
 xchg   ax,bx				;ax: t&lt;br /&gt;
 mul    dx					;ax: t*(t^(t%255))&lt;br /&gt;
 or     ax,cx				;ax: t*(t^(t%255))|(t&amp;gt;&amp;gt;4)&lt;br /&gt;
 shr    ax,1				;ax: (t*(t^(t%255))|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1&lt;br /&gt;
 jmp   .3&lt;br /&gt;
.1:&lt;br /&gt;
 test   ax,8192				;(t&amp;amp;8192)?&lt;br /&gt;
 jz    .2&lt;br /&gt;
 shl    ax,2				;ax: (t&amp;lt;&amp;lt;2)&lt;br /&gt;
.2:&lt;br /&gt;
 or     ax,cx				;ax: ax|(t&amp;gt;&amp;gt;3)&lt;br /&gt;
.3:&lt;br /&gt;
 mov    dx,0378h			;LPT1 parallel port address&lt;br /&gt;
 out    dx,al				;write 8 Bit sample data&lt;br /&gt;
 inc    word[bp-255+irq.counter-2]&lt;br /&gt;
 mov    al,20h              ;00100000b&lt;br /&gt;
 out    20h,al				;Bit 5 = 1 send End Of Interrupt (EOI) signal&lt;br /&gt;
 popa&lt;br /&gt;
 iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Further notes on the two variants ====&lt;br /&gt;
&lt;br /&gt;
It's important to set and know the sample frequency you want. For example, if you want to move from 11025 Hz to 18939 Hz&lt;br /&gt;
while keeping the same sound, you need to change the following code parts (shown here for COVOX). Note that the bytebeat parameters were also adjusted to fit the roughly doubled frequency:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
 mov    al,63				;PIT counter divisor instead of 108 =&amp;gt; 1.19318181818 MHz / 63 = 18939 Hz&lt;br /&gt;
;...snip...&lt;br /&gt;
;bytebeat: ((t&amp;amp;8192)?((t*(t^t%255)|(t&amp;gt;&amp;gt;5))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;4)|((t&amp;amp;16384)?t&amp;lt;&amp;lt;2:t))&lt;br /&gt;
;...snip...&lt;br /&gt;
.counter:&lt;br /&gt;
 mov    cx,ax&lt;br /&gt;
 shr    cx,4            	;cx: (t&amp;gt;&amp;gt;4)&lt;br /&gt;
 test   ax,8192         	;(t&amp;amp;8192)?&lt;br /&gt;
 jz     .1              	&lt;br /&gt;
 mov    bx,ax           	;bx: t&lt;br /&gt;
 sub    dx,dx           	;dx:ax t&lt;br /&gt;
 div    bp              	;dx: (t%255)&lt;br /&gt;
 xor    dx,bx           	;dx: (t^(t%255))&lt;br /&gt;
 shr    cx,1            	;cx: (t&amp;gt;&amp;gt;5)&lt;br /&gt;
 xchg   ax,bx           	;ax: t&lt;br /&gt;
 mul    dx              	;ax: t*(t^(t%255))&lt;br /&gt;
 or     ax,cx           	;ax: t*(t^(t%255))|(t&amp;gt;&amp;gt;5)&lt;br /&gt;
 shr    ax,1            	;ax: (t*(t^(t%255))|(t&amp;gt;&amp;gt;5))&amp;gt;&amp;gt;1&lt;br /&gt;
 jmp    .3&lt;br /&gt;
.1:&lt;br /&gt;
 test   ax,16384        	;(t&amp;amp;16384)?&lt;br /&gt;
 jz     .2&lt;br /&gt;
 shl    ax,2            	;ax: (t&amp;lt;&amp;lt;2)&lt;br /&gt;
.2:&lt;br /&gt;
 or     ax,cx           	;ax: ax|(t&amp;gt;&amp;gt;4)&lt;br /&gt;
;...snip...&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
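&lt;br /&gt;
The divisor can also be derived at assembly time instead of being calculated by hand. This is a sketch for NASM. The names &amp;lt;code&amp;gt;sampleRate&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;pitClock&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;pitDivisor&amp;lt;/code&amp;gt; are made up for this example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;%define		sampleRate		18939&lt;br /&gt;
; desired sample frequency in Hz&lt;br /&gt;
%define		pitClock		1193182&lt;br /&gt;
; PIT input clock in Hz, rounded&lt;br /&gt;
%define		pitDivisor		(pitClock / sampleRate)&lt;br /&gt;
; integer division, results in 63 for 18939 Hz and in 108 for 11025 Hz&lt;br /&gt;
;...snip...&lt;br /&gt;
 mov    al,pitDivisor		;PIT counter divisor&lt;br /&gt;
;...snip...&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because the examples write 0 as the high byte, the divisor has to stay below 256, so this only works for sample frequencies above roughly 4680 Hz.&lt;br /&gt;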
&lt;br /&gt;
The routine here uses a frequency of 18939 Hz. With the 16 bit counter t used here, this results in a maximum length&lt;br /&gt;
of 65536/18939 = 3.46 seconds before everything loops. Usually that would be enough for some drumbeat, but not for a complete song&lt;br /&gt;
or melody. In that case you have to use another register as a 'top' timer to trigger your changes for the sound.&lt;br /&gt;
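&lt;br /&gt;
Since the interrupt routine restores all registers with &amp;lt;code&amp;gt;popa&amp;lt;/code&amp;gt;, one way to keep such a 'top' timer is a byte in memory that is incremented whenever t wraps around. The following is only a sketch. The labels &amp;lt;code&amp;gt;.nowrap&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;.part&amp;lt;/code&amp;gt; are made up for this example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
 inc    word [bp-255+irq.counter-2]	;t = t + 1, sets the zero flag on wrap&lt;br /&gt;
 jnz    .nowrap						;no 16 bit wrap yet&lt;br /&gt;
 inc    byte [bp-255+irq.part]		;count the completed passes of t&lt;br /&gt;
.nowrap:&lt;br /&gt;
 mov    al,20h&lt;br /&gt;
 out    20h,al&lt;br /&gt;
 popa&lt;br /&gt;
 iret&lt;br /&gt;
.part:&lt;br /&gt;
 db     0							;hypothetical 'top' timer&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample routine can then read this byte to switch between different parts of the sound.&lt;br /&gt;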
&lt;br /&gt;
One more thing to check and maybe modify if you hear an imperfect sound is the timing of when a sample value is actually &amp;quot;played&amp;quot;.&lt;br /&gt;
Preferably you would want to play each sample value at exactly the same point within the sample period. But as your sample generation routine might need a different&lt;br /&gt;
amount of CPU cycles each time the interrupt is called, this point can shift from call to call when code like the above is used.&lt;br /&gt;
&lt;br /&gt;
One solution for this is to play the sample calculated during the last interrupt call right away when the interrupt is called the next time.&lt;br /&gt;
You can do that via self-modifying code as shown here. It takes 5 bytes more:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
irq:&lt;br /&gt;
pusha&lt;br /&gt;
mov    dx,0378h&lt;br /&gt;
mov    al,0           &lt;br /&gt;
.sample:&lt;br /&gt;
out    dx,al&lt;br /&gt;
mov    bp,255&lt;br /&gt;
mov    ax,0            ; ax: t&lt;br /&gt;
.counter:&lt;br /&gt;
;...snip...&lt;br /&gt;
inc    word [bp-255+irq.counter-2]&lt;br /&gt;
mov    byte [bp-255+irq.sample-1],al&lt;br /&gt;
mov    al,20h&lt;br /&gt;
out    20h,al&lt;br /&gt;
;...snip...&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some remarks: All the code above is not optimized to the max regarding size, for educational reasons.&lt;br /&gt;
Depending on your code and the dependencies of the interrupt subroutine, you can do several size optimizations.&lt;br /&gt;
&lt;br /&gt;
Instead of interrupt &amp;lt;code&amp;gt;08&amp;lt;/code&amp;gt;, the user-defined interrupt &amp;lt;code&amp;gt;1c&amp;lt;/code&amp;gt; could&lt;br /&gt;
theoretically be used as well, but so far this seems to work only in DOSBox and not on a real system running FreeDOS. Further tests&lt;br /&gt;
are needed to find out what the problem is. Using interrupt &amp;lt;code&amp;gt;1c&amp;lt;/code&amp;gt; would save 4 bytes, because the following&lt;br /&gt;
lines, which send the end-of-interrupt command to the interrupt controller, could be omitted from the examples above:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
mov al,20h&lt;br /&gt;
out 20h,al&lt;br /&gt;
;...snip...&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What can be done with bytebeats? Basically everything :-) Have a look at this [http://www.pouet.net/topic.php?which=8357&amp;amp;page=1 thread] on Pouet to get an idea, and check out the first PC Speaker intro with bytebeat: [http://www.pouet.net/prod.php?which=71766 TCTRONIC by TomCat/Abaddon].&lt;br /&gt;
 &lt;br /&gt;
Some basic waveforms can be encoded like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;sawtooth wave&lt;br /&gt;
t &amp;amp; 127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;square wave&lt;br /&gt;
t &amp;amp; 128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;triangle wave&lt;br /&gt;
(t ^ ((t &amp;amp; 128) / 128 * 127)) &amp;amp; 127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
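&lt;br /&gt;
As a rough sketch of how such waveforms translate to x86 code, assume the low byte of &amp;lt;code&amp;gt;t&amp;lt;/code&amp;gt; is in AL (a convention chosen only for this illustration). The three variants are independent alternatives, not one sequence. The triangle variant is one possible branch-free encoding; note that it overwrites AH:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;; sawtooth: ramp 0..127, period 128 samples&lt;br /&gt;
and    al,127&lt;br /&gt;
&lt;br /&gt;
; square: 0 or 128, period 256 samples&lt;br /&gt;
and    al,128&lt;br /&gt;
&lt;br /&gt;
; triangle: 0..127, then 127..0, period 256 samples&lt;br /&gt;
cbw                    ; ah = 0xff if bit 7 of al is set, otherwise 0&lt;br /&gt;
xor    al,ah           ; inverts al during the second half of the period&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;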
&lt;br /&gt;
How would you go from here to a specific tone, e.g. an &amp;quot;A4&amp;quot;, which has a frequency of 440 Hz (see this &lt;br /&gt;
[https://en.wikipedia.org/wiki/Piano_key_frequencies list] for the frequencies of the notes)?&lt;br /&gt;
For that you have to relate the interrupt frequency to the tone frequency and to the length of one period of your&lt;br /&gt;
wave generator. A sawtooth of &amp;lt;code&amp;gt;t&amp;amp;127&amp;lt;/code&amp;gt; at 22050 Hz results in a tone of 22050/128 = 172.3 Hz.&lt;br /&gt;
To reach 440 Hz we can simply scale the timer by 440/172.3 = 2.554 to hear the desired note:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
(t*2.554) &amp;amp; 127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
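&lt;br /&gt;
The integer instructions of the x86 cannot multiply by 2.554 directly. One option is fixed-point arithmetic: multiply &amp;lt;code&amp;gt;t&amp;lt;/code&amp;gt; by 2.554*256 = 654 (rounded) and drop the lowest 8 bits of the product. The sketch below assumes that &amp;lt;code&amp;gt;t&amp;lt;/code&amp;gt; is in AX and that the CPU is an 80186 or later (for the three-operand &amp;lt;code&amp;gt;imul&amp;lt;/code&amp;gt;). The resulting tone is 22050*654/256/128 = 440.1 Hz:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;imul   ax,ax,654       ; ax = low 16 bits of t*654&lt;br /&gt;
mov    al,ah           ; al = low 8 bits of (t*654)/256, the scaled timer&lt;br /&gt;
and    al,127          ; sawtooth of the scaled timer&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Only the low 16 bits of the product are needed, so the wave continues without a glitch when the 16 bit timer wraps around.&lt;br /&gt;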
&lt;br /&gt;
[https://naivesound.comglitch Glitch] is another online tool for creating bytebeat sounds with an enhanced syntax. There are several good reads on this tool, the theories behind it and bytebeat in general [https://medium.com/@naive_sound here].&lt;br /&gt;
&lt;br /&gt;
Of course there are endless possibilities, and the whole world of real-time sound calculation/generation is open to you. Here are some tiny intros which already use these techniques: [http://www.pouet.net/prod.php?which=77741 Plasmifier cover 256B], [http://www.pouet.net/prod.php?which=70599 2(56)unlimited], [http://www.pouet.net/prod.php?which=78505 somehow].&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Output&amp;diff=1328</id>
		<title>Output</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Output&amp;diff=1328"/>
				<updated>2024-02-15T17:42:42Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* COVOX output (aka LPT DAC) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Outputting to the screen ==&lt;br /&gt;
&lt;br /&gt;
First, be aware of the [http://img.tfd.com/cde/MEMMAP.GIF MSDOS memory layout]&lt;br /&gt;
&lt;br /&gt;
=== Outputting in Textmode (80x25) ===&lt;br /&gt;
&lt;br /&gt;
==== Hello World / High Level function ====&lt;br /&gt;
&lt;br /&gt;
Here's an obligatory &amp;quot;Hello World&amp;quot; program in text mode, using a [http://www.ctyme.com/intr/rb-2562.htm &amp;quot;high level&amp;quot; MS-DOS function]. With a small optimization already included (using &amp;lt;code&amp;gt;XCHG BP,AX&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;MOV AH,09h&amp;lt;/code&amp;gt;), this snippet is 20 bytes in size.&lt;br /&gt;
&lt;br /&gt;
[[File:Hello world.png|thumb|Hello World!]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt; &lt;br /&gt;
org 100h			; we start at CS:100h&lt;br /&gt;
xchg 	bp,ax		; already a trick, puts 09h into AH&lt;br /&gt;
mov		dx,text		; DX expects the address of a $-terminated string&lt;br /&gt;
int 	21h			; call the DOS function (AH = 09h)&lt;br /&gt;
ret					; quit&lt;br /&gt;
text:&lt;br /&gt;
db 'Hello World!$'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Of course, this gets shorter with each byte you remove from the text itself. Now let's look into arbitrary screen access. Right after the start of your program you are in mode 3, that is 80x25 in 16 colors (see the [http://www.columbia.edu/~em36/wpdos/videomodes.txt Video Modes List]). [[File:Drawchar example.png|thumb|draw char example]] So, to show something on the screen, you need to set a segment register to 0xB800 and then write values into this segment. &lt;br /&gt;
&lt;br /&gt;
==== Low level access ====&lt;br /&gt;
&lt;br /&gt;
The following three snippets showcase how to draw a red smiley in three different ways. All example snippets are meant to be standalone programs, starting with the first instruction and nothing before it. The target coordinate (40,12) is about the middle of the screen. We multiply by 2 because each character cell occupies two bytes in memory (one byte each for the character and its color). The high byte 0x04 means red (4) on black (0), while the low byte 0x01 is character code 1, a smiley.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xb800&lt;br /&gt;
pop ds&lt;br /&gt;
mov bx,(80*12+40)*2&lt;br /&gt;
mov ax, 0x0401&lt;br /&gt;
mov [bx],ax&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xb800&lt;br /&gt;
pop es&lt;br /&gt;
mov di,(80*12+40)*2&lt;br /&gt;
mov ax, 0x0401&lt;br /&gt;
stosw&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov bx,ss                ; remember the original stack segment&lt;br /&gt;
push 0xb800&lt;br /&gt;
pop ss&lt;br /&gt;
mov sp,(80*12+40)*2+2    ; PUSH stores at SP-2&lt;br /&gt;
mov ax, 0x0401&lt;br /&gt;
push ax&lt;br /&gt;
mov ss,bx                ; restore the stack segment&lt;br /&gt;
int 0x20&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You might notice that the ''push &amp;lt;word&amp;gt;'' + ''pop seg_reg'' combination is always the same and occupies four bytes altogether. If correct alignment is not important to you and you really just want ''any'' pointer to the screen, there is another way to get a valid one:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt; &lt;br /&gt;
les bx,[si]&lt;br /&gt;
nop&lt;br /&gt;
stosb&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That's also four bytes, but it already has the &amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt; opcode (for putting something onto the screen) integrated and even one slot free for another one-byte-instruction. It works because &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt; initially points to the start of our code, and &amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt; has the hexadecimal representation of &amp;lt;code&amp;gt;0AAh&amp;lt;/code&amp;gt;. After the first command, the segment register &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt; contains the value &amp;lt;code&amp;gt;0AA90h&amp;lt;/code&amp;gt;. If you repeatedly write something to the screen with &amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt; you will eventually reach the &amp;lt;code&amp;gt;0B800h&amp;lt;/code&amp;gt; segment and chars will appear on the screen. With a careful selection of the free one-byte-opcode you can also reintroduce some alignment. This works also with the &amp;lt;code&amp;gt;stosw&amp;lt;/code&amp;gt; opcode &amp;lt;code&amp;gt;0ABh&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Alternative high level functions  ====&lt;br /&gt;
&lt;br /&gt;
Besides accessing memory directly, there are also other ways of bringing characters to the screen, for example:&lt;br /&gt;
* [http://www.ctyme.com/intr/rb-4124.htm INT 29h]&lt;br /&gt;
* [http://www.ctyme.com/intr/rb-2558.htm INT 21h AH=6]&lt;br /&gt;
* [http://www.ctyme.com/intr/rb-2562.htm INT 21h AH=9]&lt;br /&gt;
&lt;br /&gt;
=== Outputting in mode 13h (320x200) ===&lt;br /&gt;
&lt;br /&gt;
==== Basic pixel output ====&lt;br /&gt;
&lt;br /&gt;
The videomemory for mode 13h is located at segment 0xA000, so you need to assign this value to a segment register. Also, after the start of your program you are normally still in textmode, so you need to switch to the videomode. The following snippet does both:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,0x13 &lt;br /&gt;
int 0x10     ; AH = 0 means : set video mode to AL = 0x13 (320 x 200 pixels in 256 colors)&lt;br /&gt;
push 0xA000  ; put value on the stack&lt;br /&gt;
pop es       ; pop the top stack value into segment register ES&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You're free to use any of the segment register / opcode combinations to write to the screen:&lt;br /&gt;
* &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;stosb&amp;lt;/code&amp;gt;)&lt;br /&gt;
* &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;mov&amp;lt;/code&amp;gt;)&lt;br /&gt;
* &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;push&amp;lt;/code&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Let's add some code that actually draws something on the screen. The following program occupies 23 bytes and draws a fullscreen XOR texture:&lt;br /&gt;
[[File:Mode13h-example-xor.png|left|bottom|thumb|mode13h-example-xor]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,0x13&lt;br /&gt;
int 0x10&lt;br /&gt;
push 0xa000&lt;br /&gt;
pop es&lt;br /&gt;
X: cwd			; &amp;quot;clear&amp;quot; DX (if AH &amp;lt; 0x80)&lt;br /&gt;
mov ax,di		; get screen position into AX&lt;br /&gt;
mov bx,320		; get screen width into BX&lt;br /&gt;
div bx			; divide, to get row and column&lt;br /&gt;
xor ax,dx		; the famous XOR pattern&lt;br /&gt;
and al,32+8		; a more interesting variation of it&lt;br /&gt;
stosb			; finally, draw to the screen&lt;br /&gt;
jmp short X		; rinse and repeat&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Note that there is a different way of preparing the segment register, instead of :&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xa000&lt;br /&gt;
pop es&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
you can also do :&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov ah,0xA0&lt;br /&gt;
mov es,ax&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Both variations occupy 4 bytes, but the latter also runs on processors where ''push &amp;lt;word&amp;gt;'' is not available (it was introduced with the 80186).&lt;br /&gt;
&lt;br /&gt;
==== Alternative way of pixel plotting and optimization ====&lt;br /&gt;
&lt;br /&gt;
Now let's optimize on the snippet. First, we can adapt the &amp;quot;LES&amp;quot; trick from the textmode section. We just exchange &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 0xa000&lt;br /&gt;
pop es&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
with:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;les bx,[bx]&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
to save two bytes. This works because BX is 0x0000 at start, so the instruction accesses the region ''before'' our code, which is called the [https://en.wikipedia.org/wiki/Program_Segment_Prefix Program Segment Prefix]. The two bytes that are loaded into the segment register ES are bytes 2 and 3 = ''&amp;quot;Segment of the first byte beyond the memory allocated to the program&amp;quot;'', which is usually 0x9FFF. That is just off by one from our desired 0xA000. Unfortunately that means a 16 pixel offset, so if screen alignment matters to you, you can't use this optimization. Also, said two bytes are not always 0x9FFF; for example, if resident programs are located above the ''&amp;quot;memory allocated to the program&amp;quot;'' (FreeDOS), their content is overwritten if we take their base as our video memory base.&lt;br /&gt;
&lt;br /&gt;
Second, we can use an alternative way of putting pixels to the screen, subfunction AH = 0x0C of int 0x10. Also, instead of constructing row and column from the screen pointer, we can use some interesting properties of the screenwidth regarding logical operations. This results in the following 16 byte program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             ; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov al,0x13&lt;br /&gt;
X: int 0x10		; set video mode AND draw pixel&lt;br /&gt;
inc cx			; increment column&lt;br /&gt;
mov ax,cx		; get column in AH&lt;br /&gt;
xor al,ah		; the famous XOR pattern&lt;br /&gt;
mov ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
and al,32+8		; a more interesting variation of it&lt;br /&gt;
jmp short X		; rinse and repeat&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first optimization is the double usage of the same &amp;quot;int 0x10&amp;quot; for setting the video mode and for drawing the pixel. The subfunction AH = 0x0C expects row and column in DX and CX. Since the screen width is 320, which is 5 * 64, we can ignore the row and just work with the column, as long as we use logical operations and only use bits 0-6 of the result. The subfunction AH = 0x0C allows for unbounded column values in CX (up to 65535) and correctly &amp;quot;wraps&amp;quot; them internally without an error.&lt;br /&gt;
&lt;br /&gt;
The major drawback of the &amp;quot;subfunction AH = 0x0C&amp;quot; approach is performance loss. While DOSBox and many emulators perform just fine, real hardware will draw much slower, depending on the Video BIOS.&lt;br /&gt;
&lt;br /&gt;
==== Basic animation and user interaction ====&lt;br /&gt;
&lt;br /&gt;
Now let's add the convenient check for the ESC key and also add a simple animation. The &amp;lt;code&amp;gt;DI&amp;lt;/code&amp;gt; register is used as frame counter and incremented after the pixel counter &amp;lt;code&amp;gt;CX&amp;lt;/code&amp;gt; ran through all 65536 values via &amp;lt;code&amp;gt;LOOP&amp;lt;/code&amp;gt;. This frame counter is then added to the column. The resulting program is now 25 bytes in size :&lt;br /&gt;
&lt;br /&gt;
[[File:Xor anim example.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             	; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov 	al,0x13&lt;br /&gt;
X: 		int 0x10	; set video mode AND draw pixel&lt;br /&gt;
mov 	ax,cx		; get column in AH&lt;br /&gt;
add		ax,di		; offset by framecounter&lt;br /&gt;
xor 	al,ah		; the famous XOR pattern&lt;br /&gt;
and 	al,32+8		; a more interesting variation of it&lt;br /&gt;
mov 	ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
loop 	X			; loop 65536 times&lt;br /&gt;
inc 	di			; increment framecounter&lt;br /&gt;
in 		al,0x60		; check keyboard ...&lt;br /&gt;
dec 	al			; ... for ESC&lt;br /&gt;
jnz 	X			; rinse and repeat&lt;br /&gt;
ret					; quit program&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
( ↑ This example is the blueprint in the [[Floating-point_Opcodes#FPU_Basics| FPU Basics Section]].)&lt;br /&gt;
&lt;br /&gt;
=== Using Custom Colors ===&lt;br /&gt;
&lt;br /&gt;
==== Shades of Hue within the Default VGA palette ====&lt;br /&gt;
&lt;br /&gt;
You might have noticed there is a bit of structure to [https://i.stack.imgur.com/OSZ6D.png the default VGA Palette], which you can exploit for some interesting results. Looking at the palette, there is a rainbow of different hue values starting at index 32 that is repeated with slightly different luma, separated by 72 indices. If you are okay with limiting the number of shades, you can get a small color ramp for all kinds of hue values by simply calculating your color index like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;color=((shade%3)*72)+32+huevalue&amp;lt;/code&amp;gt; &lt;br /&gt;
&lt;br /&gt;
For an example of how this looks for all kinds of hue values, see [https://www.pouet.net/prod.php?which=63520 Popcast] by Hellmood/Desire.&lt;br /&gt;
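&lt;br /&gt;
A possible translation of the formula above into code, assuming the shade is in AL and the hue value in BL (both register choices are arbitrary and only made for this sketch). &amp;lt;code&amp;gt;AAM&amp;lt;/code&amp;gt; with an operand of 3 is used for the modulo:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;aam    3               ; al = shade % 3 (ah = shade / 3 is discarded)&lt;br /&gt;
mov    ah,72&lt;br /&gt;
mul    ah              ; ax = (shade % 3) * 72, at most 144&lt;br /&gt;
add    al,32           ; start of the rainbow&lt;br /&gt;
add    al,bl           ; al = color index&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;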
&lt;br /&gt;
==== Setting a Custom Palette ====&lt;br /&gt;
&lt;br /&gt;
Sometimes, when the [https://i.stack.imgur.com/OSZ6D.png Default VGA Palette] doesn't quite match the look you are after, it can be useful to set your own palette using the VGA registers. The basic setup loop looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
palloop:&lt;br /&gt;
mov ax,cx&lt;br /&gt;
mov dx,0x3c8&lt;br /&gt;
out dx,al    ; select palette color&lt;br /&gt;
inc dx&lt;br /&gt;
out dx,al    ; write red value (0..63)&lt;br /&gt;
out dx,al    ; write green value (0..63)&lt;br /&gt;
out dx,al    ; write blue value (0..63)&lt;br /&gt;
loop palloop&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The above code sets a simple grayscale palette (assuming the CX register is 0 on entry) and is compatible with all DOS platforms.&lt;br /&gt;
In some cases you can omit the &amp;lt;code&amp;gt;mov dx,0x3c8, out dx,al, inc dx&amp;lt;/code&amp;gt; and directly access the data register by just using &amp;lt;code&amp;gt;mov dx,0x3c9&amp;lt;/code&amp;gt; instead.&lt;br /&gt;
&lt;br /&gt;
To get more interesting colors than just grayscale, you can alter the value of the AL register in between setting the red, green and blue values, for example by shifting, adding, subtracting or performing logical operations. Just get creative and check if the result is sufficient for your use case.&lt;br /&gt;
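&lt;br /&gt;
As one arbitrary illustration of this idea, the following variation of the grayscale loop scales the color index down to 0..63 and then halves AL after the red and after the green component. Under the same assumption as before (CX is 0 on entry), this should give a ramp from black to orange:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
palloop:&lt;br /&gt;
mov ax,cx&lt;br /&gt;
mov dx,0x3c8&lt;br /&gt;
out dx,al    ; select palette color&lt;br /&gt;
inc dx&lt;br /&gt;
shr al,2     ; scale the index 0..255 down to 0..63&lt;br /&gt;
out dx,al    ; write red value (0..63)&lt;br /&gt;
shr al,1&lt;br /&gt;
out dx,al    ; write green value (red / 2)&lt;br /&gt;
shr al,1&lt;br /&gt;
out dx,al    ; write blue value (red / 4)&lt;br /&gt;
loop palloop&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;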
&lt;br /&gt;
TomCat shows the most common color palettes, grouped by functionality, in his article [https://abaddon.hu/256b/colors.html Colors (in tiny intros)].&lt;br /&gt;
&lt;br /&gt;
== Producing sound ==&lt;br /&gt;
&lt;br /&gt;
=== MIDI notes ===&lt;br /&gt;
&lt;br /&gt;
Creating sounds with MIDI requires a bit more preparation, but once you're familiar with it, it's even simpler than PC Speaker sound, because you basically don't have to ''create'' the sound, you just have to ''trigger'' it. For a start, you have to know that there are a lot of [https://en.wikipedia.org/wiki/General_MIDI different instruments] and a [https://www.midi.org/specifications/item/table-1-summary-of-midi-message defined way of communication]. Imagine the MIDI interface as a keyboard: you tell it which button/key you want to press, which knob to twist, and sometimes how hard. By default, the active instrument is the [https://en.wikipedia.org/wiki/Piano Acoustic Grand Piano].&lt;br /&gt;
&lt;br /&gt;
==== Single piano note ====&lt;br /&gt;
&lt;br /&gt;
Let's start off with a simple example, playing a single note on the piano:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov 	al,	3Fh		;	set UART mode - command&lt;br /&gt;
mov 	dx,	331h	;	MIDI Control Port&lt;br /&gt;
out 	dx,	al 		;	send !&lt;br /&gt;
dec 	dx			;	MIDI Data Port ( = 330h )&lt;br /&gt;
mov 	al,	90h		;	send note on channel ZERO - command&lt;br /&gt;
out 	dx,	al		;	send !&lt;br /&gt;
mov 	al,	56h		;	data byte 1 : KEY = 56h&lt;br /&gt;
out 	dx,	al		; 	send !&lt;br /&gt;
mov 	al,	67h		;	data byte 2 : VOLUME = 67h&lt;br /&gt;
out 	dx,	al		;	send !&lt;br /&gt;
ret					; 	quit&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In short: you turn your keyboard on (switching to UART mode), then press a KEY with a certain VOLUME on channel ZERO, then exit. Besides switching to UART mode, all this communication uses the port &amp;lt;code&amp;gt;330h&amp;lt;/code&amp;gt;. This example will work on DosBox but not on Windows XP NTVDM: for [http://www.pouet.net/topic.php?which=10720&amp;amp;page=1 still unclear reasons], the NTVDM emulation delays the note until it receives a second one. The simplest way of at least hearing something is to repeatedly play notes, like in the following example :&lt;br /&gt;
&lt;br /&gt;
==== Repeated piano notes ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov 	al,	3Fh			;	set UART mode - command&lt;br /&gt;
mov 	dx,	331h		;	MIDI Control Port&lt;br /&gt;
out 	dx,	al 			;	send !&lt;br /&gt;
dec 	dx				;	MIDI Data Port ( = 330h )&lt;br /&gt;
main:&lt;br /&gt;
mov 	al,	90h			;	send note on channel 0 - command&lt;br /&gt;
out 	dx,	al			;	send !&lt;br /&gt;
mov 	al,	56h			;	data byte 1 : KEY = 56h&lt;br /&gt;
out 	dx,	al			; 	send !&lt;br /&gt;
mov 	al,	67h			;	data byte 2 : VOLUME = 67h&lt;br /&gt;
out 	dx,	al			;	send !&lt;br /&gt;
mov 	cl,5				;	wait for 5 interrupts (assumes CH = 0)&lt;br /&gt;
_wait:&lt;br /&gt;
hlt						;	sleep until the next interrupt, e.g. a timer tick&lt;br /&gt;
loop _wait&lt;br /&gt;
in 		al,	0x60		;	check for ESC&lt;br /&gt;
dec 	al				;&lt;br /&gt;
jnz 	main			;	no? repeat&lt;br /&gt;
ret						;	quit&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
↑ This is the previous example, enriched with synchronizing against the timer and checking for the ESC key. It works on both DosBox and Windows XP NTVDM and plays a note on the Piano repeatedly.&lt;br /&gt;
&lt;br /&gt;
==== Repeated notes of other instruments ====&lt;br /&gt;
&lt;br /&gt;
While hitting one key repeatedly is not really interesting in general, it can produce decent results when doing it with the right instrument activated, like it was done with the &amp;quot;French Horn&amp;quot; in [https://www.youtube.com/watch?v=tsbxdjP9l50 Timelord (by Baudsurfer)]. Apart from just changing the instrument, let's also optimize a little bit on the size: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
start:	&lt;br /&gt;
mov 	si,data			; init pointer for outsb&lt;br /&gt;
mov 	dx,330h			; change to data port&lt;br /&gt;
mov 	cl,5			; play our music data&lt;br /&gt;
rep 	outsb			; (see below at &amp;quot;data&amp;quot; label)&lt;br /&gt;
inc 	dx				; switch to control port&lt;br /&gt;
outsb					; change to mode &amp;quot;UART&amp;quot;&lt;br /&gt;
_wait:&lt;br /&gt;
mov 	al,[fs:0x46c]	; read timer value&lt;br /&gt;
cmp	 	al,bl			; wait until...&lt;br /&gt;
jz 		_wait			; ...timer value changed&lt;br /&gt;
xchg 	bx,ax			; save old timer value&lt;br /&gt;
in 		al,0x60			; check for ...&lt;br /&gt;
dec 	al				; ... ESC key&lt;br /&gt;
jnz 	start			; otherwise : repeat &lt;br /&gt;
dec 	dx				; switch to data port again&lt;br /&gt;
outsb					; stop all ... &lt;br /&gt;
outsb					; ... notes played ...&lt;br /&gt;
outsb					; ... on channel 3&lt;br /&gt;
data: 	&lt;br /&gt;
db		0c3h			; change instrument on channel 3&lt;br /&gt;
						; (is also &amp;quot;RET&amp;quot; for program quit)&lt;br /&gt;
db		60				; to &amp;quot;French Horn&amp;quot;&lt;br /&gt;
db		93h				; play note on channel 3&lt;br /&gt;
db		35				; deep &amp;quot;b&amp;quot; = note number 35&lt;br /&gt;
db		127				; play with volume = 127&lt;br /&gt;
db		3fh				; change mode to &amp;quot;UART&amp;quot;&lt;br /&gt;
db		0b3h			; control change on channel 3&lt;br /&gt;
db		123				; Channel Mode Message &amp;quot;All Notes Off&amp;quot;&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
↑ This is the previous example with a changed instrument. It structures the MIDI data into a data section, optimizes the output by using &amp;lt;code&amp;gt;outsb&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;out dx,al&amp;lt;/code&amp;gt;, and finishes the program with a special command to turn ''All Notes Off''. This is necessary for all instruments which don't stop by themselves. In all the previous examples, we sent the &amp;quot;NOTE ON&amp;quot; command (&amp;lt;code&amp;gt;9Xh&amp;lt;/code&amp;gt;), but not the corresponding &amp;quot;NOTE OFF&amp;quot; command (&amp;lt;code&amp;gt;8Xh&amp;lt;/code&amp;gt;). Also, the note is now played on channel &amp;lt;code&amp;gt;03h&amp;lt;/code&amp;gt;, since the command byte for changing an instrument on channel 3 is &amp;lt;code&amp;gt;0C3h&amp;lt;/code&amp;gt;, which is also &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; and can be reused. If this looks complicated at first, always remember: it's just sending [https://www.midi.org/specifications/item/table-1-summary-of-midi-message defined commands] to a single port.&lt;br /&gt;
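&lt;br /&gt;
For comparison, a single note can also be stopped explicitly with &amp;quot;NOTE OFF&amp;quot;. A short sketch for the piano note from the first examples (key 56h on channel 0), assuming DX still holds the data port &amp;lt;code&amp;gt;330h&amp;lt;/code&amp;gt; and UART mode is already active:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov 	al,	80h		;	note off on channel ZERO - command&lt;br /&gt;
out 	dx,	al		;	send !&lt;br /&gt;
mov 	al,	56h		;	data byte 1 : KEY = 56h&lt;br /&gt;
out 	dx,	al		; 	send !&lt;br /&gt;
mov 	al,	0		;	data byte 2 : release velocity&lt;br /&gt;
out 	dx,	al		;	send !&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;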
&lt;br /&gt;
==== The drum channel ====&lt;br /&gt;
&lt;br /&gt;
Now that you're aware that there are different channels (16 overall) to play notes on, how would you like a channel &amp;lt;code&amp;gt;09h&amp;lt;/code&amp;gt; specifically for 'Drums'? [http://www.voidaudio.net/percussion.html Ten different drumsets] with dozens of samples are available out of the box. By default, the &amp;quot;Standard Kit&amp;quot; is active. The following example plays a track of drum notes repeatedly, while further optimizing for size:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
aas						; 3fh = &amp;quot;set UART mode&amp;quot;&lt;br /&gt;
cwd						; 99h = &amp;quot;play note on drum channel&amp;quot; command&lt;br /&gt;
db 		42,38,42,35		; the drum notes (42 = hihat, 38 = snare, 35 = kick)&lt;br /&gt;
mov 	dx,0x331		; MIDI Control Port&lt;br /&gt;
outsb					; send &amp;quot;set UART mode&amp;quot;&lt;br /&gt;
dec 	dx				; switch to MIDI data port&lt;br /&gt;
outsb					; send &amp;quot;play note on drum channel&amp;quot; command&lt;br /&gt;
&lt;br /&gt;
main:&lt;br /&gt;
mov		al,[fs:0x46c]	; read timer&lt;br /&gt;
test	al,3&lt;br /&gt;
jnz 	main			; skip 3 values&lt;br /&gt;
inc 	byte [fs:0x46c] ; inc manually to prevent retrigger&lt;br /&gt;
&lt;br /&gt;
inc 	bx				; increment note counter&lt;br /&gt;
and 	bl,3			; truncate to 4 notes&lt;br /&gt;
mov 	al,[bx+si]		; read the drumnote (see above)&lt;br /&gt;
&lt;br /&gt;
out 	dx,al			; send the drum&lt;br /&gt;
mov 	al,127			; set volume to maximum&lt;br /&gt;
out 	dx,al			; send volume&lt;br /&gt;
&lt;br /&gt;
in 		al,0x60			; check for ESC&lt;br /&gt;
dec 	al				;&lt;br /&gt;
jnz 	main			; no? repeat&lt;br /&gt;
ret						; otherwise quit&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In contrast to the previous example, the data section is now at the start. That means it's executed as code! This is dangerous of course, but it also saves the bytes for assigning the &amp;lt;code&amp;gt;DATA&amp;lt;/code&amp;gt; offset to &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt;. Once the two initial &amp;lt;code&amp;gt;outsb&amp;lt;/code&amp;gt; instructions have incremented &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt; twice, it stays fixed, and all further reads from the drum data are done with &amp;lt;code&amp;gt;[BX+SI]&amp;lt;/code&amp;gt;. Unless you know exactly what you are doing, don't use that kind of &amp;quot;executing data&amp;quot; optimization! In this special case &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;CWD&amp;lt;/code&amp;gt; do no harm, and the drum notes &amp;lt;code&amp;gt;42,38,42,35&amp;lt;/code&amp;gt; are carefully crafted and arranged to resemble the instruction &amp;lt;code&amp;gt;SUB AH,[232Ah]&amp;lt;/code&amp;gt;, which does no harm either.&lt;br /&gt;
&lt;br /&gt;
==== Further Midi instrument tuning by controllers and pitch ====&lt;br /&gt;
&lt;br /&gt;
If you are familiar with hardware synthesizers you'll definitely remember the typical pitch bend or modulation wheels beside the keys, usually two of them. Those are commonly assigned to a vibrato/tremolo effect and a +/-pitch to tune the played note. You can also use those functions in your intro code to affect the currently played midi instrument note.&lt;br /&gt;
&lt;br /&gt;
These parameters are set with the usual MIDI command sequence, as you can see here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,10110000b	;Controller command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,00000001b	;0...127 data byte 1 =&amp;gt; '1' is the code for the modulation wheel typically assigned to vibrato/tremolo&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,01111111b	;0...127 data byte 2 =&amp;gt; e.g. '01111111' =&amp;gt; Maximum vibrato level&lt;br /&gt;
out dx,al&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In that example the maximum vibrato level is assigned to any instrument played on midi channel 0. This effect was used in the [http://www.mikusite.de/x86/cryscom.zip Crystal Comet 128 Byte intro by Kuemmel].&lt;br /&gt;
&lt;br /&gt;
For pitch bend the code would be like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov al,11100000b	;Pitch bend command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0lllllllb	;0...127 data byte 1 =&amp;gt; LSB value for pitch&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0mmmmmmmb	;0...127 data byte 2 =&amp;gt; MSB value for pitch&lt;br /&gt;
out dx,al	&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Pitch bend uses a 14 bit value. The center is at 0x2000 (meaning no pitch bend). Values from 0x2000 up to 0x3fff increase the pitch, and values from 0x2000 down to 0x0000 decrease it. A range of 0x2000 should correspond to 2 semitones, so you can bend by +/- 2 semitones. Please be aware that the value must be split into two 7 bit values. For example, 0x3000 becomes 0x60 (MSB) and 0x00 (LSB). &lt;br /&gt;
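&lt;br /&gt;
The split into two 7 bit values can be left to the assembler. The following sketch uses a hypothetical constant &amp;lt;code&amp;gt;PITCH&amp;lt;/code&amp;gt; (not part of the code above) and assumes that DX holds the MIDI data port and UART mode is already active:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;PITCH equ 0x3000                     ; example 14 bit bend value, 0x2000 = no bend&lt;br /&gt;
mov al,11100000b                     ;Pitch bend command on Midi channel 0&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,PITCH &amp;amp; 0x7f              ;data byte 1: LSB, the lower 7 bits (here 0x00)&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,(PITCH &amp;gt;&amp;gt; 7) &amp;amp; 0x7f ;data byte 2: MSB, the upper 7 bits (here 0x60)&lt;br /&gt;
out dx,al&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;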
&lt;br /&gt;
Of course there are more midi controller options, e.g. you could change the stereo pan level. As a reference and for more detailed information please have a look at this [http://www.music-software-development.com/midi-tutorial.html Midi tutorial page].&lt;br /&gt;
==== Creating basic sound effects in 16 bytes ====&lt;br /&gt;
&lt;br /&gt;
In the [https://en.wikipedia.org/wiki/General_MIDI#Sound_Effects MIDI repertoire], there are already some sound effects available. With the &amp;quot;data execution&amp;quot; optimization above, let's fire a gunshot in 16 bytes :&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;aas&lt;br /&gt;
les 	di,[bx-0x6C]&lt;br /&gt;
xor 	al,127&lt;br /&gt;
mov 	dx,0x331&lt;br /&gt;
outsb&lt;br /&gt;
dec 	dx&lt;br /&gt;
mov 	cl,5&lt;br /&gt;
rep 	outsb&lt;br /&gt;
ret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first three instructions don't do anything (they do, but we don't care), it's just MIDI data.&lt;br /&gt;
* &amp;lt;code&amp;gt;aas&amp;lt;/code&amp;gt;&lt;br /&gt;
the command for switching to &amp;quot;UART&amp;quot; mode, for sending to port &amp;lt;code&amp;gt;0x331&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;les di,[bx-0x6C]&amp;lt;/code&amp;gt;&lt;br /&gt;
assembles to &amp;lt;code&amp;gt;0xc4&amp;lt;/code&amp;gt; (change instrument on channel 4), &amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; (change it to &amp;quot;Gunshot&amp;quot;), &amp;lt;code&amp;gt;0x94&amp;lt;/code&amp;gt; (play note on channel 4)&lt;br /&gt;
* &amp;lt;code&amp;gt;xor al,127&amp;lt;/code&amp;gt;&lt;br /&gt;
assembles to &amp;lt;code&amp;gt;0x34&amp;lt;/code&amp;gt; (play THIS note), &amp;lt;code&amp;gt;0x7f&amp;lt;/code&amp;gt; (play it THAT loud, 127 is also the allowed maximum)&lt;br /&gt;
&lt;br /&gt;
The rest of the code basically just sends the MIDI data to the interface and exits. You can change the kind of sound effect by modifying the ModR/M byte of the second instruction (change BX to BP or SI etc.). Changing the volume is simpler: change the byte value of &amp;lt;code&amp;gt;xor al,127&amp;lt;/code&amp;gt; to any value between 0 and 127.&lt;br /&gt;
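&lt;br /&gt;
To illustrate both changes: with an 8 bit displacement and DI as destination, the ModR/M byte of the &amp;lt;code&amp;gt;les&amp;lt;/code&amp;gt; instruction is 0x78 plus the number of the addressing mode, so the eight addressing modes select the eight programs of the General MIDI &amp;quot;Sound Effects&amp;quot; group. The following sketch plays &amp;quot;Applause&amp;quot; at a lower volume; the volume 100 is an arbitrary choice, and the instrument names are taken from the General MIDI program list:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;aas                     ; 0x3f&lt;br /&gt;
les 	di,[bp-0x6C]    ; 0xc4, 0x7e (Applause), 0x94&lt;br /&gt;
xor 	al,100          ; 0x34 (note), 100 (volume)&lt;br /&gt;
mov 	dx,0x331&lt;br /&gt;
outsb&lt;br /&gt;
dec 	dx&lt;br /&gt;
mov 	cl,5&lt;br /&gt;
rep 	outsb&lt;br /&gt;
ret&lt;br /&gt;
; other choices for the second instruction:&lt;br /&gt;
; les di,[bx+si-0x6C]   ; 0x78 = Guitar Fret Noise&lt;br /&gt;
; les di,[bx+di-0x6C]   ; 0x79 = Breath Noise&lt;br /&gt;
; les di,[bp+si-0x6C]   ; 0x7a = Seashore&lt;br /&gt;
; les di,[bp+di-0x6C]   ; 0x7b = Bird Tweet&lt;br /&gt;
; les di,[si-0x6C]      ; 0x7c = Telephone Ring&lt;br /&gt;
; les di,[di-0x6C]      ; 0x7d = Helicopter&lt;br /&gt;
; les di,[bx-0x6C]      ; 0x7f = Gunshot&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;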
&lt;br /&gt;
==== Procedural MIDI music generation in 64 bytes ====&lt;br /&gt;
&lt;br /&gt;
With all of the above you should now be able to follow the next snippet, [http://www.pouet.net/prod.php?which=66313 Descent OST], a small framework for procedural MIDI sound generation in 64 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;; &amp;quot;Descent OST&amp;quot;, a 62 byte MIDI music player for MSDOS&lt;br /&gt;
; created by HellMood/DESiRE (C)2015&lt;br /&gt;
; this is the extracted music routine used in &amp;quot;Descent&amp;quot; &lt;br /&gt;
; it is a procedural MIDI algorithm which sticks a&lt;br /&gt;
; subroutine to the DOS timer (interrupt 0x1C)&lt;br /&gt;
; the registered routine is called ~18.2 times per second&lt;br /&gt;
&lt;br /&gt;
; developed for use with &amp;quot;NASM&amp;quot;, &lt;br /&gt;
; see http://sourceforge.net/projects/nasm/files/&lt;br /&gt;
&lt;br /&gt;
%define		 	rhythmPattern			0b11&lt;br /&gt;
; with &amp;quot;rhythmPattern&amp;quot;, you define how often a note is played&lt;br /&gt;
; generally, higher values and values containing many &amp;quot;ones&amp;quot;&lt;br /&gt;
; in binary representation, will result in faster play&lt;br /&gt;
; for example &amp;quot;0b11&amp;quot; will play every 4th note&lt;br /&gt;
%define 		baseInstrument			9&lt;br /&gt;
; defines the number of the first instrument used.&lt;br /&gt;
; see http://www.midi.org/techspecs/gm1sound.php for a full list&lt;br /&gt;
; keep in mind, that there are only a few instrument blocks&lt;br /&gt;
; whose sounds stop after a while. You won't get good results&lt;br /&gt;
; from strings etc., just a mess of overlaid sounds&lt;br /&gt;
%define 		numInstruments			7&lt;br /&gt;
; defines how many instruments are used. keep in mind that&lt;br /&gt;
; &amp;quot;rhythmPattern&amp;quot; has influence on the picked instrument. the instruments&lt;br /&gt;
; from 9 to 9+7 are called &amp;quot;chromatic percussion&amp;quot;&lt;br /&gt;
%define 		noteStep 				5&lt;br /&gt;
; defines the basic difference from one note to the next. recommended&lt;br /&gt;
; values here are (mainly) 3,4 and 5 for music theoretic reasons&lt;br /&gt;
; but feel free to play around =)&lt;br /&gt;
%define 		noteRange				12&lt;br /&gt;
; after adding the noteStep, the note value is &amp;quot;mod&amp;quot;ded with&lt;br /&gt;
; the &amp;quot;noteRange&amp;quot;. 12 means octave, which results in very harmonic&lt;br /&gt;
; scales&lt;br /&gt;
%define 		noteSpread				3&lt;br /&gt;
; the third step spreads the notes over the tonal spectrum, you may&lt;br /&gt;
; want to keep &amp;quot;noteSpread&amp;quot; * &amp;quot;noteRange&amp;quot; round about 30-60.&lt;br /&gt;
%define 		baseNote				40&lt;br /&gt;
; the general tone height of everything. some instruments don't play&lt;br /&gt;
; arbitrary deep notes correctly, and too high notes cause ear bleeding&lt;br /&gt;
; adjust with care ;)&lt;br /&gt;
&lt;br /&gt;
; WARNING : after exiting the program, the timer interrupt is still active&lt;br /&gt;
; i strongly recommend to reboot or restart DOSBOX!&lt;br /&gt;
&lt;br /&gt;
; ADVICE : Yes, there are music- and math-related things going on here&lt;br /&gt;
; if you're not into music theory, circle of fifths, and the like, it may be&lt;br /&gt;
; better to just play around with the parameters, rather than understanding them&lt;br /&gt;
; just change stuff slowly, and eventually you will get &amp;quot;there&amp;quot;&lt;br /&gt;
; wherever that is ;)&lt;br /&gt;
&lt;br /&gt;
org 0x100&lt;br /&gt;
xchg cx,ax					; set our second counter to zero&lt;br /&gt;
mov dx,music			&lt;br /&gt;
mov ax,0x251C				; mode &amp;quot;0x25&amp;quot; , &amp;quot;0x1C&amp;quot; = change address of timer interrupt&lt;br /&gt;
int 0x21					; see http://mprolab.teipir.gr/vivlio80X86/dosints.pdf&lt;br /&gt;
S:&lt;br /&gt;
in ax,0x60					; wait for &amp;quot;ESC&amp;quot; press, then exit&lt;br /&gt;
dec al						; music plays on anyway, this is just for&lt;br /&gt;
jnz S						; keeping the music exactly as in &amp;quot;Descent&amp;quot;&lt;br /&gt;
ret							; return to prompt&lt;br /&gt;
music:&lt;br /&gt;
inc bx						; increment our first counter (starts at zero)&lt;br /&gt;
test bl,byte rhythmPattern	; play a note every 4th time tick&lt;br /&gt;
jnz nomusic					; otherwise do nothing&lt;br /&gt;
mov dx,0x331&lt;br /&gt;
mov al,0x3F&lt;br /&gt;
out dx,al&lt;br /&gt;
dec dx&lt;br /&gt;
mov al,0xC0					; change instrument on channel 0...&lt;br /&gt;
out dx,al&lt;br /&gt;
mov ax,bx					&lt;br /&gt;
aam byte numInstruments&lt;br /&gt;
add al,byte baseInstrument	; ...to this instrument&lt;br /&gt;
out dx,al&lt;br /&gt;
mov al,0x90					; play note on channel 0 ...&lt;br /&gt;
out dx,al&lt;br /&gt;
add cl,byte noteStep			 &lt;br /&gt;
mov al,cl&lt;br /&gt;
aam byte noteRange&lt;br /&gt;
imul ax,noteSpread&lt;br /&gt;
add al,baseNote				; ... play THIS note&lt;br /&gt;
out dx,al&lt;br /&gt;
neg al						; (play deeper notes louder = add bass)&lt;br /&gt;
add al,127+39				; ... play it THAT loud&lt;br /&gt;
out dx,al&lt;br /&gt;
nomusic:&lt;br /&gt;
iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
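&lt;br /&gt;
The warning in the listing can be avoided, at the cost of some bytes, by saving the old handler and restoring it before exit. A minimal sketch, assuming the DOS services &amp;lt;code&amp;gt;int 0x21&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;AH=0x35&amp;lt;/code&amp;gt; (get vector) and &amp;lt;code&amp;gt;AH=0x25&amp;lt;/code&amp;gt; (set vector); the &amp;lt;code&amp;gt;music&amp;lt;/code&amp;gt; routine is only a placeholder here. Note that the original routine keeps its counters in BX and CX between ticks, which only works while the foreground code leaves those registers alone. With a BIOS call in the foreground, as below, a real handler would have to save the registers it uses and keep its counters in memory.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 0x100&lt;br /&gt;
mov ax,0x351C				; DOS: get interrupt vector 0x1C, returned in ES:BX&lt;br /&gt;
int 0x21&lt;br /&gt;
push es						; keep the old handler on the stack (segment first, then offset)&lt;br /&gt;
push bx&lt;br /&gt;
mov dx,music				; DS:DX = new handler (DS = CS in a .COM file)&lt;br /&gt;
mov ax,0x251C				; DOS: set interrupt vector 0x1C&lt;br /&gt;
int 0x21&lt;br /&gt;
mov ah,0&lt;br /&gt;
int 0x16					; BIOS: wait for a key press&lt;br /&gt;
pop dx						; old offset&lt;br /&gt;
pop ds						; old segment (DS is not needed afterwards)&lt;br /&gt;
mov ax,0x251C				; restore the old handler&lt;br /&gt;
int 0x21&lt;br /&gt;
ret							; return to prompt&lt;br /&gt;
music:						; placeholder for a register-preserving music routine&lt;br /&gt;
iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;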
&lt;br /&gt;
=== PC Speaker ===&lt;br /&gt;
&lt;br /&gt;
Producing sound with the PC speaker is incredibly easy. Basically, you set a system timer to a desired frequency, then connect this timer to the speaker. [http://wiki.osdev.org/PC_Speaker The PC Speaker Article] from the OSDEV Wiki has the details. An example of a tiny intro that uses PC speaker music is [http://www.pouet.net/prod.php?which=67833 SpeaCore].&lt;br /&gt;
&lt;br /&gt;
==== Basic example with melody pattern ====&lt;br /&gt;
&lt;br /&gt;
A very optimized and dirty variant of producing sound with the speaker is this 12 byte snippet (sound routine from [http://www.pouet.net/prod.php?which=67829 the tiny intro &amp;quot;darkweb&amp;quot;]):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;hlt				; sync to timer1&lt;br /&gt;
inc bx			; increment our counter&lt;br /&gt;
mov ax,bx		; work with a copy&lt;br /&gt;
or al,0x4B      ; melody pattern + 2 LSB for speaker link&lt;br /&gt;
out 0x42,al		; set new countdown for timer2 (two passes)&lt;br /&gt;
out 0x61,al		; link timer2 to PC speaker (2 LSBs are 1)&lt;br /&gt;
jmp si			; rinse and repeat&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Instead of sending the low and high byte of our divisor directly in succession, we do it the &amp;quot;two pass&amp;quot; way, writing one byte per loop iteration. That reduces the number of possible frequencies to 255, which is still good enough for some rough sounds. Linking the timer to the PC speaker might not be obvious: normally you would read the value of port 0x61, set the two least significant bits and write the value back. You can save all of this if you just send the &amp;quot;two pass&amp;quot; value you already used for the timer, provided that value has its two least significant bits set (''or al,0x4B'' ensures this). Be aware that port 0x61 does many things apart from connecting the timer to the speaker. A useful resource for ports in general is the [http://bochs.sourceforge.net/techspec/PORTS.LST Bochs Ports List]; for port 0x61 it shows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
''0061	w	KB controller port B (ISA, EISA)   (PS/2 port A is at 0092)&lt;br /&gt;
&lt;br /&gt;
		system control port for compatibility with 8255&lt;br /&gt;
&lt;br /&gt;
		bit 7	(1= IRQ 0 reset )&lt;br /&gt;
&lt;br /&gt;
		bit 6-4    reserved&lt;br /&gt;
&lt;br /&gt;
		bit 3 = 1  channel check enable&lt;br /&gt;
&lt;br /&gt;
		bit 2 = 1  parity check enable&lt;br /&gt;
&lt;br /&gt;
		'''bit 1 = 1  speaker data enable'''&lt;br /&gt;
&lt;br /&gt;
		'''bit 0 = 1  timer 2 gate to speaker enable''' ''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So if you experience strange things with highly optimized PC speaker output, revert to the safe way. The described way works with real hardware and DOSBox. Unfortunately, both Oracle VirtualBox with MS-DOS 6.22 and Windows XP NTVDM seem not to emulate the PC speaker properly (investigation and citation needed here!)&lt;br /&gt;
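&lt;br /&gt;
For comparison, the &amp;quot;safe way&amp;quot; could look like the following sketch. It is not size-optimized and makes a few assumptions: the control word &amp;lt;code&amp;gt;0xB6&amp;lt;/code&amp;gt; selects counter 2 in square wave mode with low byte and high byte written in succession, and the divisor 2712 is an example value that gives 1193182 Hz / 2712 = about 440 Hz. Only bits 0 and 1 of port 0x61 are changed, all other bits keep their current state.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 0x100&lt;br /&gt;
mov al,0xB6		; PIT control word: counter 2, low byte then high byte, mode 3 (square wave)&lt;br /&gt;
out 0x43,al&lt;br /&gt;
mov ax,2712		; divisor for about 440 Hz (example value)&lt;br /&gt;
out 0x42,al		; low byte of the divisor&lt;br /&gt;
mov al,ah&lt;br /&gt;
out 0x42,al		; high byte of the divisor&lt;br /&gt;
in al,0x61		; read the current state of port 0x61&lt;br /&gt;
or al,3			; set bit 0 (timer 2 gate) and bit 1 (speaker data)&lt;br /&gt;
out 0x61,al		; speaker on, other bits untouched&lt;br /&gt;
mov ah,0&lt;br /&gt;
int 0x16		; wait for a key press&lt;br /&gt;
in al,0x61&lt;br /&gt;
and al,0xFC		; clear bits 0 and 1 again&lt;br /&gt;
out 0x61,al		; speaker off&lt;br /&gt;
ret				; quit&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;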
&lt;br /&gt;
==== Simple deep sound in 8 bytes ====&lt;br /&gt;
&lt;br /&gt;
One of the smallest possible PC speaker sound generators might be this 8 byte snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;dec ax			; AX initially 0000h -&amp;gt; AL = 0xFF&lt;br /&gt;
out 42h,al		; change divisor of timer2 to 0xFFFF&lt;br /&gt;
out 42h,al		; resulting in a very low frequency&lt;br /&gt;
out 61h,al		; 2 LSBs are set, connect timer to speaker&lt;br /&gt;
ret				; quit&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Note: This may fail on actual hardware, as there might not be time for the bus to settle between the consecutive &amp;lt;code&amp;gt;out 42h,al&amp;lt;/code&amp;gt; statements.)&lt;br /&gt;
&lt;br /&gt;
=== COVOX output (aka LPT DAC) ===&lt;br /&gt;
&lt;br /&gt;
It is possible to output to an LPT-connected DAC (&amp;quot;[https://en.wikipedia.org/wiki/Covox_Speech_Thing COVOX]&amp;quot;) in a tinyprog.  A proof-of-concept example is [http://www.pouet.net/prod.php?which=57991 Express Train 125] which uses COVOX for sound generation.  &lt;br /&gt;
&lt;br /&gt;
This method follows the [http://countercomplex.blogspot.com/2011/10/algorithmic-symphonies-from-one-line-of.html &amp;quot;audio from one line of C code&amp;quot;] style of sound generation.  [http://www.pouet.net/topic.php?which=8357&amp;amp;page=1 A pouet discussion] exists for more background information.&lt;br /&gt;
&lt;br /&gt;
Here are some detailed instructions on how a very small bytebeat player can be realized: [[MUZAK58]]&lt;br /&gt;
&lt;br /&gt;
=== Advanced PC Speaker and COVOX sound via interrupt ===&lt;br /&gt;
&lt;br /&gt;
For more advanced PC Speaker or COVOX sound output in tiny intros,&lt;br /&gt;
in particular when playback has to run at a specific sample frequency, the use of a timer&lt;br /&gt;
interrupt is recommended. To illustrate this, we take a so-called bytebeat and turn it into a working&lt;br /&gt;
code example for PC Speaker and COVOX.&lt;br /&gt;
&lt;br /&gt;
The major difference between the two is that COVOX offers a precision of 8 bits, while the PC Speaker usually manages only 6 bits.&lt;br /&gt;
Furthermore, setup and access differ, as shown in the sections before. In terms of code size and sound&lt;br /&gt;
quality, COVOX is preferable.&lt;br /&gt;
&lt;br /&gt;
The bytebeat we are using as an example can be seen and heard [http://wurstcaptures.untergrund.net/music/?oneliner=((t%264096)%3F((t*(t%5Et%25255)%7C(t%3E%3E4))%3E%3E1)%3A(t%3E%3E3)%7C((t%268192)%3Ft%20%3C%3C2%3At))&amp;amp;rate=11025 here.] It's based on JavaScript syntax: ((t&amp;amp;4096)?((t*(t^t%255)|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;3)|((t&amp;amp;8192)?t &amp;lt;&amp;lt;2:t))&lt;br /&gt;
&lt;br /&gt;
Bytebeat code like this can be ported directly to assembler by evaluating the single expressions step by step, as you can&lt;br /&gt;
see in the implementations here. These examples work within DOSBox and should also run on real hardware with FreeDOS. They don't&lt;br /&gt;
show any graphical output, they just play the bytebeat until a key is pressed. Your graphics routine should be placed right after&lt;br /&gt;
the 'main' label. &lt;br /&gt;
&lt;br /&gt;
==== PC Speaker variant ====&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
&lt;br /&gt;
 mov    ax,3508h			;21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)&lt;br /&gt;
 int    21h					;return: es:bx&lt;br /&gt;
 push   es&lt;br /&gt;
 push   bx					;backup current interrupt handler&lt;br /&gt;
 mov    cx,63 + 108*256  	;PIT counter divisor = 108 and speaker enable for init   &lt;br /&gt;
 mov    bl,90h				;10010000b =&amp;gt; on &amp;quot;init&amp;quot;  	&lt;br /&gt;
							;Bit0   =   0 counter 16 Bits set&lt;br /&gt;
							;Bit3-1 = 000 mode 0 select&lt;br /&gt;
							;Bit5-4 =  01 read/write counter bits 0-7 only&lt;br /&gt;
							;Bit7-6 =  10 counter 2 select&lt;br /&gt;
&lt;br /&gt;
 mov    dx,irq				;new handler address&lt;br /&gt;
 call   init&lt;br /&gt;
&lt;br /&gt;
main:&lt;br /&gt;
 mov    ah,0&lt;br /&gt;
 int    16h					;ah = 0, int16h =&amp;gt; read keypress&lt;br /&gt;
&lt;br /&gt;
 pop    dx&lt;br /&gt;
 pop    ds					;restore handler address at exit&lt;br /&gt;
 xor    cx,cx				;PIT counter divisor = 0 and speaker disable for exit &lt;br /&gt;
 mov    bl,0b6h				;bl = 10110110b =&amp;gt; at exit&lt;br /&gt;
init:&lt;br /&gt;
 xchg   ax,cx &lt;br /&gt;
 out    61h,al				;al = 0 or 63 =&amp;gt; Bit0 = 1 timer 2 gate to speaker enable,&lt;br /&gt;
 mov    al,ah				;Bit1 = 1 speaker data enable ...or disable both at al = 0&lt;br /&gt;
 out    40h,al				;al = 0 or 108 =&amp;gt; write PIT counter 0 divisor&lt;br /&gt;
 salc&lt;br /&gt;
 out    40h,al				;al = 0 =&amp;gt; write PIT counter 0 divisor again = 0 high byte&lt;br /&gt;
							;=&amp;gt; this results in an interrupt frequency of about 11048 Hz, close to the 11025 Hz target,&lt;br /&gt;
							;as the PIT clock is 1.19318181818 MHz =&amp;gt; 1.19318181818 MHz / 108 = 11047.98 Hz&lt;br /&gt;
 xchg   ax,bx				;al = bl = 90h on init or 0B6h on exit; the bits of 0B6h (10110110b) are:&lt;br /&gt;
 out    43h,al				;Bit0   =   0 counter 16 Bits set&lt;br /&gt;
							;Bit3-1 = 011 mode 3 select, square wave generator&lt;br /&gt;
							;Bit5-4 =  11 read/write counter bits 0-7 first, then 8-15&lt;br /&gt;
							;Bit7-6 =  10 counter 2 select&lt;br /&gt;
 mov    ax,2508h			;21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)  &lt;br /&gt;
 int    21h&lt;br /&gt;
retn&lt;br /&gt;
&lt;br /&gt;
;bytebeat: ((t&amp;amp;4096)?((t*(t^t%255)|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;3)|((t&amp;amp;8192)?t&amp;lt;&amp;lt;2:t))&lt;br /&gt;
irq:&lt;br /&gt;
 pusha&lt;br /&gt;
 mov    bp,255&lt;br /&gt;
 mov    ax,0            	;ax: t&lt;br /&gt;
.counter:&lt;br /&gt;
 mov    cx,ax&lt;br /&gt;
 shr    cx,3            	;cx: (t&amp;gt;&amp;gt;3)&lt;br /&gt;
 test   ax,4096         	;(t&amp;amp;4096)?&lt;br /&gt;
 jz     .1&lt;br /&gt;
 mov    bx,ax           	;bx: t&lt;br /&gt;
 sub    dx,dx           	;dx:ax t&lt;br /&gt;
 div    bp              	;dx: (t%255)&lt;br /&gt;
 xor    dx,bx           	;dx: (t^(t%255))&lt;br /&gt;
 shr    cx,1            	;cx: (t&amp;gt;&amp;gt;4)&lt;br /&gt;
 xchg   ax,bx           	;ax: t&lt;br /&gt;
 mul    dx              	;ax: t*(t^(t%255))&lt;br /&gt;
 or     ax,cx           	;ax: t*(t^(t%255))|(t&amp;gt;&amp;gt;4)&lt;br /&gt;
 shr    ax,1            	;ax: (t*(t^(t%255))|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1&lt;br /&gt;
 jmp    .3&lt;br /&gt;
.1:&lt;br /&gt;
 test   ax,8192         	;(t&amp;amp;8192)?&lt;br /&gt;
 jz     .2&lt;br /&gt;
 shl    ax,2            	;ax: (t&amp;lt;&amp;lt;2)&lt;br /&gt;
.2:&lt;br /&gt;
 or     ax,cx           	;ax: ax|(t&amp;gt;&amp;gt;3)&lt;br /&gt;
.3:&lt;br /&gt;
 shr    al,2            	;downscale to 6 bits&lt;br /&gt;
 jz     .4&lt;br /&gt;
 out    42h,al				;write 6 Bit data to speaker (PIT counter 2)&lt;br /&gt;
.4:&lt;br /&gt;
&lt;br /&gt;
 inc    word [bp-255+irq.counter-2]&lt;br /&gt;
 mov    al,20h				;00100000b&lt;br /&gt;
 out    20h,al				;Bit 5 = 1 send End Of Interrupt (EOI) signal&lt;br /&gt;
 popa&lt;br /&gt;
iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== COVOX variant ====&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;org 100h&lt;br /&gt;
&lt;br /&gt;
 mov    ax,3508h			;21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)&lt;br /&gt;
 int    21h					;return: es:bx&lt;br /&gt;
 push   es&lt;br /&gt;
 push   bx					;backup current interrupt handler&lt;br /&gt;
 mov    al,108				;PIT counter divisor&lt;br /&gt;
 mov    dx,irq				;new handler address&lt;br /&gt;
 call   init&lt;br /&gt;
&lt;br /&gt;
main:&lt;br /&gt;
 mov    ah,0&lt;br /&gt;
 int    16h					;ah = 0, int16h =&amp;gt; read keypress&lt;br /&gt;
 pop    dx&lt;br /&gt;
 pop    ds					;restore handler address at exit&lt;br /&gt;
 salc						;al = 0 at exit&lt;br /&gt;
&lt;br /&gt;
init:&lt;br /&gt;
 out    40h,al				;al = 0 or 108 =&amp;gt; write PIT counter 0 divisor = 108 low byte&lt;br /&gt;
 salc&lt;br /&gt;
 out    40h,al				;al = 0 =&amp;gt; write PIT counter 0 divisor again = 0 high byte&lt;br /&gt;
							;=&amp;gt; this results in an interrupt frequency of about 11048 Hz, close to the 11025 Hz target,&lt;br /&gt;
							;as the PIT clock is 1.19318181818 MHz =&amp;gt; 1.19318181818 MHz / 108 = 11047.98 Hz&lt;br /&gt;
 mov    ax,2508h			;21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)&lt;br /&gt;
 int    21h                                                             &lt;br /&gt;
 retn&lt;br /&gt;
&lt;br /&gt;
;bytebeat: ((t&amp;amp;4096)?((t*(t^t%255)|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;3)|((t&amp;amp;8192)?t&amp;lt;&amp;lt;2:t))&lt;br /&gt;
irq:&lt;br /&gt;
 pusha&lt;br /&gt;
 mov    bp,255&lt;br /&gt;
 mov    ax,0				;ax: t&lt;br /&gt;
.counter:&lt;br /&gt;
 mov    cx,ax&lt;br /&gt;
 shr    cx,3				;cx: (t&amp;gt;&amp;gt;3)&lt;br /&gt;
 test   ax,4096				;(t&amp;amp;4096)?&lt;br /&gt;
 jz     .1&lt;br /&gt;
 mov    bx,ax				;bx: t&lt;br /&gt;
 sub    dx,dx				;dx:ax t&lt;br /&gt;
 div    bp					;dx: (t%255)&lt;br /&gt;
 xor    dx,bx				;dx: (t^(t%255))&lt;br /&gt;
 shr    cx,1				;cx: (t&amp;gt;&amp;gt;4)&lt;br /&gt;
 xchg   ax,bx				;ax: t&lt;br /&gt;
 mul    dx					;ax: t*(t^(t%255))&lt;br /&gt;
 or     ax,cx				;ax: t*(t^(t%255))|(t&amp;gt;&amp;gt;4)&lt;br /&gt;
 shr    ax,1				;ax: (t*(t^(t%255))|(t&amp;gt;&amp;gt;4))&amp;gt;&amp;gt;1&lt;br /&gt;
 jmp   .3&lt;br /&gt;
.1:&lt;br /&gt;
 test   ax,8192				;(t&amp;amp;8192)?&lt;br /&gt;
 jz    .2&lt;br /&gt;
 shl    ax,2				;ax: (t&amp;lt;&amp;lt;2)&lt;br /&gt;
.2:&lt;br /&gt;
 or     ax,cx				;ax: ax|(t&amp;gt;&amp;gt;3)&lt;br /&gt;
.3:&lt;br /&gt;
 mov    dx,0378h			;LPT1 parallel port address&lt;br /&gt;
 out    dx,al				;write 8 Bit sample data&lt;br /&gt;
 inc    word[bp-255+irq.counter-2]&lt;br /&gt;
 mov    al,20h              ;00100000b&lt;br /&gt;
 out    20h,al				;Bit 5 = 1 send End Of Interrupt (EOI) signal&lt;br /&gt;
 popa&lt;br /&gt;
 iret&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Further notes on the two variants ====&lt;br /&gt;
&lt;br /&gt;
It's important to set and know the sample frequency you want. If you want to move from 11025 Hz to e.g. 18939 Hz&lt;br /&gt;
and keep the same sound, you need to change the following code parts, shown here for COVOX. Note that the bytebeat parameters were also adjusted to fit the roughly doubled frequency:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
 mov    al,63				;PIT counter divisor instead of 108 =&amp;gt; 1.19318181818 MHz / 63 = 18939 Hz&lt;br /&gt;
;...snip...&lt;br /&gt;
;bytebeat: ((t&amp;amp;8192)?((t*(t^t%255)|(t&amp;gt;&amp;gt;5))&amp;gt;&amp;gt;1):(t&amp;gt;&amp;gt;4)|((t&amp;amp;16384)?t&amp;lt;&amp;lt;2:t))&lt;br /&gt;
;...snip...&lt;br /&gt;
.counter:&lt;br /&gt;
 mov    cx,ax&lt;br /&gt;
 shr    cx,4            	;cx: (t&amp;gt;&amp;gt;4)&lt;br /&gt;
 test   ax,8192         	;(t&amp;amp;8192)?&lt;br /&gt;
 jz     .1              	&lt;br /&gt;
 mov    bx,ax           	;bx: t&lt;br /&gt;
 sub    dx,dx           	;dx:ax t&lt;br /&gt;
 div    bp              	;dx: (t%255)&lt;br /&gt;
 xor    dx,bx           	;dx: (t^(t%255))&lt;br /&gt;
 shr    cx,1            	;cx: (t&amp;gt;&amp;gt;5)&lt;br /&gt;
 xchg   ax,bx           	;ax: t&lt;br /&gt;
 mul    dx              	;ax: t*(t^(t%255))&lt;br /&gt;
 or     ax,cx           	;ax: t*(t^(t%255))|(t&amp;gt;&amp;gt;5)&lt;br /&gt;
 shr    ax,1            	;ax: (t*(t^(t%255))|(t&amp;gt;&amp;gt;5))&amp;gt;&amp;gt;1&lt;br /&gt;
 jmp    .3&lt;br /&gt;
.1:&lt;br /&gt;
 test   ax,16384        	;(t&amp;amp;16384)?&lt;br /&gt;
 jz     .2&lt;br /&gt;
 shl    ax,2            	;ax: (t&amp;lt;&amp;lt;2)&lt;br /&gt;
.2:&lt;br /&gt;
 or     ax,cx           	;ax: ax|(t&amp;gt;&amp;gt;4)&lt;br /&gt;
;...snip...&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
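&lt;br /&gt;
The divisor arithmetic can also be left to the assembler. The following NASM preprocessor sketch is an addition for illustration: the names &amp;lt;code&amp;gt;PIT_HZ&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;RATE&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DIVISOR&amp;lt;/code&amp;gt; are made up here, and the check assumes that only the low byte of the divisor is written, as in the examples above. The real sample rate is always &amp;lt;code&amp;gt;PIT_HZ / DIVISOR&amp;lt;/code&amp;gt;, which can differ slightly from the requested one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;%define PIT_HZ  1193182						;PIT input clock in Hz (rounded)&lt;br /&gt;
%define RATE    18939						;desired sample rate in Hz&lt;br /&gt;
%assign DIVISOR (PIT_HZ + RATE/2) / RATE	;rounded to nearest: 63 here, 108 for RATE = 11025&lt;br /&gt;
&lt;br /&gt;
%if DIVISOR &amp;gt; 255&lt;br /&gt;
%error divisor does not fit into the low byte, choose a higher RATE&lt;br /&gt;
%endif&lt;br /&gt;
&lt;br /&gt;
 mov    al,DIVISOR							;replaces the hard-coded divisor&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;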
&lt;br /&gt;
The routine here uses a frequency of 18939 Hz. With the 16 bit counter used here, this gives a maximum length&lt;br /&gt;
of 65536/18939 = 3.46 seconds before everything loops. Usually that is enough for a drumbeat, but not for a complete song&lt;br /&gt;
or melody. In that case you have to use another register as a 'top' timer to trigger the changes in your sound.&lt;br /&gt;
&lt;br /&gt;
One more thing to check, and maybe modify if you hear an imperfect sound, is the timing of when a sample value is actually &amp;quot;played&amp;quot;.&lt;br /&gt;
Preferably, each sample value is played at exactly the same point in time after the interrupt fires. But as your sample generation routine might need a different&lt;br /&gt;
number of CPU cycles each time the interrupt is called, this point can vary from call to call when code like the above is used.&lt;br /&gt;
&lt;br /&gt;
One solution is to play the sample calculated during the previous interrupt call right at the start of the next call.&lt;br /&gt;
You can do that via self-modifying code as shown here. It takes 5 bytes more:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
irq:&lt;br /&gt;
pusha&lt;br /&gt;
mov    dx,0378h&lt;br /&gt;
mov    al,0           &lt;br /&gt;
.sample:&lt;br /&gt;
out    dx,al&lt;br /&gt;
mov    bp,255&lt;br /&gt;
mov    ax,0            ; ax: t&lt;br /&gt;
.counter:&lt;br /&gt;
;...snip...&lt;br /&gt;
inc    word [bp-255+irq.counter-2]&lt;br /&gt;
mov    byte [bp-255+irq.sample-1],al&lt;br /&gt;
mov    al,20h&lt;br /&gt;
out    20h,al&lt;br /&gt;
;...snip...&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some remarks: For educational reasons, none of the code above is optimized for size to the max.&lt;br /&gt;
Depending on your code and on what the interrupt subroutine depends on, you can do several size optimizations.&lt;br /&gt;
&lt;br /&gt;
Instead of interrupt &amp;lt;code&amp;gt;08&amp;lt;/code&amp;gt;, the user defined interrupt &amp;lt;code&amp;gt;1c&amp;lt;/code&amp;gt; could theoretically be&lt;br /&gt;
used as well, but so far this seems to work only with DOSBox and not on a real system with FreeDOS. Further tests&lt;br /&gt;
are needed to find out what the problem is. Using interrupt &amp;lt;code&amp;gt;1c&amp;lt;/code&amp;gt; would save 4 bytes, as the following&lt;br /&gt;
code lines that finalize the interrupt could be omitted from the examples above:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;...snip...&lt;br /&gt;
mov al,20h&lt;br /&gt;
out 20h,al&lt;br /&gt;
;...snip...&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What can be done with bytebeats ? Basically everything :-) You can look at this [http://www.pouet.net/topic.php?which=8357&amp;amp;page=1 thread] on pouet to get an idea and check the first PC Speaker intro with bytebeat: [http://www.pouet.net/prod.php?which=71766 TCTRONIC by TomCat/Abaddon].&lt;br /&gt;
 &lt;br /&gt;
Some basic waveforms can be encoded like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;sawtooth wave&lt;br /&gt;
t &amp;amp; 127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;square wave&lt;br /&gt;
t &amp;amp; 128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;;triangle wave&lt;br /&gt;
(t &amp;amp; 128 ? ~t : t) &amp;amp; 127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How would you go from here to create a specific tone, e.g. an &amp;quot;A4&amp;quot;, which would have a frequency of 440 Hz (Check this &lt;br /&gt;
[https://en.wikipedia.org/wiki/Piano_key_frequencies link] to get a list for the frequencies of the notes) ?&lt;br /&gt;
For that you have to relate the set frequency of the interrupt to the tone frequency and the length of one wave of your&lt;br /&gt;
wave generator. If we have a sawtooth of &amp;lt;code&amp;gt;t&amp;amp;127&amp;lt;/code&amp;gt; at 22050 Hz this would result in a tone of 22050/128 = 172.3 Hz.&lt;br /&gt;
To reach 440 Hz we can simply stretch/multiply the timer by 440/172.3 = 2.554 to hear the desired note:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
(t*2.554) &amp;amp; 127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://naivesound.com/glitch Glitch] is another online tool to create bytebeat sounds with some enhanced syntax. There are several good reads on this tool, the theory behind it and bytebeat in general [https://medium.com/@naive_sound here].&lt;br /&gt;
&lt;br /&gt;
Of course there are endless possibilities and the whole world of real time sound calculation/generation is open to you. Here are some tiny intros which use these techniques already: [http://www.pouet.net/prod.php?which=77741 Plasmifier cover 256B], [http://www.pouet.net/prod.php?which=70599 2(56)unlimited], [http://www.pouet.net/prod.php?which=78505 somehow].&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1327</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1327"/>
				<updated>2024-02-15T16:36:12Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. Watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It took 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own sizecoding productions, but please credit wiRe when you do. Commercial use is not permitted.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a hardware device on your own in those days. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through a COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes must match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
For playing data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases, but will sound very lo-fi. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself and the exchange of data between handler and non-interrupt code all cost bytes. In most cases it takes less space to poll a timer instead, as in all the variants described next. Keep in mind, though, that polling has to happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to sit inside an inner loop, e.g. after each pixel update, which might eat up quite some performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for an 8 kHz rate, which triggers the default INT8 handler about 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated (with the counter value serving as sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code; while DS=0 this write needs a CS: override)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example) you could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. But one drawback is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, a [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time given as the sample number, which is also the byte offset into the stream. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, the sample ''f(8000)'' is played after exactly 1 second. Since this is actually a softsynth (music synthesis done in software), in theory any sound or music can be approximated this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
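&lt;br /&gt;
To make this definition concrete, here is a minimal JavaScript sketch that turns any ''f(t)'' into such a byte stream. The function name ''render'' and the example function are only illustrations and do not appear in MUZAK58:&lt;br /&gt;
```javascript
// Render the first `count` samples of a bytebeat function f(t).
// A Uint8Array keeps only the low 8 bits of each value, which is the implicit "mod 256" of a bytebeat.
function render(f, count) {
  return Uint8Array.from({ length: count }, (_, t) => f(t));
}

const sawtooth = (t) => t;              // simplest possible f(t)
const stream = render(sawtooth, 8001);  // a little more than one second at 8 kHz

console.log(stream[8000]);              // 64, because 8000 mod 256 = 64
```
The byte at offset 8000 is the sample that is played exactly one second after the start at an 8 kHz sampling rate.&lt;br /&gt;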
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to produce the next sample to be written to the COVOX on LPT1. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to some degree a generic, reusable background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music being played comes from sequence tables stored in memory. Changing those words results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly. The sequence table of this reference example is 10 bytes in total and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, but still flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, such as [[Steady_On_Tim]] by Gasman or the paper published by viznut, the basic idea for generating a melody with a bytebeat is to take a basic waveform oscillator function ''o(t)'', such as a saw-tooth, square, triangle or sine wave, and multiply its time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This factor modulates the pitch. If we then replace ''p'' by a sequence table ''s(t)'' that changes the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as the melody is to be defined by a sequence table. This example demonstrates a sequence of 8 steps, with ''S=8'' specifying the number of steps. Each step lasts 2^11 = 2048 samples and changes the pitch of the resulting saw-tooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the saw-tooth waveform into a square-wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are possible as well. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Distortion-like effects can also be applied, as demonstrated here by using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of a single modulated waveform or a single sequence, two or more can be used and combined, for example: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already implement a bytebeat player with one sequence table that holds as many steps ''S'' as we need for our tune, or at least as many as we can afford given the size constraints. The more steps ''S'' we spend, the longer the tune lasts before it repeats. The larger the number ''M'' of distinct values a step can take, at a cost of ''log2(M)'' bits per step, the larger the range of notes we can use in the end. Together, the parameters ''S'' and ''M'' define the final byte size of our sequence table.&lt;br /&gt;
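&lt;br /&gt;
The trade-off between table size and tune length can be put into numbers. The following JavaScript sketch is only an illustration under the assumptions stated in its comments. The names ''tableBytes'' and ''tuneSeconds'' are hypothetical helpers, not part of any player:&lt;br /&gt;
```javascript
// Bytes needed for one sequence table with `steps` entries (S)
// when each entry can take `values` distinct values (M), stored as tightly packed bit fields.
function tableBytes(steps, values) {
  const bitsPerStep = Math.ceil(Math.log2(values));
  return Math.ceil((steps * bitsPerStep) / 8);
}

// Seconds until a single sequence repeats, assuming each step lasts 2^shift samples.
function tuneSeconds(steps, shift, rateHz) {
  return (steps * 2 ** shift) / rateHz;
}

console.log(tableBytes(8, 16));         // 4
console.log(tableBytes(32, 16));        // 16
console.log(tuneSeconds(8, 11, 8000));  // 2.048
```
With 4 bits per step, every additional 8 steps cost another 4 bytes, while the 8-step examples above repeat after roughly two seconds at 8 kHz.&lt;br /&gt;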
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The issue we face with this approach in sizecoding is that such a sequence table quickly grows and ends up consuming quite a lot of bytes. Our reference example, [https://www.pouet.net/prod.php?which=96071 MUZAK58], spends 10 bytes on its entire tune. With what we know so far, we could divide those 10 bytes into a sequence of 20 steps (''S = 20''), as long as the limited range of 4 bits per step (''M = 2^4 = 16'') is enough for the composition we have in mind. 20 steps is not bad, but the heavily limited range of less than 1 octave would restrict us to something we would very likely call a children's song in the end. The reference tune, in contrast, sounds as if it were built from some multiple of 32 steps before it starts to repeat, and its pitch range does not appear to be limited to a single octave either. What the hell is going on here? How is it possible to compress the sequence table in this way?&lt;br /&gt;
&lt;br /&gt;
The trick discovered by wiRe here is to cascade multiple sequencers and combine all their output into a single sequence of much longer sequence duration (before repetition) and of wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this will limit the freedom of the composer, you might think. True! But you will see that the results you can achieve this way are not as bad as you might first expect. Indeed, the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence develops over time, showing the sequence table index of 5 cascaded sequencers (each with ''S = 4'' steps) in relation to the sequencer step count. ''O'' is the shift that derives the step count from the time, ''stepcnt = t &amp;gt;&amp;gt; O'' with ''O = log2(ticks_per_step)'', which avoids an integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In combination with our oscillator function, the entire bytebeat will finally look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
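&lt;br /&gt;
The index rows of the table above can be reproduced with a few lines of JavaScript. This is a sketch for illustration only. The helper names ''seqIndex'', ''indexRow'' and ''cascadeSteps'' are hypothetical, and ''O = 10'' and ''S = 4'' are example values:&lt;br /&gt;
```javascript
// Index into the table of sequencer i at sample time t (same expression as in the table above).
function seqIndex(t, O, i, S) {
  return (t >> (O + i)) % S;
}

// Index row of sequencer i for the first `steps` sequencer steps.
function indexRow(i, steps, O, S) {
  return Array.from({ length: steps }, (_, step) => seqIndex(step * 2 ** O, O, i, S));
}

// Number of steps until all N cascaded sequencers are back at index 0 together.
function cascadeSteps(N, S) {
  return S * 2 ** (N - 1);
}

const O = 10, S = 4;
console.log(indexRow(0, 8, O, S).join(" "));  // 0 1 2 3 0 1 2 3
console.log(indexRow(1, 8, O, S).join(" "));  // 0 0 1 1 2 2 3 3
console.log(indexRow(2, 8, O, S).join(" "));  // 0 0 0 0 1 1 1 1
console.log(cascadeSteps(5, 4));              // 64
```
Each sequencer runs at half the speed of the previous one, so five sequencers of four steps each only repeat as a whole after 64 steps of the fastest one.&lt;br /&gt;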
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration, to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy game, with our oscillator function being as simple as ''o(t) = t''. As we figured out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration ''i'' of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current index and looking that up in the sequence table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, even a 16-bit multiplication is enough here. The final scale factor ''C'' depends on the range of the values delivered by the sequencer functions ''si(t)''. Scaling is done as a shift-right operation in the last step, and with some tweaking of the sequencer step values it can even be forced to be a shift by 8.&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, a total of 5 cascaded sequencers is used: ''N=5''. Each sequencer's table stores 4 sequence steps: ''S=4''. Which step to index is therefore selected by 2 bits of the parameter ''t''. The step time of the fastest sequencer was chosen to be ''2^10'' timer ticks or samples, giving us ''O=10''. This means the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is derived by ''(t&amp;gt;&amp;gt;(O+i))%S'', where the start offset ''O'' can be chosen differently for each tune. Each step value is limited by ''M=16''. Putting this all together, we can now start composing a tune this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of the samples per step of the fastest sequencer&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer&lt;br /&gt;
  static constexpr unsigned C = 256;  // final scale factor (shift right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running 16-bit product; t itself stays untouched for the index lookups&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / C);&lt;br /&gt;
  }&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
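&lt;br /&gt;
The step values used here are the same numbers that are packed, four bits each, into the five ''seqtbl'' words shown at the beginning of this chapter. The following JavaScript sketch illustrates that packing and the complete sample calculation. It is not the original player: the names ''WORDS'', ''unpack'' and ''sample'' are made up for this example, and it assumes that the table index is always taken from the unmodified time ''t'', as the formula ''f(t)'' describes:&lt;br /&gt;
```javascript
// The five packed words of the sequence table, one word per sequencer.
const WORDS = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];

// Split one 16-bit word into its four 4-bit steps, lowest nibble first.
function unpack(word) {
  return [0, 4, 8, 12].map((shift) => (word >> shift) % 16);
}

// One sample: t times one step of every sequencer, kept to 16 bits, high byte as output.
function sample(t) {
  const time = t % 65536;            // 16-bit sample counter
  let value = time;                  // running 16-bit product
  WORDS.forEach((word, i) => {
    const step = unpack(word)[(time >> (10 + i)) % 4];
    value = (value * step) % 65536;  // keep only the low word of the product
  });
  return value >> 8;                 // divide by C = 256
}

console.log(unpack(0x1413).join(","));              // 3,1,4,1
console.log(sample(1), sample(256), sample(1024));  // 2 208 192
```
Keeping only the low 16 bits after every multiplication and returning the high byte of the result is what makes the final division by ''C = 256'' come for free.&lt;br /&gt;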
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters being chosen carefully, the final bytebeat implementation and sequence tables will become very small. Here is the documented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did the timer counter wrap around to 149..0FFh?&lt;br /&gt;
            ja      bbeat                     ;77xx       ;    yes -&amp;gt; play (unsigned compare)&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;EBxx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1326</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1326"/>
				<updated>2024-02-15T16:33:10Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. Watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own sizecoding productions, but please give credit to wiRe when you do. Commercial use is not permitted.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which is why it was very cheap to build such a device on your own in those days. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through a COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate pretty well. Also the bytebeat algorithm will require a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases, but will sound very lo-fi. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even if we ignore the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging data between the handler and the non-interrupt code all cost bytes. In most cases it takes less space to poll the timer instead, as in all the other variants described next. But keep in mind that the polling has to happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to be placed inside an inner loop, e.g. after each pixel update, which may eat up quite some performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 fires at a frequency of about 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of setting up a custom handler just to implement the counter increment. So a simple way to get a timer counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, the next sample must be generated (the counter value can also serve as the sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (self-modifying code; needs a CS: override while DS=0)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. But one drawback is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, a [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time given as the sample number, which is also the byte offset into the stream. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, the sample ''f(8000)'' is played after exactly 1 second. Since this is actually a softsynth (music synthesis done in software), in theory any sound or music can be approximated this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to produce the next sample to be written to the COVOX on LPT1. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to some degree a generic, reusable background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music being played comes from sequence tables stored in memory. Changing those words results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly. The sequence table of this reference example is 10 bytes in total and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, but still flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, such as [[Steady_On_Tim]] by Gasman or the paper published by viznut, the basic idea for generating a melody with a bytebeat is to take a basic waveform oscillator function ''o(t)'', such as a saw-tooth, square, triangle or sine wave, and multiply its time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This factor modulates the pitch. If we then replace ''p'' by a sequence table ''s(t)'' that changes the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as the melody is to be defined by a sequence table. This example demonstrates a sequence of 8 steps, with ''S=8'' specifying the number of steps. Each step lasts 2^11 = 2048 samples and changes the pitch of the resulting saw-tooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the saw-tooth waveform into a square-wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are possible as well. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Distortion-like effects can also be applied, as demonstrated here by using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of a single modulated waveform or a single sequence, two or more can be used and combined, for example: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already implement a bytebeat player with one sequence table that holds as many steps ''S'' as we need for our tune, or at least as many as we can afford given the size constraints. The more steps ''S'' we spend, the longer the tune lasts before it repeats. The larger the number ''M'' of distinct values a step can take, at a cost of ''log2(M)'' bits per step, the larger the range of notes we can use in the end. Together, the parameters ''S'' and ''M'' define the final byte size of our sequence table.&lt;br /&gt;
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The issue we face with this approach in sizecoding is that such a sequence table quickly grows and ends up consuming quite a lot of bytes. Our reference example, [https://www.pouet.net/prod.php?which=96071 MUZAK58], spends 10 bytes on its entire tune. With what we know so far, we could divide those 10 bytes into a sequence of 20 steps (''S = 20''), as long as the limited range of 4 bits per step (''M = 2^4 = 16'') is enough for the composition we have in mind. 20 steps is not bad, but the heavily limited range of less than 1 octave would restrict us to something we would very likely call a children's song in the end. The reference tune, in contrast, sounds as if it were built from some multiple of 32 steps before it starts to repeat, and its pitch range does not appear to be limited to a single octave either. What the hell is going on here? How is it possible to compress the sequence table in this way?&lt;br /&gt;
&lt;br /&gt;
The trick discovered by wiRe is to cascade multiple sequencers and combine their outputs into a single sequence with a much longer duration before repetition and a wider pitch range per step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this will limit the freedom of the composer, you might think now. True! But you will see that the results you can achieve this way are not as bad as you might first expect. Indeed, the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence develops over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequencer step count. ''O'' is the time shift used to derive the step count ''stepcnt = t &amp;gt;&amp;gt; O'', with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
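&lt;br /&gt;
The rows of this table can be reproduced with a few lines of JavaScript. This is only a sketch for experimenting with other values of ''O'', ''S'' and ''N''; the names ''seqIndex'' and ''rows'' are assumptions made for this illustration and do not appear in the player:&lt;br /&gt;

```javascript
const O = 10; // log2(samples per step) of the fastest sequencer
const S = 4;  // steps per sequencer
const N = 5;  // number of cascaded sequencers

// Index of sequencer i at sample t, exactly as in the table: (t>>(O+i))%S.
const seqIndex = (t, i) => (t >> (O + i)) % S;

// One row per sequencer, one column per step of the fastest sequencer.
const rows = Array.from({ length: N }, (_, i) =>
  Array.from({ length: 20 }, (_, step) => seqIndex(step * 2 ** O, i)));

rows.forEach((row, i) => console.log("seq" + i + "ix", row.join(" ")));
```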
&lt;br /&gt;
In combination with our oscillator function, the entire bytebeat will finally look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
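&lt;br /&gt;
To get a feeling for what the cascade buys, the following sketch multiplies the step values of the five 4-step tables used later on this page and lists the combined pitch factor for every step of the fastest sequencer. The helper names ''pitch'' and ''steps'' are assumptions for this illustration, and the scaling by ''C'' is left out:&lt;br /&gt;

```javascript
const O = 10, S = 4;
// Step values of the reference tune (the same table as in the C++ code below).
const seqtbl = [[3,1,4,1], [6,6,12,6], [2,4,2,2], [5,9,4,6], [4,8,4,4]];

// Combined pitch factor s0(t) * s1(t) * ... before the final scaling by C.
const pitch = (t) =>
  seqtbl.reduce((p, row, i) => p * row[(t >> (O + i)) % S], 1);

// The slowest index (t>>(O+4))%4 wraps after 2^(O+4+2) samples,
// so the combined sequence repeats after at most 64 fast steps.
const steps = Array.from({ length: 64 }, (_, k) => pitch(k * 2 ** O));

console.log(steps.length, new Set(steps).size, steps.slice(0, 4));
```

Twenty 4-bit values therefore describe a sequence of up to 64 steps, and each combined factor can be far larger than the 15 that a single 4-bit step allows.&lt;br /&gt;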
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration, to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy game, with our oscillator function being as simple as ''o(t) = t''. As we figured out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current index and looking that up in its sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is enough here. The final scale factor ''C'' depends on the range of the values produced by the sequencer functions ''si(t)''. Scaling happens as a shift-right operation in the last step. With some tweaking of the sequencer step values, it can even be forced to be a shift by 8, which simply means taking the high byte of the 16-bit result.&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, a total of 5 cascaded sequencers were used: ''N=5''. Each sequencer's table stores 4 sequence steps: ''S=4''. Which step to index is therefore determined by 2 bits of the parameter ''t''. The fastest sequencer step time was chosen to be ''2^10'' timer ticks or samples, giving us ''O=10''. This means the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is given by ''(t&amp;gt;&amp;gt;(O+i))%S'', where the start offset ''O'' can be chosen differently for each tune. Each step value is limited by ''M=16''. Putting this all together, we can now start composing a tune this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
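  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of samples per step of the fastest sequencer&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer table&lt;br /&gt;
  static constexpr unsigned A = 256;  // final scale factor C (a shift right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running product, kept separate so that every index uses the unmodified t&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / A);&lt;br /&gt;
  }&lt;br /&gt;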
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
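&lt;br /&gt;
The assembly source below packs each 4-step table into one 16-bit word, four bits per step, with step 0 in the lowest nibble. The following sketch of that packing shows how ''0x1413'' corresponds to ''[3,1,4,1]''; the function names ''unpack'' and ''pack'' are hypothetical helpers, not part of the player:&lt;br /&gt;

```javascript
// Hypothetical helpers mirroring the `ror ax, cl` / `and ax, 15` pair:
// step k lives in bits 4k..4k+3 of the word.
const unpack = (word) =>
  Array.from({ length: 4 }, (_, k) => (word >> (4 * k)) % 16);

// Inverse direction: build the 16-bit word from four 4-bit step values.
const pack = (steps) =>
  steps.reduce((word, s, k) => word + s * 16 ** k, 0);

const words = [0x1413, 0x6C66, 0x2242, 0x6495, 0x4484];
console.log(words.map(unpack));
console.log(pack([3, 1, 4, 1]).toString(16));
```

A helper like ''pack'' can be handy when composing: the tune is edited as plain step lists and only converted into ''dw'' words at the end.&lt;br /&gt;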
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters being chosen carefully, the final bytebeat implementation and sequence tables will become very small. Here is the documented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did the timer counter wrap around to 149..0FFh?&lt;br /&gt;
            jo      bbeat                     ;71xx       ;    yes -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;75xx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1325</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1325"/>
				<updated>2024-02-15T16:22:03Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. Watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own sizecoding productions, but please credit wiRe when you do. Commercial use is not permitted.&lt;br /&gt;
&lt;br /&gt;
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically assembled from a simple R-2R resistor ladder that performs the conversion into an analog signal, which made it very cheap to build such a device on your own in those days. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: [[Output#PC_Speaker_variant]]&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher rates can be used as well. Lower rates may also work in some special cases, but the result will be very lo-fi. There are multiple options to obtain such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter&lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the exchange of data between the handler and non-interrupt code all cost bytes. In most cases it takes less space to poll the timer instead, as in all other variants described next. Keep in mind, though, that polling has to happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to sit inside an inner loop, e.g. after each pixel update, which can eat up quite some performance.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a timer counter is to reconfigure the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. This counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. It could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
            salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
            out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
  _tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
            jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (self-modifying code; with DS=0 this store needs a CS: override)&lt;br /&gt;
&lt;br /&gt;
            inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
  ntick:&lt;br /&gt;
            ; ...&lt;br /&gt;
&lt;br /&gt;
            jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
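&lt;br /&gt;
The reload value 149 in the listing above follows from the PIT input clock of about 1.19318 MHz. Here is a small sketch of that arithmetic for trying other sampling rates; the function name ''pitDivisor'' is a hypothetical helper, and the clock constant is the one quoted in the listing's comment:&lt;br /&gt;

```javascript
// PIT input clock in Hz, as quoted in the comment of the listing above.
const PIT_HZ = 1193181.818;

// Hypothetical helper: nearest integer reload value for a target sampling rate.
const pitDivisor = (rateHz) => Math.round(PIT_HZ / rateHz);

const div = pitDivisor(8000);   // reload value for roughly 8 kHz
const actual = PIT_HZ / div;    // sampling rate that this value really yields
console.log(div, actual.toFixed(2));
```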
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you can also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This solution may result in the shortest code. But one drawback is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, a [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the sample number, which also equals the byte offset within the stream. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches the sample ''f(8000)'' after exactly 1 second. Since this is actually a softsynth (music synthesis done in software), in theory any sound or music can be approximated this way. There are no limits except the increasing complexity of the resulting function.&lt;br /&gt;
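&lt;br /&gt;
As a minimal sketch of this definition, a bytebeat can be rendered offline by evaluating ''f(t)'' for each sample number and keeping only the low 8 bits of the result. The helper name ''render'' and the example formula ''saw'' are assumptions made for this illustration and are not part of MUZAK58:&lt;br /&gt;

```javascript
// Hypothetical helper: evaluate a bytebeat formula f for the first `seconds`
// seconds at `rate` samples per second. Uint8Array.from wraps every result
// to 8 bits, which is the implicit "modulo 256" of a bytebeat.
const render = (f, seconds, rate = 8000) =>
  Uint8Array.from({ length: seconds * rate }, (_, t) => f(t));

// Example formula (an assumption for illustration): a plain saw-tooth wave.
const saw = (t) => t;

const samples = render(saw, 1);
console.log(samples.length, samples[255], samples[256]);
```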
&lt;br /&gt;
In general, any bytebeat algorithm can now be used to produce the next sample to be written to the COVOX on LPT1. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. MUZAK58 is to some degree a generic, reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the played music comes from sequence tables stored in memory. Changing those words results in an entirely new tune being played. Spending more words on this table adds variation, so the tune does not repeat as quickly. The sequence table of this reference example is 10 bytes in total and looks like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
  seqtbl:   dw      0x1413&lt;br /&gt;
            dw      0x6C66&lt;br /&gt;
            dw      0x2242&lt;br /&gt;
            dw      0x6495&lt;br /&gt;
            dw      0x4484&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The method used to achieve a size-optimized, but still flexible bytebeat is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequencer ==&lt;br /&gt;
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or the paper published by viznut, the basic idea for generating a melody with a bytebeat is to take some basic waveform oscillator function ''o(t)'', such as a saw-tooth, square, triangle or sine wave, and multiply its time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This factor modulates the pitch. If we then replace ''p'' with a sequence table ''s(t)'' that changes the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.&lt;br /&gt;
&lt;br /&gt;
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;255&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as the melody is to be defined by a sequence table. This example demonstrates a sequence of 8 steps, with ''S=8'' specifying the number of steps. Each step changes the pitch of the resulting saw-tooth waveform.&lt;br /&gt;
&lt;br /&gt;
Replacing the trailing &amp;quot;&amp;amp;255&amp;quot; (implicit for a bytebeat) by &amp;quot;&amp;amp;128&amp;quot; would change the saw-tooth waveform into a square-wave function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])&amp;amp;128&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Other waveforms are possible as well. Here we use the sine function:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  sin(t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8]/14)*127+127&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
Distortion-like effects can also be applied, as demonstrated here by using the XOR operator in the last step:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;javascript&amp;quot;&amp;gt;&lt;br /&gt;
  (t*[1,2,4,8,16,8,4,2][(t&amp;gt;&amp;gt;11)%8])^64&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of a single modulated waveform or sequence, two or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''&lt;br /&gt;
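&lt;br /&gt;
A sketch of such a mix with two saw-tooth voices might look as follows. The second sequence, the step rates and the names ''voice'' and ''mix'' are made up for this illustration and are not taken from any release:&lt;br /&gt;

```javascript
// Two saw-tooth voices, each with its own (made-up) pitch sequence.
const seqA = [1, 2, 4, 8, 16, 8, 4, 2];
const seqB = [3, 3, 6, 6];

// One voice: o(t * s(t)) with o(x) = x mod 256 (saw-tooth).
const voice = (t, seq, shift) => (t * seq[(t >> shift) % seq.length]) % 256;

// f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2, rounded down to an integer sample.
const mix = (t) => (voice(t, seqA, 11) + voice(t, seqB, 13)) >> 1;

console.log(mix(0), mix(1), mix(5000));
```

Dividing the sum by 2 keeps the mixed sample inside the 8-bit range, at the price of halving the volume of each voice.&lt;br /&gt;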
&lt;br /&gt;
So far, these are well-known techniques used in bytebeat algorithms. With this knowledge we can already implement a bytebeat player with one sequence table that holds as many steps ''S'' as we need for our tune, or at least as many as we can afford under the size constraints. The more steps ''S'' we spend, the longer the tune lasts before it repeats. The larger the number ''M'' of distinct values a step can take, at a cost of ''log2(M)'' bits per step, the larger the range of notes we can use in the end. Together, ''S'' and ''M'' define the size of our sequence table: ''S * log2(M)'' bits.&lt;br /&gt;
&lt;br /&gt;
== Cascaded Sequences ==&lt;br /&gt;
&lt;br /&gt;
The issue we face with this approach in sizecoding is that such a sequence table quickly grows and ends up consuming quite a lot of bytes. Our reference example here, [https://www.pouet.net/prod.php?which=96071 MUZAK58], spends 10 bytes on its entire tune. With what we know at this point, we could divide those 10 bytes (80 bits) into a sequence of 20 steps (''S = 20''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is enough for the composition we have in mind. 20 steps is not too bad, but the heavily limited range of less than 1 octave would restrict us to something we would very likely call a children's song in the end. The reference tune, in contrast, sounds as if it were built from at least a multiple of 32 steps before it starts to repeat. And its pitch range does not appear to be limited to a single octave either. What the hell is going on here? How is it possible to compress the sequence table in this way?&lt;br /&gt;
&lt;br /&gt;
The trick discovered by wiRe is to cascade multiple sequencers and combine their outputs into a single sequence with a much longer duration before repetition and a wider pitch range per step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''&lt;br /&gt;
&lt;br /&gt;
But this will limit the freedom of the composer, you might think now. True! But you will see that the results you can achieve this way are not as bad as you might first expect. Indeed, the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.&lt;br /&gt;
&lt;br /&gt;
Here is an attempt to visualize how such a cascaded sequence develops over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequencer step count. ''O'' is the time shift used to derive the step count ''stepcnt = t &amp;gt;&amp;gt; O'', with ''O = log2(ticks_per_step)'', which avoids any integer division.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | stpcnt | (t&amp;gt;&amp;gt;O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
  | seq0ix | (t&amp;gt;&amp;gt;(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |&lt;br /&gt;
  | seq1ix | (t&amp;gt;&amp;gt;(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |&lt;br /&gt;
  | seq2ix | (t&amp;gt;&amp;gt;(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |&lt;br /&gt;
  | seq3ix | (t&amp;gt;&amp;gt;(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |&lt;br /&gt;
  | seq4ix | (t&amp;gt;&amp;gt;(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |&lt;br /&gt;
  +--------+--------------+---------------------------------------------+&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In combination with our oscillator function, the entire bytebeat will finally look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''&lt;br /&gt;
&lt;br /&gt;
== Final Bytebeat Implementation ==&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration, to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy game, with our oscillator function being as simple as ''o(t) = t''. As we figured out, the function ''f(t)'' then consists of only ''N+1'' factors, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''si(t)'' is evaluated by calculating its current index and looking that up in its sequencer table. The value stored there for this step is then multiplied into the running result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is enough here. The final scale factor ''C'' depends on the range of the values produced by the sequencer functions ''si(t)''. Scaling happens as a shift-right operation in the last step. With some tweaking of the sequencer step values, it can even be forced to be a shift by 8, which simply means taking the high byte of the 16-bit result.&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, a total of 5 cascaded sequencers were used: ''N=5''. Each sequencer's table stores 4 sequence steps: ''S=4''. Which step to index is therefore determined by 2 bits of the parameter ''t''. The fastest sequencer step time was chosen to be ''2^10'' timer ticks or samples, giving us ''O=10''. This means the lookup index for each sequencer ''i'' with ''0 &amp;lt;= i &amp;lt; N'' is given by ''(t&amp;gt;&amp;gt;(O+i))%S'', where the start offset ''O'' can be chosen differently for each tune. Each step value is limited by ''M=16''. Putting this all together, we can now start composing a tune this way:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
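  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // log2 of samples per step of the fastest sequencer&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of cascaded sequencers&lt;br /&gt;
  static constexpr unsigned S = 4;    // steps per sequencer table&lt;br /&gt;
  static constexpr unsigned A = 256;  // final scale factor C (a shift right by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_next_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t v = t;  // running product, kept separate so that every index uses the unmodified t&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) v *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(v / A);&lt;br /&gt;
  }&lt;br /&gt;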
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
([https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])&lt;br /&gt;
&lt;br /&gt;
= The Sourcecode =&lt;br /&gt;
&lt;br /&gt;
With all these parameters being chosen carefully, the final bytebeat implementation and sequence tables will become very small. Here is the documented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
           ; MUZAK58 by wiRe/NpM&lt;br /&gt;
           ;-----------------------------------&lt;br /&gt;
            section .text&lt;br /&gt;
            org     100h&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;muzak sequence table&lt;br /&gt;
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&amp;amp;t&amp;gt;&amp;gt;10]       ;! 1314       adc dx,[si]&lt;br /&gt;
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&amp;amp;t&amp;gt;&amp;gt;11]      ;! 666C       o32 insb&lt;br /&gt;
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&amp;amp;t&amp;gt;&amp;gt;12]       ;! 42         inc dx&lt;br /&gt;
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&amp;amp;t&amp;gt;&amp;gt;13]       ;! 22956484   and dl,[di-0x7b9c]&lt;br /&gt;
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&amp;amp;t&amp;gt;&amp;gt;14] &amp;gt;&amp;gt; 8  ;! 44         inc sp&lt;br /&gt;
&lt;br /&gt;
            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0&lt;br /&gt;
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary&lt;br /&gt;
&lt;br /&gt;
           ;--------------------------------- ;---------- ;present next audio sample (BX = 16-bit sample counter)&lt;br /&gt;
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)&lt;br /&gt;
            out     40h, al                   ;E640       ;  rearm timer&lt;br /&gt;
&lt;br /&gt;
            inc     bx                        ;43         ;  increment 16-bit timer counter&lt;br /&gt;
&lt;br /&gt;
            pusha                             ;60         ;  store all registers&lt;br /&gt;
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)&lt;br /&gt;
            mov     dx, bx                    ;89DA       ;  load start value into DX&lt;br /&gt;
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)&lt;br /&gt;
  bbeat_lp: push    cx                        ;51         ;  store CX counter&lt;br /&gt;
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL&lt;br /&gt;
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries&lt;br /&gt;
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)&lt;br /&gt;
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12&lt;br /&gt;
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &amp;amp;= 15)&lt;br /&gt;
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)&lt;br /&gt;
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample&lt;br /&gt;
            pop     cx                        ;59         ;  restore CX counter&lt;br /&gt;
            shr     bx, 1                     ;D1EB       ;  get next bit sequence from time&lt;br /&gt;
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out&lt;br /&gt;
&lt;br /&gt;
            mov     al, dh                    ;88F0       ;  get sample data into AL&lt;br /&gt;
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX&lt;br /&gt;
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device&lt;br /&gt;
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)&lt;br /&gt;
&lt;br /&gt;
  suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)&lt;br /&gt;
            in      al, 40h                   ;E440       ;  read low-byte&lt;br /&gt;
            cmp     al, 148                   ;3Cxx       ;  did the timer counter wrap around to 149..0FFh?&lt;br /&gt;
            ja      bbeat                     ;77xx       ;    yes -&amp;gt; play&lt;br /&gt;
&lt;br /&gt;
  bbeat_sk: jmp     short suplp               ;EBxx       ;  loop forever&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1321</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1321"/>
				<updated>2024-02-14T07:53:19Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It took 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how the player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, which made it very cheap to build such a device yourself back then. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well, and lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes less space to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. The polling therefore has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment. So a simple way to get a timer counter is to reconfigure the PIT for an 8 kHz rate, which triggers the default INT8 handler about 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
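&lt;br /&gt;
The divisor 149 in the listing above follows from dividing the PIT input clock by the wanted sampling rate. The following C++ lines are only a sketch of that arithmetic: the names ''PIT_CLOCK_HZ'', ''pit_divisor'' and ''pit_rate'' are made up for this illustration, and the clock is rounded to whole Hz.&lt;br /&gt;
&lt;br /&gt;
```cpp
// PIT input clock (1.19318181818 MHz as quoted in the listing), rounded to whole Hz.
constexpr unsigned long PIT_CLOCK_HZ = 1193182;

// Hypothetical helper: divisor that brings PIT channel #0 closest to the wanted rate.
constexpr unsigned long pit_divisor( unsigned long rate_hz ) {
  return ( PIT_CLOCK_HZ + rate_hz / 2 ) / rate_hz;
}

// Hypothetical helper: resulting interrupt rate in whole Hz (rounded down) for a divisor.
constexpr unsigned long pit_rate( unsigned long divisor ) {
  return PIT_CLOCK_HZ / divisor;
}
```
&lt;br /&gt;
Under these assumptions, ''pit_divisor(8000)'' gives 149 and ''pit_rate(149)'' gives 8007, which matches the ~8 kHz rate noted in the listing.&lt;br /&gt;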
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at an 8 kHz sampling rate, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized but still flexible bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence ==&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Final Bytebeat ==&lt;br /&gt;
&lt;br /&gt;
Putting this all together, we can now start composing one music tune this way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
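  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // lowest bit of t used for the note lookup&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of sequences&lt;br /&gt;
  static constexpr unsigned S = 4;    // notes per sequence&lt;br /&gt;
  static constexpr unsigned A = 256;  // final scale-down factor (same as a shift by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t s = t;  // running product, starts with the base waveform t&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) s *= seqtbl[i][(t&amp;gt;&amp;gt;(O+i))%S];  // index taken from the unmodified t&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(s / A);&lt;br /&gt;
  }&lt;br /&gt;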
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code can be executed here live: [https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]&lt;br /&gt;
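&lt;br /&gt;
In a size-optimized player the sequence table can be stored more compactly than in the byte array above. What follows is a minimal sketch, assuming that the four notes of a sequence are packed into one 16-bit word with the first note in the lowest 4 bits. The names ''packed_seqtbl'', ''note_at'' and ''get_sample_packed'' are hypothetical, and the running product is kept in a separate variable so that the lookup index always comes from the unmodified ''t''.&lt;br /&gt;
&lt;br /&gt;
```cpp
// Assumed layout: one 16-bit word per sequence, four 4-bit notes, first note in the lowest nibble.
constexpr unsigned short packed_seqtbl[5] = { 0x1413, 0x6C66, 0x2242, 0x6495, 0x4484 };

// Hypothetical helper: returns note number step (0..3) of a packed sequence word.
constexpr unsigned note_at( unsigned word, unsigned step ) {
  return ( word >> ( 4 * step ) ) % 16;
}

// Multiplies t by one note per sequence and returns bits 8..15 of the 16-bit product.
unsigned char get_sample_packed( unsigned short t ) {
  unsigned s = t;                                            // running product, starts as t
  for( unsigned i = 0; i != 5; i++ ) {
    unsigned step = ( t >> ( 10 + i ) ) % 4;                 // lookup index from the unmodified t
    s = ( s * note_at( packed_seqtbl[i], step ) ) % 65536;   // keep the low 16 bits only
  }
  return (unsigned char)( s / 256 );
}
```
&lt;br /&gt;
Storing the notes this way costs 2 bytes per sequence instead of 4, at the price of limiting each note to the range 0..15.&lt;br /&gt;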
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1320</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1320"/>
				<updated>2024-02-13T20:32:59Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It took 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how the player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, which made it very cheap to build such a device yourself back then. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well, and lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes less space to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. The polling therefore has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment. So a simple way to get a timer counter is to reconfigure the PIT for an 8 kHz rate, which triggers the default INT8 handler about 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at an 8 kHz sampling rate, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized but still flexible bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music being played comes from sequence tables stored in memory. Changing those words results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break the bytebeat function ''f(t)'' down into smaller terms ''gi(t)'' that are multiplied together like this: ''f(t) = Π(gi(t)) = g0(t) * g1(t) * g2(t) * ... * gN(t)''. On each loop iteration of the final bytebeat player, one term ''gi(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' plays a special role compared to the other terms: it delivers the base waveform of the final tune. To keep the code short, this function delivers a saw-tooth or triangle waveform based on ''t'', scaled down by a constant factor D. Both can be implemented with very few instructions, and the scaling can be done as a shift operation in the last step. All other terms ''g1..gN'' translate ''t'' into factors that modify the frequency of this waveform. These translations are done through N different sequences, one sequence table per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, 2 or more can be combined, e.g.: ''f(t) = Π(gi(t)) + Π(hi(t))''&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, each sequence table stores 4 sequence steps or &amp;quot;notes&amp;quot; (''S=4''). Which note is selected depends on 2 bits of the parameter ''t''. Again with size in mind, each sequence ''i=0..N-1'' uses the lookup index ''(t&amp;gt;&amp;gt;(i+O))&amp;amp;3'', where the start offset O can be chosen differently for each tune.&lt;br /&gt;
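&lt;br /&gt;
The timing that results from this lookup index can be worked out with a few lines of C++. This is only an illustration, assuming the index formula above; ''pow2'', ''note_index'' and ''note_length'' are hypothetical helper names.&lt;br /&gt;
&lt;br /&gt;
```cpp
// Hypothetical helper: 2 to the power of n.
constexpr unsigned pow2( unsigned n ) { return n == 0 ? 1u : 2u * pow2( n - 1 ); }

// Lookup index of sequence i at sample t; the two-bit mask is written as % 4.
constexpr unsigned note_index( unsigned t, unsigned i, unsigned O ) { return ( t >> ( i + O ) ) % 4; }

// Number of samples for which sequence i stays on the same note.
constexpr unsigned note_length( unsigned i, unsigned O ) { return pow2( i + O ); }
```
&lt;br /&gt;
With ''O=10'' and an 8 kHz sampling rate, sequence 0 changes its note every 1024 samples (128 ms) and repeats after 4 notes (4096 samples), while each further sequence runs at half the speed of the previous one.&lt;br /&gt;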
&lt;br /&gt;
Putting this all together, we can now start composing one music tune this way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
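  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // lowest bit of t used for the note lookup&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of sequences&lt;br /&gt;
  static constexpr unsigned S = 4;    // notes per sequence&lt;br /&gt;
  static constexpr unsigned D = 256;  // final scale-down factor (same as a shift by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t s = t;  // running product, starts with the base waveform t&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) s *= seqtbl[i][(t&amp;gt;&amp;gt;(i+O))&amp;amp;3];  // index taken from the unmodified t&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(s / D);&lt;br /&gt;
  }&lt;br /&gt;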
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code can be executed here live: [https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1319</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1319"/>
				<updated>2024-02-13T20:32:25Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It took 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how the player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion to an analog signal, which made it very cheap to build such a device yourself back then. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well, and lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes less space to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. The polling therefore has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment. So a simple way to get a timer counter is to reconfigure the PIT for an 8 kHz rate, which triggers the default INT8 handler about 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at an 8 kHz sampling rate, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized but still flexible bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music being played comes from sequence tables stored in memory. Changing those words results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break the bytebeat function ''f(t)'' down into smaller terms ''gi(t)'' that are multiplied together like this: ''f(t) = Π(gi(t)) = g0(t) * g1(t) * g2(t) * ... * gN(t)''. On each loop iteration of the final bytebeat player, one term ''gi(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' plays a special role compared to the other terms: it delivers the base waveform of the final tune. To keep the code short, this function delivers a saw-tooth or triangle waveform based on ''t'', scaled down by a constant factor D. Both can be implemented with very few instructions, and the scaling can be done as a shift operation in the last step. All other terms ''g1..gN'' translate ''t'' into factors that modify the frequency of this waveform. These translations are done through N different sequences, one sequence table per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, 2 or more can be combined, e.g.: ''f(t) = Π(gi(t)) + Π(hi(t))''&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, each sequence table stores 4 sequence steps or &amp;quot;notes&amp;quot; (''S=4''). Which note is selected depends on 2 bits of the parameter ''t''. Again with size in mind, each sequence ''i=0..N-1'' uses the lookup index ''(t&amp;gt;&amp;gt;(i+O))&amp;amp;3'', where the start offset O can be chosen differently for each tune.&lt;br /&gt;
&lt;br /&gt;
Putting this all together, we can now start composing one music tune this way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
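  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;   // lowest bit of t used for the note lookup&lt;br /&gt;
  static constexpr unsigned N = 5;    // number of sequences&lt;br /&gt;
  static constexpr unsigned S = 4;    // notes per sequence&lt;br /&gt;
  static constexpr unsigned D = 256;  // final scale-down factor (same as a shift by 8)&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_sample( uint16_t t ) {&lt;br /&gt;
    uint16_t s = t;  // running product, starts with the base waveform t&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) s *= seqtbl[i][(t&amp;gt;&amp;gt;(i+O))&amp;amp;3];  // index taken from the unmodified t&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(s / D);&lt;br /&gt;
  }&lt;br /&gt;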
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code can be executed here live: [https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1318</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1318"/>
				<updated>2024-02-13T20:29:14Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device yourself at the time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate (e.g. 8 kHz). The polling therefore has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at an 8 kHz sampling rate, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized but still flexible bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music itself comes from sequence tables stored in memory. Changing those words results in an entirely new tune being played. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gN(t)''. Function ''f(t)'' is then composed of ''N'' terms multiplied together like this: ''f(t) = Π(gN(t))/D = g0(t) * g1(t) * g2(t) * ... * gN-1(t) / D''. On each loop iteration of the final bytebeat player, ''gN(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function delivers a sawtooth or triangle waveform based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' translate ''t'' into factors that modify the frequency of this waveform. These translations are done through N different sequences, one sequence table per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, two or more can be combined, e.g.: ''f(t) = (Π(gN(t))/D1) + (Π(hN(t))/D2)''&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, each sequence table stores 4 sequence steps or &amp;quot;notes&amp;quot; (''S=4''). Which note is selected depends on 2 bits of parameter ''t''. Again with size in mind, each sequence ''i=0..N-1'' uses the lookup index ''(t&amp;gt;&amp;gt;(i+O))&amp;amp;3'', where the start offset O can be chosen differently for each tune.&lt;br /&gt;
&lt;br /&gt;
Putting this all together, we can now start composing one music tune this way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
  static constexpr unsigned O = 10;&lt;br /&gt;
  static constexpr unsigned N = 5;&lt;br /&gt;
  static constexpr unsigned S = 4;&lt;br /&gt;
  static constexpr unsigned D = 8;&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_sample( uint16_t t ) {&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) t *= seqtbl[i][(t&amp;gt;&amp;gt;(i+O))&amp;amp;3];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(t / D);&lt;br /&gt;
  }&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code can also be executed here: [https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1317</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1317"/>
				<updated>2024-02-13T20:27:30Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device yourself at the time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate (e.g. 8 kHz). The polling therefore has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at an 8 kHz sampling rate, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized but still flexible bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music itself comes from sequence tables stored in memory. Changing those words results in an entirely new tune being played. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gN(t)''. Function ''f(t)'' is then composed of ''N'' terms multiplied together like this: ''f(t) = Π(gN(t))/D = g0(t) * g1(t) * g2(t) * ... * gN-1(t) / D''. On each loop iteration of the final bytebeat player, ''gN(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function delivers a sawtooth or triangle waveform based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' translate ''t'' into factors that modify the frequency of this waveform. These translations are done through N different sequences, one sequence table per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, two or more can be combined, e.g.: ''f(t) = (Π(gN(t)) + Π(hN(t))) / D''&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, each sequence table stores 4 sequence steps or &amp;quot;notes&amp;quot; (''S=4''). Which note is selected depends on 2 bits of parameter ''t''. Again with size in mind, each sequence ''i=0..N-1'' uses the lookup index ''(t&amp;gt;&amp;gt;(i+O))&amp;amp;3'', where the start offset O can be chosen differently for each tune.&lt;br /&gt;
&lt;br /&gt;
Putting this all together, we can now start composing one music tune this way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
  static constexpr unsigned O = 10;&lt;br /&gt;
  static constexpr unsigned N = 5;&lt;br /&gt;
  static constexpr unsigned S = 4;&lt;br /&gt;
  static constexpr unsigned D = 8;&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t get_sample( uint16_t t ) {&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) t *= seqtbl[i][(t&amp;gt;&amp;gt;(i+O))&amp;amp;3];&lt;br /&gt;
    return static_cast&amp;lt;uint8_t&amp;gt;(t / D);&lt;br /&gt;
  }&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code can also be executed here: [https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1316</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1316"/>
				<updated>2024-02-13T20:24:29Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device yourself at the time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate (e.g. 8 kHz). The polling therefore has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at an 8 kHz sampling rate, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized but still flexible bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music itself comes from sequence tables stored in memory. Changing those words results in an entirely new tune being played. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gN(t)''. Function ''f(t)'' is then composed of ''N'' terms multiplied together like this: ''f(t) = Π(gN(t))/D = g0(t) * g1(t) * g2(t) * ... * gN-1(t) / D''. On each loop iteration of the final bytebeat player, ''gN(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function delivers a sawtooth or triangle waveform based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' translate ''t'' into factors that modify the frequency of this waveform. These translations are done through N different sequences, one sequence table per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, two or more can be combined, e.g.: ''f(t) = (Π(gN(t)) + Π(hN(t))) / D''&lt;br /&gt;
&lt;br /&gt;
In the reference implementation, each sequence table stores 4 sequence steps or &amp;quot;notes&amp;quot; (''S=4''). Which note is selected depends on 2 bits of parameter ''t''. Again with size in mind, each sequence ''i=0..N-1'' uses the lookup index ''(t&amp;gt;&amp;gt;(i+O))&amp;amp;3'', where the start offset O can be chosen differently for each tune.&lt;br /&gt;
&lt;br /&gt;
Putting this all together, we can now start composing one music tune this way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp&amp;quot;&amp;gt;&lt;br /&gt;
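  #include &amp;lt;cstdint&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  static constexpr unsigned O = 10;  // lowest bit of t used for note selection&lt;br /&gt;
  static constexpr unsigned N = 5;   // number of sequence tables&lt;br /&gt;
  static constexpr unsigned S = 4;   // notes per sequence table&lt;br /&gt;
  static constexpr unsigned D = 8;   // final divisor&lt;br /&gt;
&lt;br /&gt;
  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };&lt;br /&gt;
&lt;br /&gt;
  uint8_t getByteBeat( uint16_t t ) {&lt;br /&gt;
    uint16_t sample = t;  // g0: sawtooth base waveform&lt;br /&gt;
    // multiply by one factor per sequence; sequence i is indexed by bits i+O and i+O+1 of t&lt;br /&gt;
    for( unsigned i = 0; i &amp;lt; N; i++ ) sample *= seqtbl[i][(t&amp;gt;&amp;gt;(i+O))&amp;amp;3];&lt;br /&gt;
    return sample / D;  // scale down; the result is truncated to 8 bits&lt;br /&gt;
  }&lt;br /&gt;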
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above code can also be executed here: [https://bytebeat.demozoo.org/#t=0&amp;amp;e=0&amp;amp;s=8000&amp;amp;v=circles&amp;amp;bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]&lt;br /&gt;
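&lt;br /&gt;
To see which notes are active at a given ''t'', the lookup from the code above can be split into two small helpers. This is only a sketch: the names ''noteIndex'' and ''multiplier'' are hypothetical and not part of MUZAK58, the table and constants are copied from the example above, and ''% S'' is used in place of the bit mask, which is equivalent for ''S=4''.&lt;br /&gt;

```cpp
// Constants and table copied from the example above.
static const unsigned O = 10;  // lowest bit of t used for note selection
static const unsigned N = 5;   // number of sequence tables
static const unsigned S = 4;   // notes per sequence table
static const unsigned char seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };

// Hypothetical helper: step of sequence i that is active at sample t.
// Uses "% S" instead of masking the two low bits; both give the same result for S = 4.
unsigned noteIndex(unsigned t, unsigned i) { return (t >> (i + O)) % S; }

// Hypothetical helper: product of all N table factors at sample t,
// i.e. the factor by which the base waveform is multiplied.
unsigned multiplier(unsigned t) {
  unsigned m = 1;
  for (unsigned i = 0; i != N; ++i) m *= seqtbl[i][noteIndex(t, i)];
  return m;
}
```

For ''t=0'' every sequence is at step 0, so the multiplier is 3*6*2*5*4 = 720; at ''t=1024'' sequence 0 has advanced to step 1 and the multiplier drops to 240.&lt;br /&gt;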
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1315</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1315"/>
				<updated>2024-02-13T19:49:17Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* The Bytebeat */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment logic. So, a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. This counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
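&lt;br /&gt;
The reload value 149 in the listing above follows from dividing the PIT input clock by the desired sample rate. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of MUZAK58 and the clock rounded to a whole number of Hz:&lt;br /&gt;

```cpp
// PIT input clock in Hz (1.19318181818 MHz, rounded to an integer; an assumption of this sketch).
static const unsigned PIT_CLOCK_HZ = 1193182;

// Hypothetical helper: reload value that brings the PIT closest to the requested rate
// (integer division, rounded to nearest).
unsigned pitDivisor(unsigned rateHz) { return (PIT_CLOCK_HZ + rateHz / 2) / rateHz; }

// Hypothetical helper: interrupt rate in Hz that a given reload value produces
// (integer part only).
unsigned pitRate(unsigned divisor) { return PIT_CLOCK_HZ / divisor; }
```

Under these assumptions ''pitDivisor(8000)'' yields 149 and ''pitRate(149)'' yields 8007, matching the comment in the listing.&lt;br /&gt;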
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the number of the sample. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the played music comes from sequence tables stored in memory. Changing those words results in an entirely new tune being played. Spending more words on these tables adds variation, so the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gN(t)''. The function ''f(t)'' is then comprised of ''N'' terms multiplied together like this: ''f(t) = Π(gN(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration of the final bytebeat player, one term is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special meaning compared to the other terms, and will deliver the base waveform for the final music tune. To keep the code short, this function will deliver a saw-tooth or triangle waveform based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' will translate ''t'' into factors that will modify the frequency of this waveform. Those translations will be done through N different sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, two or more can be combined, for example: ''f(t) = Π(gN(t)) + Π(hN(t))''&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1314</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1314"/>
				<updated>2024-02-13T19:48:29Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment logic. So, a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. This counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the number of the sample. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the played music comes from sequence tables stored in memory. Changing those words results in an entirely new tune being played. Spending more words on these tables adds variation, so the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gN(t)''. The function ''f(t)'' is then comprised of ''N'' terms multiplied together like this: ''f(t) = Π(gN(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration of the final bytebeat player, one term is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special meaning compared to the other terms, and will deliver the base waveform for the final music tune. To keep the code short, this function will deliver a saw-tooth or triangle waveform based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' will translate ''t'' into factors that will modify the frequency of this waveform. Those translations will be done through N different sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, two or more can be combined, for example: ''f(t) = Π(gN(t)) + Π(hN(t))''&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1313</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1313"/>
				<updated>2024-02-13T19:43:57Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment logic. So, a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. This counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the above example), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
Simply put, [[Bytebeat]] is an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the number of the sample. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the played music comes from a sequence table in memory. Changing the words in it results in an entirely new tune being played. Spending more words on this table adds variation, so the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gN(t)''. The function ''f(t)'' is then comprised of ''N'' terms multiplied together like this: ''f(t) = Π(gN(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration of the final bytebeat player, one term is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special meaning compared to the other terms, and will deliver the base waveform for the final music tune. To keep the code short, this function will deliver a saw-tooth or triangle waveform based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' will translate ''t'' into factors that will modify the frequency of the basic waveform. Those translations will be done through N different sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated waveform, two or more can be combined, for example: ''f(t) = Π(gN(t)) + Π(hN(t))''&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1312</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1312"/>
				<updated>2024-02-13T19:42:01Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can also be used. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 runs at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up a custom handler just to implement the counter increment logic. So, a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. This counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [cs:_tcmp+1], al          ;2EA2xxxx   ;remember last BIOS timer value (self-modifying code; CS override needed since DS=0)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
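&lt;br /&gt;
A minimal sketch of this variant, assuming DS=0 and a PIT that has already been reprogrammed to about 8 kHz (the labels are only illustrative). The low word of the BIOS counter is read once per poll and serves both for change detection and as the sample counter ''t'':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
suplp:    mov     ax, [046Ch]               ;A16C04     ;AX := low word of BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;LSB unchanged since the last sample?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  yes: skip audio&lt;br /&gt;
          mov     [cs:_tcmp+1], al          ;2EA2xxxx   ;remember LSB (self-modifying code, stored via CS)&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on t = AX ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves the separate counter register, but ''t'' then starts at whatever value the BIOS counter holds when the program is launched.&lt;br /&gt;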
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
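&lt;br /&gt;
A rough sketch of how such polling could look, assuming a CPU that supports RDTSC (Pentium or later) and a time-stamp counter that runs at a known, constant rate. The constant TSC_PER_SAMPLE is hypothetical and must be adjusted or calibrated for the target machine:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
TSC_PER_SAMPLE equ 12500                                ;hypothetical: 100 MHz TSC rate / 8 kHz&lt;br /&gt;
&lt;br /&gt;
          rdtsc                                         ;EDX:EAX := time-stamp counter&lt;br /&gt;
          mov     esi, eax                              ;ESI := TSC value of the last sample&lt;br /&gt;
&lt;br /&gt;
suplp:    rdtsc                                         ;read the counter again (clobbers EDX)&lt;br /&gt;
          sub     eax, esi                              ;EAX := ticks since the last sample (modulo 2^32)&lt;br /&gt;
          cmp     eax, TSC_PER_SAMPLE                   ;one sample period elapsed?&lt;br /&gt;
          jb      ntick                                 ;  no: skip audio&lt;br /&gt;
          add     esi, TSC_PER_SAMPLE                   ;advance by exactly one period to avoid drift&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                                   ;increment 32-bit sample counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                             ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                                ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The 32-bit operand prefixes and the 4-byte constants make this variant comparatively large, and the resulting sampling rate depends on the CPU clock.&lt;br /&gt;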
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
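&lt;br /&gt;
As a concrete illustration, here is a well-known minimal bytebeat formula (not the one used by MUZAK58): ''f(t) = t AND (t SHR 8)''. Assuming the sample counter ''t'' is kept in BP, it takes only two instructions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     ax, bp                    ;89E8       ;AL := bits 0..7 of t, AH := bits 8..15 of t&lt;br /&gt;
          and     al, ah                    ;20E0       ;AL := f(t) = t AND (t SHR 8), truncated to 8 bits&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;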
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gi(t)''. The function ''f(t)'' is then composed of these ''N'' terms multiplied together: ''f(t) = Π(gi(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration of the final bytebeat player, one term ''gi(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function produces a sawtooth or triangle wave based on ''t''; both can be implemented with very few instructions. All other terms ''g1..gN-1'' translate ''t'' into factors that modify the frequency of the base wave. These translations are done through separate sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of using only one modulated base wave, two or more can be used, e.g.: ''f(t) = Π(gi(t)) + Π(hi(t))''&lt;br /&gt;
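&lt;br /&gt;
To make the product structure more tangible, here is a purely illustrative sketch. It is not the actual MUZAK58 code, and the table layout is an assumption made for this example only: each hypothetical sequence word in seqtab packs four 4-bit factors, ''g0(t)'' is the sawtooth ''t'' itself, and every further term picks one factor using two bits of ''t''. The sample counter is assumed to be in BP:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     ax, bp                    ;AX := g0(t) = t (sawtooth)&lt;br /&gt;
          mov     bx, bp                    ;BX := t, time base for the table lookups&lt;br /&gt;
          mov     si, seqtab                ;SI := first sequence word&lt;br /&gt;
term:     mov     cl, bh                    ;&lt;br /&gt;
          and     cl, 0Ch                   ;CL := 4 * ((BX SHR 10) AND 3), bit offset of the factor&lt;br /&gt;
          mov     di, [cs:si]               ;DI := sequence word (CS override, as DS may be 0)&lt;br /&gt;
          shr     di, cl                    ;move the selected factor to bits 0..3&lt;br /&gt;
          and     di, 0Fh                   ;DI := gi(t), a factor in the range 0..15&lt;br /&gt;
          mul     di                        ;AX := AX * gi(t), clobbers DX&lt;br /&gt;
          shr     bx, 2                     ;the next term changes 4 times slower&lt;br /&gt;
          inc     si                        ;&lt;br /&gt;
          inc     si                        ;SI := next sequence word&lt;br /&gt;
          cmp     si, seqend                ;&lt;br /&gt;
          jb      term                      ;same operations for every term&lt;br /&gt;
&lt;br /&gt;
          ; AL now holds the low 8 bits of f(t), ready to be sent to LPT1&lt;br /&gt;
&lt;br /&gt;
          ; ... place the table outside the execution path ...&lt;br /&gt;
seqtab:   dw      0x2346, 0x1213            ;hypothetical sequence words, one per term&lt;br /&gt;
seqend:&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiplying the sawtooth by a factor ''k'' and keeping only the low 8 bits yields a sawtooth of ''k'' times the frequency, which is why changing the table words changes the tune.&lt;br /&gt;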
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1311</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1311"/>
				<updated>2024-02-13T19:36:57Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the interval between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher rates can also be used, and lower rates may work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of sharing counters between the handler and the rest of the program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the other variants described next. Note, however, that the polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT 8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of installing our own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT 8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, the next sample must be generated (this counter value can also serve as the sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [cs:_tcmp+1], al          ;2EA2xxxx   ;remember last BIOS timer value (self-modifying code; CS override needed since DS=0)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gi(t)''. The function ''f(t)'' is then composed of these ''N'' terms multiplied together: ''f(t) = Π(gi(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration of the final bytebeat player, one term ''gi(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function produces a sawtooth or triangle wave based on ''t''; both can be implemented with very few instructions. All other terms ''g1..gN-1'' translate ''t'' into factors that modify this single base wave. These translations are done through separate sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1310</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1310"/>
				<updated>2024-02-13T19:35:49Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Music Sequence Table */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the interval between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher rates can also be used, and lower rates may work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of sharing counters between the handler and the rest of the program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the other variants described next. Note, however, that the polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT 8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of installing our own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT 8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, the next sample must be generated (this counter value can also serve as the sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [cs:_tcmp+1], al          ;2EA2xxxx   ;remember last BIOS timer value (self-modifying code; CS override needed since DS=0)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gi(t)''. The function ''f(t)'' is then composed of these ''N'' terms multiplied together: ''f(t) = Π(gi(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration, one term ''gi(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function produces a sawtooth or triangle wave based on ''t''; both can be implemented with very few instructions. All other terms ''g1..gN-1'' translate ''t'' into factors that modify this single base wave. These translations are done through separate sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1309</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1309"/>
				<updated>2024-02-13T19:34:27Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MS-DOS and the COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very simple task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the interval between two LPT1 writes should match the sampling period closely. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher rates can also be used, and lower rates may work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of sharing counters between the handler and the rest of the program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the other variants described next. Note, however, that the polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT 8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of installing our own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT for a rate of about 8 kHz, which triggers the default INT 8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, the next sample must be generated (this counter value can also serve as the sample counter) and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [cs:_tcmp+1], al          ;2EA2xxxx   ;remember last BIOS timer value (self-modifying code; CS override needed since DS=0)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the sample number. Generation starts with the first sample at ''t=0'' and, at a sampling rate of 8 kHz, reaches sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to deliver the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used to achieve a size-optimized bytebeat formula is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune. Spending more words on this table also adds variation, so that the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, to achieve the smallest possible size. Following this concept, we break down the bytebeat function ''f(t)'' into ''N'' smaller terms ''gi(t)''. The function ''f(t)'' is then composed of these ''N'' terms multiplied together: ''f(t) = Π(gi(t)) = g0(t) * g1(t) * g2(t) * ... * gN-1(t)''. On each loop iteration, one term ''gi(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' has a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function produces a sawtooth or triangle wave based on ''t''; both can be implemented with very few instructions. All other terms ''g1..gN-1'' translate ''t'' into factors that modify this single base wave. These translations are done through separate sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1308</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1308"/>
				<updated>2024-02-13T19:32:57Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page will describe how this player works and how it can be adopted for other releases. Feel free to reuse those ideas and techniques in your own productions, but please give a credit to wiRe then. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 is triggered at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up your own handler just to implement the counter increment logic. So, a simple solution to get a time counter is to reprogram the PIT for a rate of about 8 kHz. This triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its least significant byte changes, another sample must be generated, with the tick count serving as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
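&lt;br /&gt;
The divisor 149 used above follows from the PIT input clock divided by the desired sampling rate. A minimal sketch of how this could be computed at assembly time is shown here; the constant names ''PIT_HZ'', ''RATE'' and ''DIVISOR'' are made up for this example, and it assumes PIT channel 0 is still in the low-byte/high-byte access mode in which the BIOS usually leaves it.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
PIT_HZ    equ     1193182                   ;PIT input clock in Hz (rounded)&lt;br /&gt;
RATE      equ     8000                      ;desired sampling rate in Hz&lt;br /&gt;
DIVISOR   equ     (PIT_HZ + RATE/2) / RATE  ;rounded divisor, 149 for 8 kHz&lt;br /&gt;
&lt;br /&gt;
          mov     al, DIVISOR % 256         ;low byte of the divisor&lt;br /&gt;
          out     40h, al                   ;&lt;br /&gt;
          mov     al, DIVISOR / 256         ;high byte of the divisor, 0 for 8 kHz&lt;br /&gt;
          out     40h, al                   ;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
As long as the high byte is zero, the second ''mov'' can be replaced by the one-byte ''salc'' used in the example above, provided CF is clear at that point.&lt;br /&gt;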
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit time counter (EBP in the example above), you can also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
= The Bytebeat =&lt;br /&gt;
[[Bytebeat]] is, simply put, an uncompressed 8-bit audio stream at a fixed sampling rate that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the number of the sample. It starts by generating the first sample for ''t=0'' and, at an 8 kHz sampling rate, reaches the sample ''f(8000)'' after exactly 1 second.&lt;br /&gt;
&lt;br /&gt;
In general, any bytebeat algorithm can now be implemented to provide the next sample in register AL, as required by the previous example code. With respect to size, however, a bytebeat algorithm is better suited if its formula is as simple as possible to implement. The method used here is described next.&lt;br /&gt;
&lt;br /&gt;
== Music Sequence Table ==&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune being played. Spending more words on this table also adds variation to the tune, so that it does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
The basic idea is to design the bytebeat algorithm as a loop that always performs the same math operations, which keeps the code as small as possible. Following this concept, we break down the bytebeat function ''f(t)'' into smaller terms ''gn(t)'', with ''n = 0..N''. The function ''f(t)'' is then the product of all these terms: ''f(t) = Π(gn(t)) = g0(t) * g1(t) * g2(t) * ... * gN(t)''. On each loop iteration, one term ''gn(t)'' is evaluated and its result is multiplied into the running result of ''f(t)''.&lt;br /&gt;
&lt;br /&gt;
The function ''g0'' plays a special role compared to the other terms: it delivers the base waveform for the final tune. To keep the code short, this function delivers a saw-tooth or triangle wave based on ''t''. Both can be implemented with very few instructions. All other terms ''g1..gN'' translate ''t'' into factors that modify this single base wave. Those translations are done through ''N'' different sequence tables, one per term.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1307</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1307"/>
				<updated>2024-02-13T18:42:18Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Poll BIOS Counter */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 is triggered at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up your own handler just to implement the counter increment logic. So, a simple solution to get a time counter is to reprogram the PIT for a rate of about 8 kHz. This triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its least significant byte changes, another sample must be generated, with the tick count serving as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit time counter (EBP in the example above), you can also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
=== Music Sequence Table ===&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune being played. Spending more words on this table also adds variation to the tune, so that it does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1306</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1306"/>
				<updated>2024-02-13T18:41:19Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll BIOS Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll BIOS Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. By default, INT8 is triggered at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up your own handler just to implement the counter increment logic. So, a simple solution to get a time counter is to reprogram the PIT for a rate of about 8 kHz. This triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its least significant byte changes, another sample must be generated, with the tick count serving as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit time counter (EBP in the example above), you can also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
=== Music Sequence Table ===&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune being played. Spending more words on this table also adds variation to the tune, so that it does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1305</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1305"/>
				<updated>2024-02-13T18:39:57Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Poll PIT Counter */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll INT8 Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself, and the problem of exchanging counters between the handler and the main code all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all other variants described next. Note, however, that polling must happen at a higher frequency than the actual sampling rate, e.g. 8 kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of timer ticks. By default, INT8 is triggered at a frequency of 18.2 Hz, and on each call the default interrupt handler increases this value by 1. Reusing this default handler avoids the cost of setting up your own handler just to implement the counter increment logic. So, a simple solution to get a time counter is to reprogram the PIT for a rate of about 8 kHz. This triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its least significant byte changes, another sample must be generated, with the tick count serving as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change?&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit time counter (EBP in the example above), you can also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
=== Music Sequence Table ===&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the music being played comes from a sequence table in memory. Changing the words in this table results in an entirely new tune being played. Spending more words on this table also adds variation to the tune, so that it does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1304</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1304"/>
				<updated>2024-02-13T18:39:40Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Poll PIT Counter */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, which made it very cheap to build such a device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through the COVOX LPT-DAC is a very easy task. Assuming the next sample value is in register AL, this is all you need to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good audio playback quality, however, the time between two LPT1 writes should closely match the sampling period. The bytebeat algorithm also requires a time counter as input, which reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz, but higher values can be used as well. Lower values may also work in some special cases. There are several options to get such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll INT8 Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one: setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself and the exchange of counters between the handler and the main program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the variants described next. Note, however, that the polling must run at a higher frequency than the actual sampling rate (e.g. 8 kHz), so it has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll INT8 Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of timer ticks. By default INT8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT to a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change? (0x55 is a placeholder, patched below)&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (self-modifying code; this store also uses DS, so it needs a CS: override unless DS addresses the code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' Instead of incrementing your own 16- or 32-bit timer counter (EBP in the example above), you could also use the BIOS timer counter itself, located at DWORD [0:0x046C], whichever fits better.&lt;br /&gt;
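&lt;br /&gt;
The divisor 149 in the listing above is the PIT input clock divided by the desired sampling rate. As a small sketch, it can be derived at assembly time with NASM constants; the names PIT_HZ, RATE and DIVISOR are made up for this illustration. Because only the low byte is loaded and SALC supplies a zero high byte, this shortcut assumes a divisor below 256 (rates above roughly 4.7 kHz) and a cleared carry flag:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
PIT_HZ    equ     1193182                               ;PIT input clock in Hz (rounded)&lt;br /&gt;
RATE      equ     8000                                  ;desired sampling rate in Hz&lt;br /&gt;
DIVISOR   equ     (PIT_HZ + RATE/2) / RATE              ;rounded divisor, 149 for 8 kHz&lt;br /&gt;
&lt;br /&gt;
          mov     al, DIVISOR               ;B095       ;low byte of the PIT #0 reload value&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0), used as high byte&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
With RATE set to 11025, for example, the same expression yields 108, i.e. about 11048 Hz.&lt;br /&gt;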
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
=== Music Sequence Table ===&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music it plays comes from a sequence table in memory. Changing those words results in an entirely new tune. Spending more words on this table adds variation, so the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1303</id>
		<title>MUZAK58</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=MUZAK58&amp;diff=1303"/>
				<updated>2024-02-13T18:33:40Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: Created page with &amp;quot;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tec...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page describes how this player works and how it can be adapted for other releases. Feel free to reuse these ideas and techniques in your own productions, but please credit wiRe if you do. Before you continue, make sure to read [[Output#Producing_sound]] for the basics.&lt;br /&gt;
&lt;br /&gt;
= COVOX LPT-DAC =&lt;br /&gt;
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). It was typically built from a simple R-2R resistor ladder that performs the conversion into an analog signal, which made it very cheap to build such a device yourself back then. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through a COVOX LPT-DAC is very easy. Assuming the next sample value is in register AL, this is all you have to do:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.&lt;br /&gt;
&lt;br /&gt;
For good playback quality, however, the interval between two LPT1 writes must match the sampling period closely. The bytebeat algorithm also requires a time counter as input that reflects the current sample number. This is why we need a good time source.&lt;br /&gt;
&lt;br /&gt;
= Time Source =&lt;br /&gt;
To play data through the COVOX LPT-DAC, we need a fairly accurate timer. A typical sampling rate is 8 kHz; higher rates can also be used, and lower rates may work in some special cases. There are several options for obtaining such a time counter:&lt;br /&gt;
* Timer Interrupt&lt;br /&gt;
* Poll INT8 Counter &lt;br /&gt;
* Poll PIT Counter&lt;br /&gt;
* Alternative Options&lt;br /&gt;
&lt;br /&gt;
== Timer Interrupt ==&lt;br /&gt;
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].&lt;br /&gt;
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one: setting up the new interrupt handler (even ignoring the backup and restore of the old handler), the overhead of the handler itself and the exchange of counters between the handler and the main program all cost bytes. In most cases it takes fewer bytes to poll the timer instead, as in all the variants described next. Note, however, that the polling must run at a higher frequency than the actual sampling rate (e.g. 8 kHz), so it has to happen inside an inner loop, e.g. after each pixel update.&lt;br /&gt;
&lt;br /&gt;
== Poll INT8 Counter ==&lt;br /&gt;
The DWORD at [0:0x046C] holds the number of timer ticks. By default INT8 fires at a frequency of 18.2 Hz, and on each call the default interrupt handler increments this value by 1. Reusing the default handler avoids the cost of setting up your own handler just to implement the counter increment. So a simple way to get a time counter is to reprogram the PIT to a rate of about 8 kHz, which triggers the default INT8 handler roughly 8000 times per second. The counter can then be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, using this counter value as the sample counter, and sent to LPT1. This could look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
          mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
          salc                              ;D6         ;  AL := 0 (if CF=0)&lt;br /&gt;
          out     40h, al                   ;E640       ;&lt;br /&gt;
&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0&lt;br /&gt;
_tcmp:    cmp     al, 0x55                  ;3C??       ;did the timer value change? (0x55 is a placeholder, patched below)&lt;br /&gt;
          jz      ntick                     ;74xx       ;  no -&amp;gt; skip audio&lt;br /&gt;
          mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (self-modifying code; this store also uses DS, so it needs a CS: override unless DS addresses the code)&lt;br /&gt;
&lt;br /&gt;
          inc     ebp                       ;6645       ;increment 32-bit timer counter&lt;br /&gt;
&lt;br /&gt;
          ; ... set AL to next audio sample based on EBP ...&lt;br /&gt;
&lt;br /&gt;
          mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX&lt;br /&gt;
          out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device&lt;br /&gt;
&lt;br /&gt;
ntick:&lt;br /&gt;
          ; ...&lt;br /&gt;
&lt;br /&gt;
          jmp     short suplp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Poll PIT Counter ==&lt;br /&gt;
(...more soon...)&lt;br /&gt;
&lt;br /&gt;
== Alternative Options ==&lt;br /&gt;
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.&lt;br /&gt;
&lt;br /&gt;
=== Music Sequence Table ===&lt;br /&gt;
MUZAK58 is to some degree a generic background music player. It is of course possible to modify the player itself to change the music style (more on that later), but the music it plays comes from a sequence table in memory. Changing those words results in an entirely new tune. Spending more words on this table adds variation, so the tune does not repeat as quickly.&lt;br /&gt;
&lt;br /&gt;
(...more soon...)&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Case_Studies&amp;diff=1302</id>
		<title>Case Studies</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Case_Studies&amp;diff=1302"/>
				<updated>2024-02-13T17:24:08Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it can be helpful to examine a tinyprog in detail, seeing what choices were made and why:&lt;br /&gt;
&lt;br /&gt;
:[[MELT.COM]]: We take a cute program from the 1980s and see how far we can crunch it down&lt;br /&gt;
&lt;br /&gt;
:[[m8trix 8b]]: An &amp;quot;Enter The Matrix&amp;quot; screen display in only 8 bytes&lt;br /&gt;
&lt;br /&gt;
:[[4is256]]: &amp;amp;#344;r&amp;amp;#345;ola's Tetris is a full tetris clone with scoring&lt;br /&gt;
&lt;br /&gt;
:[[paint16b]]: Hellmood's paint program in 16 bytes (really)&lt;br /&gt;
&lt;br /&gt;
:[[Indian Spirit]]: An American Indian tapestry with music in 32 bytes&lt;br /&gt;
&lt;br /&gt;
:[[Essence]]: Fakery of path tracing and lighting, with audio, in 64 bytes&lt;br /&gt;
&lt;br /&gt;
:[[Memories]]: Detailed writeup on [https://www.pouet.net/prod.php?which=85227 &amp;quot;Memories&amp;quot; (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
:[[Game of Life 32b]]: Writeup for Game of Life in 32 bytes&lt;br /&gt;
&lt;br /&gt;
:[[Christmas Tree]]: Implementation of a Christmas Tree on several platforms &lt;br /&gt;
&lt;br /&gt;
:[[MUZAK58]]: Playing some background byte-beat music through COVOX LPTDAC in 58 bytes&lt;br /&gt;
&lt;br /&gt;
== External Case Studies ==&lt;br /&gt;
&lt;br /&gt;
[https://en.wikipedia.org/wiki/Brainfuck Brainfuck] is a very tiny language, having only 8 commands and an instruction pointer.  Several compilers and interpreters have been made; Gynvael archived [http://gynvael.coldwind.pl/bf125/ many different versions along with their source code], the smallest of which was 125 bytes.  qkumba took that as inspiration and created his own [http://pferrie.epizy.com/misc/tiny/brainfck.htm brainfuck compiler in 100 bytes] (104 for one that is fully compliant).  (Note:  The term &amp;quot;compiler&amp;quot; is used mistakenly a lot in these descriptions; the majority of brainfuck programs are actually interpreters that load brainfuck programs and execute native code token by token.  This one is an exception in that it really does compile the code entirely into native instructions first.)&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_homogenic.html Small Beginnings: The development of Homogenic] Development Write up on [https://www.pouet.net/prod.php?which=84537 Homogenic, by: Marquee Design (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_nanorail.html Moving on: The development of Nanorail] Development Write up on [https://www.pouet.net/prod.php?which=84624 Nanorail, by: Marquee Design (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_enigma.html Full Circle: The development of Enigma] Development Write up on [https://www.pouet.net/prod.php?which=85232 Enigma, by: Marquee Design (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_microdose.html Crunching content: The development of Microdose] Development Write up on [https://www.pouet.net/prod.php?which=85677 Microdose, by: Marquee Design (128 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://abaddon.hu/crevenge/history.html The long journey to 1k real-time ray-tracing] Detailed technical writeup on [https://www.pouet.net/prod.php?which=87078 Chrome Revenge by Abaddon (1024 bytes)]&lt;br /&gt;
&lt;br /&gt;
[http://canonical.org/~kragen/demo/fr-016.html Disassembly of Farbrausch's &amp;quot;fr-016: bytes&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
[http://canonical.org/~kragen/demo/klappquadrat.html A disassembly of the 64-byte version of Klappquadrat]&lt;br /&gt;
&lt;br /&gt;
[https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ Maze generation in 10 bytes]&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Case_Studies&amp;diff=1301</id>
		<title>Case Studies</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Case_Studies&amp;diff=1301"/>
				<updated>2024-02-13T17:23:04Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it can be helpful to examine a tinyprog in detail, seeing what choices were made and why:&lt;br /&gt;
&lt;br /&gt;
:[[MELT.COM]]: We take a cute program from the 1980s and see how far we can crunch it down&lt;br /&gt;
&lt;br /&gt;
:[[m8trix 8b]]: An &amp;quot;Enter The Matrix&amp;quot; screen display in only 8 bytes&lt;br /&gt;
&lt;br /&gt;
:[[4is256]]: &amp;amp;#344;r&amp;amp;#345;ola's Tetris is a full tetris clone with scoring&lt;br /&gt;
&lt;br /&gt;
:[[paint16b]]: Hellmood's paint program in 16 bytes (really)&lt;br /&gt;
&lt;br /&gt;
:[[Indian Spirit]]: An American Indian tapestry with music in 32 bytes&lt;br /&gt;
&lt;br /&gt;
:[[Essence]]: Fakery of path tracing and lighting, with audio, in 64 bytes&lt;br /&gt;
&lt;br /&gt;
:[[Memories]]: Detailed writeup on [https://www.pouet.net/prod.php?which=85227 &amp;quot;Memories&amp;quot; (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
:[[Game of Life 32b]]: Writeup for Game of Life in 32 bytes&lt;br /&gt;
&lt;br /&gt;
:[[Christmas Tree]]: Implementation of a Christmas Tree on several platforms &lt;br /&gt;
&lt;br /&gt;
:[[MUZAK58]]: Playing some background byte-beat music through COVOX LPTDAC in 58 bytes&lt;br /&gt;
&lt;br /&gt;
== External Case Studies ==&lt;br /&gt;
&lt;br /&gt;
[https://en.wikipedia.org/wiki/Brainfuck Brainfuck] is a very tiny language, having only 8 commands and an instruction pointer.  Several compilers and interpreters have been made; Gynvael archived [http://gynvael.coldwind.pl/bf125/ many different versions along with their source code], the smallest of which was 125 bytes.  qkumba took that as inspiration and created his own [http://pferrie.epizy.com/misc/tiny/brainfck.htm brainfuck compiler in 100 bytes] (104 for one that is fully compliant).  (Note:  The term &amp;quot;compiler&amp;quot; is used mistakenly a lot in these descriptions; the majority of brainfuck programs are actually interpreters that load brainfuck programs and execute native code token by token.  This one is an exception in that it really does compile the code entirely into native instructions first.)&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_homogenic.html Small Beginnings: The development of Homogenic] Development Write up on [https://www.pouet.net/prod.php?which=84537 Homogenic, by: Marquee Design (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_nanorail.html Moving on: The development of Nanorail] Development Write up on [https://www.pouet.net/prod.php?which=84624 Nanorail, by: Marquee Design (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_enigma.html Full Circle: The development of Enigma] Development Write up on [https://www.pouet.net/prod.php?which=85232 Enigma, by: Marquee Design (256 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://revival-studios.com/marquee/blog/blog_microdose.html Crunching content: The development of Microdose] Development Write up on [https://www.pouet.net/prod.php?which=85677 Microdose, by: Marquee Design (128 bytes)]&lt;br /&gt;
&lt;br /&gt;
[https://abaddon.hu/crevenge/history.html The long journey to 1k real-time ray-tracing] Detailed technical writeup on [https://www.pouet.net/prod.php?which=87078 Chrome Revenge by Abaddon (1024 bytes)]&lt;br /&gt;
&lt;br /&gt;
[http://canonical.org/~kragen/demo/fr-016.html Disassembly of Farbrausch's &amp;quot;fr-016: bytes&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
[http://canonical.org/~kragen/demo/klappquadrat.html A disassembly of the 64-byte version of Klappquadrat]&lt;br /&gt;
&lt;br /&gt;
[https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ Maze generation in 10 bytes]&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1300</id>
		<title>General Coding Tricks</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1300"/>
				<updated>2024-02-13T17:20:38Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Need a constant? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Data is code, code is data ==&lt;br /&gt;
&lt;br /&gt;
Code is nothing more than data that the CPU interprets.  For example, consider this multi-byte instruction:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This assembles to &amp;lt;code&amp;gt;B4 37&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;B4&amp;lt;/code&amp;gt; by itself isn't interesting, but &amp;lt;code&amp;gt;37&amp;lt;/code&amp;gt; happens to be the opcode for &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt;.  Let's say you had this code before a loop, and you needed to perform &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt; at the top of the loop.  Rather than put AAA at the top of the loop, you can reuse the opcode that will already be there as part of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt; that comes before it.  Just jump directly into the middle of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt;, which will get interpreted and executed as &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
&lt;br /&gt;
label:&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        loop label+1&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;+1&amp;lt;/code&amp;gt; specifies the jump should go to 1 byte past the actual location.&lt;br /&gt;
&lt;br /&gt;
=== Reuse ===&lt;br /&gt;
&lt;br /&gt;
You can use opcodes hidden in your existing data.  For example, .COM files can end with &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, which is opcode &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt;.  If you already have a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; somewhere else in your code, even as part of data, just &amp;lt;code&amp;gt;JMP&amp;lt;/code&amp;gt; to that pre-existing &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; instead of adding a &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;.&lt;br /&gt;
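&lt;br /&gt;
A small sketch of this idea; the labels and the constant are hypothetical, chosen only so that a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; byte appears as an immediate operand. A conditional jump into the middle of that instruction then acts as a conditional &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; without spending an extra byte on the &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; itself. It assumes the stack still holds the initial zero word, so that the return ends the program:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
setval:&lt;br /&gt;
        mov bl,0C3h     ;assembles to B3 C3, the second byte is also the RET opcode&lt;br /&gt;
mainlp:&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        in al,60h       ;read keyboard scancode&lt;br /&gt;
        dec al          ;ESC has scancode 1&lt;br /&gt;
        jz setval+1     ;ESC pressed: jump onto the embedded C3 and return to DOS&lt;br /&gt;
        jmp short mainlp&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;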
&lt;br /&gt;
== If your environment holds you back, change it ==&lt;br /&gt;
&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it:  While setting a new palette costs bytes, the new palette arrangement could save you headaches down the road.  For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want.  (This assumes you have very small ways of redefining the palette, of course.)&lt;br /&gt;
&lt;br /&gt;
The above is maybe not the best example.  Rewrites to this section are welcome.&lt;br /&gt;
&lt;br /&gt;
== Need a constant? ==&lt;br /&gt;
&lt;br /&gt;
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.&lt;br /&gt;
&lt;br /&gt;
If you need more than a byte, another method is to create a '''literal pool''' in memory which can be addressed for constants. This technique was first used in [https://www.pouet.net/prod.php?which=94080 TERRA256] (please update if there are earlier examples) and is especially useful if FPU instructions are used, since those cannot address byte integer constants in memory but only word and dword. Unlike byte constants, suitable word or dword constants are very unlikely to be found inside the opcodes.&lt;br /&gt;
&lt;br /&gt;
The construction of the literal pool can look like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov cx, 255 ;very likely this can be replaced by a shorter version or even skipped; value of CX can also be larger than 255&lt;br /&gt;
initlp: ;very likely there are other things to initialize here, e.g. setting the DAC color palette&lt;br /&gt;
        push cx ;push 16-bit word to the stack with the current counter value&lt;br /&gt;
        loop initlp&lt;br /&gt;
        push cx ;optionally push final zero constant, if required&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This example constructs a literal pool of 256 words on the stack: the loop pushes 0x00FF down to 0x0001, and the additional &amp;quot;push cx&amp;quot; after the loop adds the final zero constant, if required. The final zero also makes it possible to exit from a COM executable using the &amp;quot;ret&amp;quot; instruction. &lt;br /&gt;
&lt;br /&gt;
Practically any existing initialization loop can be used to set up such a pool, so in the best case this construction costs only a single additional byte for the &amp;quot;push cx&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
To use the literal pool, some index register like SI, DI, BX or BP should be initialized to point to the literal pool. The init value can be the content of the stack pointer (SP) but also a fixed offset, since the position of the literal pool and the current stack position is typically well known. Also the initial value of DI (0xFFFE) or a zero offset may work. Accessing the literal pool can look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov bx, sp ;point index register to literal pool&lt;br /&gt;
        fild word [bx+0x13*2] ;load integer value 0x0013 into FPU register&lt;br /&gt;
        fild word [bx+0x31*2] ;load integer value 0x0031 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To address larger values, some further tricks can be used to keep size low:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        fild word [bx+0x087*2]  ;take care that this instruction is 1 byte larger than the others&lt;br /&gt;
        fild word [bx+si+0x087*2-0x100] ;but this trick could help: load integer value 0x0087 into FPU register, assuming SI is 0x0100&lt;br /&gt;
        fild word [bx+0x03*2-1] ;load integer value 0x0300 into FPU register&lt;br /&gt;
        fidiv dword [bx+0x05*2-3] ;divide by integer value 0x05000400&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== A smaller way to point to Mode 13's screen segment ==&lt;br /&gt;
&lt;br /&gt;
Rather than &amp;lt;code&amp;gt;mov ah,0a0h; mov es,ax&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;push word 0a000h; pop es&amp;lt;/code&amp;gt;, try this 2-byte wonder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
les bp,[bx]&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This sets ES=9FFF, only one away from A000.  You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.&lt;br /&gt;
&lt;br /&gt;
How does this work?   At start of execution of a .COM file, BX=0, and DS=CS. The contents of the COM file get loaded to offset 0x100 in that segment, but loaded before that is the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with &amp;lt;code&amp;gt;CD 20&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;INT 20&amp;lt;/code&amp;gt;, which exits the program), so that's what gets loaded into BP.  The next word is the number of the last free conventional memory segment, typically 0x9fff (but can be something different if parts of the upper memory range are either not installed or allocated).&lt;br /&gt;
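&lt;br /&gt;
A short usage sketch, assuming the word at PSP offset 2 really is 9FFFh and the direction flag is clear; the pixel position and colour are arbitrary choices for illustration:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        les bp,[bx]     ;C4 2F: BP=20CDh, ES=9FFFh (assumed), relies on BX=0 at start&lt;br /&gt;
        mov ax,13h      ;set video mode 13h (320x200, 256 colours)&lt;br /&gt;
        int 10h&lt;br /&gt;
        mov di,16       ;9FFF:0010 is the same address as A000:0000&lt;br /&gt;
        mov al,15       ;white in the default palette&lt;br /&gt;
        stosb           ;plot the top-left pixel&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;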
&lt;br /&gt;
'''Warning:''' This trick doesn't always work.  On FreeDos, this can set ES=9FE0 and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.&lt;br /&gt;
&lt;br /&gt;
== Accessing the timer tick for free ==&lt;br /&gt;
&lt;br /&gt;
If using a 386+, FS=0 at .COM start.  So, &amp;lt;code&amp;gt;FS:[046C]&amp;lt;/code&amp;gt; gets you the BIOS timer tick variable, which you could use for timing/pacing, or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; right after the start followed by an access to &amp;lt;code&amp;gt;[046C]&amp;lt;/code&amp;gt; does the trick at equal size.&lt;br /&gt;
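&lt;br /&gt;
Both variants side by side, as a sketch with the encodings noted for comparison (each is 4 bytes); the second relies on the zero word that DOS places on top of the stack at .COM start:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ax,[fs:046Ch]   ;64 A1 6C 04: read low word of the timer tick via FS=0&lt;br /&gt;
&lt;br /&gt;
        pop ds              ;1F: DS := 0000h, taken from the top of the stack&lt;br /&gt;
        mov ax,[046Ch]      ;A1 6C 04: read low word of the timer tick via DS=0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
After &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt;, data inside the program's own segment has to be addressed through CS, ES or SS instead, and a final &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; no longer finds the zero word on the stack.&lt;br /&gt;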
&lt;br /&gt;
== Looping twice ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear, you can run it twice (loop back once) in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        cmc&lt;br /&gt;
        jc      looping&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Looping three times ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can run it three times (loop back twice) in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        inc     bx ;if decrementing instead, parity check must be reversed&lt;br /&gt;
        jpo     looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Obtaining X and Y without DIV (The Rrrola Trick) ==&lt;br /&gt;
&lt;br /&gt;
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading X and Y from the 8-bit registers DH (+DL as a 16-bit value) and DL (+AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], from start to end. Multiplying this number in [0,1] by 65536 / 320 = 204.8 yields the row in the integer part and, again as a kind of fixed-point number, the column in the fractional part. The constant 0xCCCD is the nearest rounding of 204.8 * 256 (= 52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
&lt;br /&gt;
This is adapted from [http://www.pouet.net/prod.php?which=53816 &amp;quot;Puls&amp;quot; by Rrrola] where X and Y are directly modified on the stack by performing &amp;lt;code&amp;gt;add dword[di],0000CCCDh&amp;lt;/code&amp;gt; on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value for said DWORD on the stack before each frame, which requires 2 additional bytes. The two approaches are too different to compare directly, but they share the core idea of [http://www.pouet.net/topic.php?which=8791&amp;amp;page=8#c411796 multiplying with &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt;], so &amp;quot;Rrrola's trick&amp;quot; is an appropriate term to use.&lt;br /&gt;
&lt;br /&gt;
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===&lt;br /&gt;
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]&lt;br /&gt;
More clearly: DI = (y * 320) + x&lt;br /&gt;
&lt;br /&gt;
Multiply by 0xCCCD =&amp;gt; (y * 0x1000040) + (x * 0xcccd)&lt;br /&gt;
&lt;br /&gt;
Take top byte is equivalent to divide by 0x1000000. So that gives you Y.&lt;br /&gt;
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).&lt;br /&gt;
And the lower two bytes are noise.&lt;br /&gt;
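&lt;br /&gt;
As a sketch, the multiplication takes 5 bytes when the screen pointer is in DI (an assumption; any 16-bit register works as the &amp;lt;code&amp;gt;MUL&amp;lt;/code&amp;gt; operand):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ax,0xCCCD   ;B8 CD CC&lt;br /&gt;
        mul di          ;F7 E7: DX:AX = DI * 0xCCCD&lt;br /&gt;
        ;DH = Y (0..199), DL = X scaled to 0..255, AX = remaining fraction&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;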
&lt;br /&gt;
== Use the entire register for a smaller opcode form ==&lt;br /&gt;
&lt;br /&gt;
As you know e.g. &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; produces 3 Bytes of code while &amp;lt;code&amp;gt;inc cl&amp;lt;/code&amp;gt; compiles to 2 Bytes. If ch does not matter (or you know that it won't be affected) use &amp;lt;code&amp;gt;inc cx&amp;lt;/code&amp;gt; instead and get the most out of that 1 Byte. This is no real trick but sometimes such things can be overlooked - while the 2 saved Bytes could be invested wisely.&lt;br /&gt;
&lt;br /&gt;
== Use the carry flag in your calculations ==&lt;br /&gt;
&lt;br /&gt;
Let's say you have to &amp;lt;code&amp;gt;add si,128&amp;lt;/code&amp;gt;. Unfortunately this takes 1 Byte more than &amp;lt;code&amp;gt;add si,127&amp;lt;/code&amp;gt;. But you can add 128 without that extra Byte. If your previous code sets the carry flag simply include it into your calculation and &amp;lt;code&amp;gt;adc si,127&amp;lt;/code&amp;gt;. The same goes for &amp;lt;code&amp;gt;sub si,128&amp;lt;/code&amp;gt; vs &amp;lt;code&amp;gt;sbb si,127&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1299</id>
		<title>General Coding Tricks</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1299"/>
				<updated>2024-02-13T17:16:53Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Need a constant? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Data is code, code is data ==&lt;br /&gt;
&lt;br /&gt;
Code is nothing more than data that the CPU interprets.  For example, consider this multi-byte instruction:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This assembles to &amp;lt;code&amp;gt;B4 37&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;B4&amp;lt;/code&amp;gt; by itself isn't interesting, but &amp;lt;code&amp;gt;37&amp;lt;/code&amp;gt; happens to be the opcode for &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt;.  Let's say you had this code before a loop, and you needed to perform &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; at the top of that loop.  Rather than put &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; at the top of the loop, you can reuse the opcode that will already be there as part of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt; that comes before it.  Just jump directly into the middle of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt;, which will get interpreted and executed as &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
&lt;br /&gt;
label:&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        loop label+1&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;+1&amp;lt;/code&amp;gt; specifies the jump should go to 1 byte past the actual location.&lt;br /&gt;
&lt;br /&gt;
=== Reuse ===&lt;br /&gt;
&lt;br /&gt;
You can use opcodes hidden in your existing data.  For example, .COM files can end with &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, which is opcode &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt;.  If you already have a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; somewhere else in your code, even as part of data, you can jump to that pre-existing &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; instead of adding a separate &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;.  Since a short jump costs 2 bytes and a &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt; only 1, this pays off when the exit is reached through a (conditional) jump that you need anyway.&lt;br /&gt;
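&lt;br /&gt;
A minimal sketch of this, assuming the exit is taken through a conditional jump after a keyboard check. The labels and the &amp;lt;code&amp;gt;mov bl,0C3h&amp;lt;/code&amp;gt; instruction are hypothetical stand-ins for whatever instruction in your program happens to contain a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; byte:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
setbl:&lt;br /&gt;
        mov bl,0C3h     ;assembles to B3 C3, the second byte doubles as RET&lt;br /&gt;
mainloop:&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        in al,60h       ;read keyboard scancode&lt;br /&gt;
        dec al          ;ESC has scancode 1&lt;br /&gt;
        jz setbl+1      ;ESC pressed: land on the C3 byte, which executes as RET&lt;br /&gt;
        ;more stuff that must only run while not exiting&lt;br /&gt;
        jmp short mainloop&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;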
&lt;br /&gt;
== If your environment holds you back, change it ==&lt;br /&gt;
&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it:  While setting a new palette costs bytes, the new palette arrangement could save you headaches down the road.  For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want.  (This assumes you have very small ways of redefining the palette, of course.)&lt;br /&gt;
&lt;br /&gt;
The above is maybe not the best example.  Rewrites to this section are welcome.&lt;br /&gt;
&lt;br /&gt;
== Need a constant? ==&lt;br /&gt;
&lt;br /&gt;
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.&lt;br /&gt;
&lt;br /&gt;
If you need more than a byte, another method is to create a '''literal pool''' in memory which can be addressed for constants. This technique was first used in [https://www.pouet.net/prod.php?which=94080 TERRA256] (please update if there are earlier examples) and is especially useful if FPU instructions are used, since those cannot address byte-sized integer constants in memory, only words and dwords. Unlike byte constants, word or dword constants are very unlikely to be found inside the opcodes.&lt;br /&gt;
&lt;br /&gt;
The construction of the literal pool can look like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov cx, 255 ;very likely this can be replaced by a shorter version or even skipped; value of CX can also be larger than 255&lt;br /&gt;
initlp: ;very likely there are other things to initialize here, e.g. setting the DAC color palette&lt;br /&gt;
        push cx ;push 16-bit word to the stack with the current counter value&lt;br /&gt;
        loop initlp&lt;br /&gt;
        push cx ;optionally push final zero constant, if required&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This example will construct a literal pool of 256 words on the stack, starting with 0x00FF and ending with 0x0000. The additional &amp;quot;push cx&amp;quot; after the loop adds the final zero constant, if required. The final zero also allows exiting from a COM executable using the &amp;quot;ret&amp;quot; instruction. &lt;br /&gt;
&lt;br /&gt;
Practically any existing initialization loop can be used to set up such a pool, which is why this construction costs only a single additional byte for the &amp;quot;push cx&amp;quot; instruction in the best case.&lt;br /&gt;
&lt;br /&gt;
To use the literal pool, some index register like SI, DI, BX or BP should be initialized to point to the literal pool. The init value can be the content of the stack pointer (SP) but also a fixed offset, since the position of the literal pool and the current stack position is typically well known. Also the initial value of DI (0xFFFE) or a zero offset may work. Accessing the literal pool can look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov bx, sp ;point index register to literal pool&lt;br /&gt;
        fild word [bx+0x13*2] ;load integer value 0x0013 into FPU register&lt;br /&gt;
        fild word [bx+0x31*2] ;load integer value 0x0031 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To address larger values, some tricks can be used to keep size low:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        fild word [bx+0x087*2]  ;take care that this instruction is 1 byte larger than the others&lt;br /&gt;
        fild word [bx+si+0x087*2-0x100] ;but this trick could help: load integer value 0x0087 into FPU register, assuming SI is 0x0100&lt;br /&gt;
        fild word [bx+0x03*2-1] ;load integer value 0x0300 into FPU register&lt;br /&gt;
        fidiv dword [bx+0x05*2-3] ;divide by integer value 0x05000400&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
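&lt;br /&gt;
The unaligned accesses work because the words of the pool lie back to back in little-endian byte order. A sketch of the layout, assuming the pool was built as shown above and BX points at the final zero word:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
;offset from BX:  0  1  2  3  4  5  6  7  8  9  10 11&lt;br /&gt;
;byte value:      00 00 01 00 02 00 03 00 04 00 05 00&lt;br /&gt;
;word  at [bx+5]: bytes 00 03       = 0x0300&lt;br /&gt;
;dword at [bx+7]: bytes 00 04 00 05 = 0x05000400&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;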
&lt;br /&gt;
== A smaller way to point to Mode 13's screen segment ==&lt;br /&gt;
&lt;br /&gt;
Rather than &amp;lt;code&amp;gt;mov ah,0a0h; mov es,ax&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;push word 0a000h; pop es&amp;lt;/code&amp;gt;, try this 2-byte wonder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
les bp,[bx]&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This sets ES=9FFF, only one away from A000.  You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.&lt;br /&gt;
&lt;br /&gt;
How does this work?   At the start of execution of a .COM file, BX=0 and DS=CS, so &amp;lt;code&amp;gt;les bp,[bx]&amp;lt;/code&amp;gt; loads BP from the word at DS:0000 and ES from the word at DS:0002. The contents of the COM file get loaded to offset 0x100 in that segment, but before that lies the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with &amp;lt;code&amp;gt;CD 20&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;INT 20h&amp;lt;/code&amp;gt;, which exits the program), so that's what gets loaded into BP.  The next word, which ends up in ES, is the number of the last free conventional memory segment, typically 0x9fff (but it can be something different if parts of the upper memory range are either not installed or allocated).&lt;br /&gt;
&lt;br /&gt;
'''Warning:''' This trick doesn't always work.  On FreeDOS, this can set ES=9FE0, and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.&lt;br /&gt;
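&lt;br /&gt;
A minimal sketch of the offset correction, assuming ES really ends up as 9FFFh and AH=0 at program start (both are assumptions about the environment, see the warning above). It plots a single pixel in the middle of the screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        les bp,[bx]     ;ES = word at PSP offset 2 (assumed 9FFFh), BP = 20CDh&lt;br /&gt;
        mov al,13h      ;assumes AH = 0 at start&lt;br /&gt;
        int 10h         ;switch to mode 13h&lt;br /&gt;
        mov di,16+100*320+160 ;+16 compensates for ES being one paragraph below A000h&lt;br /&gt;
        mov al,15       ;color 15 (white in the default palette)&lt;br /&gt;
        stosb           ;plot one pixel at (160,100)&lt;br /&gt;
        ret&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;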
&lt;br /&gt;
== Accessing the timer tick for free ==&lt;br /&gt;
&lt;br /&gt;
If using a 386+, FS=0 at .COM start.  So, &amp;lt;code&amp;gt;FS:[046C]&amp;lt;/code&amp;gt; gets you the BIOS timer tick variable, which you could use for timing/pacing, or as a random seed. Some environments, especially with EMS/XMS programs loaded, can modify the FS register, so it can't always be assumed to be 0000h. In that case, &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; right after the start (which pops the zero word from the top of the stack) followed by accessing &amp;lt;code&amp;gt;[046C]&amp;lt;/code&amp;gt; does the trick at equal size.&lt;br /&gt;
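&lt;br /&gt;
Both variants side by side, as a sketch. Which one is safe depends on the assumptions named in the comments, and after &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; all further default memory accesses go to segment 0 instead of your program's segment:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        ;variant 1: assumes FS = 0&lt;br /&gt;
        mov ax,[fs:046Ch] ;AX = low word of the tick counter (1 prefix byte + 3 bytes)&lt;br /&gt;
&lt;br /&gt;
        ;variant 2: assumes the word on top of the stack is still the initial 0&lt;br /&gt;
        pop ds          ;DS = 0 (1 byte)&lt;br /&gt;
        mov ax,[046Ch]  ;AX = low word of the tick counter (3 bytes)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;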
&lt;br /&gt;
== Looping twice ==&lt;br /&gt;
&lt;br /&gt;
If you need to run a section of code twice, the section doesn't modify the carry flag, and you know the carry flag is clear on entry, you can loop back once in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        cmc&lt;br /&gt;
        jc      looping&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Looping three times ==&lt;br /&gt;
&lt;br /&gt;
If you need to run a section of code three times and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can loop back twice in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        inc     bx ;when decrementing from -1 instead, use dec bx; the low byte then goes FEh, FDh (odd parity), FCh (even parity), so jpo still fits&lt;br /&gt;
        jpo     looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Obtaining X and Y without DIV (The Rrrola Trick) ==&lt;br /&gt;
&lt;br /&gt;
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading Y and X from the 8-bit registers DH (DH:DL as a 16-bit value) and DL (DL:AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], from start to end. Multiplying this number by 65536 / 320 = 204.8 yields the row in the integer part and, again as a kind of fixed-point number, the column in the fractional part. The constant 0xCCCD is 204.8 * 256 rounded to the nearest integer (52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
&lt;br /&gt;
This is adapted from [http://www.pouet.net/prod.php?which=53816 &amp;quot;Puls&amp;quot; by Rrrola] where X and Y are directly modified on the stack by performing &amp;lt;code&amp;gt;add dword[di],0000CCCDh&amp;lt;/code&amp;gt; on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value on said DWORD on the stack before each frame, which requires 2 additional bytes. Both approaches are too different to directly compare, but share the core idea of [http://www.pouet.net/topic.php?which=8791&amp;amp;page=8#c411796 multiplying with &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt;], so &amp;quot;Rrrolas trick&amp;quot; is an appropriate term to use.&lt;br /&gt;
&lt;br /&gt;
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===&lt;br /&gt;
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]&lt;br /&gt;
More clearly: DI = (y * 320) + x&lt;br /&gt;
&lt;br /&gt;
Multiply by 0xCCCD =&amp;gt; (y * 0x1000040) + (x * 0xcccd)&lt;br /&gt;
&lt;br /&gt;
Taking the top byte is equivalent to dividing by 0x1000000, so that gives you Y.&lt;br /&gt;
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).&lt;br /&gt;
And the lower two bytes are noise.&lt;br /&gt;
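&lt;br /&gt;
A worked example may help to check the numbers. The pixel position (160,100) is an arbitrary choice for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov di,100*320+160 ;example pixel: x = 160, y = 100 (DI = 32160)&lt;br /&gt;
        mov ax,0CCCDh&lt;br /&gt;
        mul di          ;DX:AX = 32160 * 0CCCDh = 64801920h&lt;br /&gt;
        ;DH = 64h = 100, the row y&lt;br /&gt;
        ;DL = 80h = 128, the column scaled to 160 * 256 / 320&lt;br /&gt;
        ;AX = 1920h, the remaining fractional bits&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;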
&lt;br /&gt;
== Use the entire register for a smaller opcode form ==&lt;br /&gt;
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; assembles to 3 bytes, while &amp;lt;code&amp;gt;inc cl&amp;lt;/code&amp;gt; takes 2 bytes. If CH does not matter (or you know that it won't be affected), use &amp;lt;code&amp;gt;inc cx&amp;lt;/code&amp;gt; instead, which takes only 1 byte in 16-bit code. This is not a real trick, but such things are easily overlooked, and the 2 saved bytes can be invested elsewhere.&lt;br /&gt;
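&lt;br /&gt;
For reference, a short listing of the three forms with the bytes they should assemble to in 16-bit code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        add cl,1        ;80 C1 01 (3 bytes)&lt;br /&gt;
        inc cl          ;FE C1    (2 bytes)&lt;br /&gt;
        inc cx          ;41       (1 byte), also increments CH when CL wraps from FFh to 00h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;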
&lt;br /&gt;
== Use the carry flag in your calculations ==&lt;br /&gt;
&lt;br /&gt;
Let's say you have to &amp;lt;code&amp;gt;add si,128&amp;lt;/code&amp;gt;. Unfortunately this takes 1 byte more than &amp;lt;code&amp;gt;add si,127&amp;lt;/code&amp;gt;, because 128 no longer fits into a sign-extended 8-bit immediate. But you can add 128 without that extra byte: if your previous code is known to leave the carry flag set, include it in the calculation with &amp;lt;code&amp;gt;adc si,127&amp;lt;/code&amp;gt;. The same goes for &amp;lt;code&amp;gt;sub si,128&amp;lt;/code&amp;gt; vs &amp;lt;code&amp;gt;sbb si,127&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Floating-point_Opcodes&amp;diff=1298</id>
		<title>Floating-point Opcodes</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Floating-point_Opcodes&amp;diff=1298"/>
				<updated>2024-02-13T17:14:43Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* FPU Tricks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [https://en.wikipedia.org/wiki/Floating-point_unit FPU] offers a lot of complex and powerful floating-point operations, like &amp;lt;code&amp;gt;SIN&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;COS&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;ATAN&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SQRT&amp;lt;/code&amp;gt;, etc. ([http://www.website.masmforum.com/tutorials/fptute/appen1.htm SIMPLY FPU] by Raymond Filiatreault has a compact overview of all FPU commands for the curious.)  Use of the FPU can drastically increase what you can do in a tiny production while simultaneously keeping code size down.  All x86 CPUs can use the FPU, although older CPUs (80486sx and earlier) need the FPU physically installed in the FPU socket.&lt;br /&gt;
&lt;br /&gt;
This guide assumes intermediate to advanced-level proficiency with sizecoding.&lt;br /&gt;
&lt;br /&gt;
== FPU Basics ==&lt;br /&gt;
&lt;br /&gt;
Using and communicating with the FPU is quite unusual and takes a bit to get used to, so we'll start with a simple example.  This is what [[Output#Basic_Animation_and_user_interaction|our code example]] looks like originally:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             	; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov 	al,0x13&lt;br /&gt;
X: 		int 0x10	; set video mode AND draw pixel&lt;br /&gt;
mov 	ax,cx		; get column in AH&lt;br /&gt;
add		ax,di		; offset by framecounter	          &amp;lt;-- REPLACE THIS WITH FPU CODE&lt;br /&gt;
xor 	al,ah		; the famous XOR pattern&lt;br /&gt;
and 	al,32+8		; a more interesting variation of it&lt;br /&gt;
mov 	ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
loop 	X			; loop 65536 times&lt;br /&gt;
inc 	di			; increment framecounter&lt;br /&gt;
in 		al,0x60		; check keyboard ...&lt;br /&gt;
dec 	al			; ... for ESC&lt;br /&gt;
jnz 	X			; rinse and repeat&lt;br /&gt;
ret					; quit program&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And this is how the code looks if we replace the marked instruction with FPU code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             	; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov 	al,0x13&lt;br /&gt;
X: 		int 0x10	; set video mode AND draw pixel&lt;br /&gt;
mov 	ax,cx		; get &amp;quot;column&amp;quot; in AX&lt;br /&gt;
&lt;br /&gt;
fninit				; init FPU first&lt;br /&gt;
mov		[si],ax		; write first addend to a memory location&lt;br /&gt;
fild	word [si]	; F(pu) I(nteger) L(oad)D a WORD from memory location to the FPU stack&lt;br /&gt;
mov		[si],di		; write second addend to a memory location&lt;br /&gt;
fiadd	word [si]	; Directly add the word in the memory location to the top of the FPU stack&lt;br /&gt;
fist	word [si]	; F(pu) I(nteger) ST(ore) the result into a memory location&lt;br /&gt;
mov		ax,[si]		; Get the word from the memory location into AX&lt;br /&gt;
&lt;br /&gt;
xor 	al,ah		; the famous XOR pattern&lt;br /&gt;
and 	al,32+8		; a more interesting variation of it&lt;br /&gt;
mov 	ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
loop 	X			; loop 65536 times&lt;br /&gt;
inc 	di			; increment framecounter&lt;br /&gt;
in 		al,0x60		; check keyboard ...&lt;br /&gt;
dec 	al			; ... for ESC&lt;br /&gt;
jnz 	X			; rinse and repeat&lt;br /&gt;
ret					; quit program&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Obviously that made the program bigger instead of smaller, but the point of this exercise is to illustrate how to work with the FPU.)  Looking at our changes, you get a sense of what usual interaction with the FPU is:&lt;br /&gt;
# &amp;lt;code&amp;gt;F(N)INIT&amp;lt;/code&amp;gt;: Initialize the FPU&lt;br /&gt;
# transfer values from CPU registers to memory location(s)&lt;br /&gt;
# transfer values from memory location(s) onto FPU stack&lt;br /&gt;
# do the actual calculations using the FPU (more on this soon)&lt;br /&gt;
# transfer result from the FPU stack into memory location(s)&lt;br /&gt;
# transfer result from memory location(s) back into registers&lt;br /&gt;
&lt;br /&gt;
That is a lot of extra code for a single integer addition, but that's a simple example; ''once more complex floating point operations are involved, it starts to pay off.''&lt;br /&gt;
&lt;br /&gt;
== Optimizing with the FPU ==&lt;br /&gt;
&lt;br /&gt;
=== Distance function example ===&lt;br /&gt;
&lt;br /&gt;
For more advanced FPU operation, let's start from scratch with an unoptimized program which plots the distance of each pixel to the screen center as color, in 49 bytes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distance to center example.png|thumb|Distance to center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 	0a000h			&lt;br /&gt;
pop 	es				; get start of video memory in ES&lt;br /&gt;
mov 	al,0x13			; switch to video mode 13h&lt;br /&gt;
int 	0x10			; 320 * 200 in 256 colors&lt;br /&gt;
fninit					; -	&lt;br /&gt;
						; it's useful to comment what's on the&lt;br /&gt;
						; stack after each FPU operation&lt;br /&gt;
						; to not get lost ;) start is : empty (-)&lt;br /&gt;
X:&lt;br /&gt;
xor 	dx,dx			; reset the high word before division&lt;br /&gt;
mov 	bx,320			; 320 columns&lt;br /&gt;
mov 	ax,di			; get screen pointer in AX&lt;br /&gt;
div 	bx				; split screen pointer: AX = Y (row), DX = X (column)&lt;br /&gt;
sub 	ax,100			; subtract the origin&lt;br /&gt;
sub 	dx,160			; = (160,100) ... center of 320x200 screen	&lt;br /&gt;
mov 	[si],ax			; move Y into a memory location&lt;br /&gt;
fild 	word [si]		; Y&lt;br /&gt;
fmul 	st0				; Y²&lt;br /&gt;
mov 	[si],dx			; move X into a memory location&lt;br /&gt;
fild 	word [si]		; X Y²&lt;br /&gt;
fmul 	st0				; X² Y²&lt;br /&gt;
faddp 	st1,st0			; X²+Y² (add and pop, so the FPU stack stays balanced)&lt;br /&gt;
fsqrt					; R&lt;br /&gt;
fistp 	word [si]		; -&lt;br /&gt;
mov 	ax,[si]			; get the result from memory&lt;br /&gt;
stosb					; write to screen (DI) and increment DI&lt;br /&gt;
jmp short X				; next pixel&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A few words on this: &lt;br /&gt;
* The FPU registers (st0, st1, ...) are organized as a stack. When you load something onto the FPU, everything else is implicitly moved one location further away from the top. Some FPU instructions work only on the top, others allow explicit operands naming arbitrary FPU registers.&lt;br /&gt;
* Depending on what you do, sometimes &amp;lt;code&amp;gt;F(N)INIT&amp;lt;/code&amp;gt; can be omitted. Real hardware will refuse to work more often than emulators, but it's always worth a try.&lt;br /&gt;
* Accessing memory (size) efficiently can be a real pain. The safest way is to reference absolute memory locations (e.g. &amp;lt;code&amp;gt;[1234]&amp;lt;/code&amp;gt;), but that's two bytes more per instruction than referencing memory with &amp;lt;code&amp;gt;[BX]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[SI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[BX+SI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[BP+DI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[BP+SI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[DI]&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;[BX+DI]&amp;lt;/code&amp;gt;. When working with the FPU and this ''classic'' approach of FPU communication, you have to design your code flow to have one or more of these locations available.&lt;br /&gt;
* Accessing the memory is always with regard to the segment register &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; unless you perform segment overrides. When accessing memory with &amp;lt;code&amp;gt;[BP+??]&amp;lt;/code&amp;gt; be aware that the memory is accessed with regard to the segment register &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (see [https://courses.engr.illinois.edu/ece390/books/artofasm/CH04/CH04-2.html#HEADING2-35 Register Indirect Addressing Modes])&lt;br /&gt;
* There are a few conventions which help you identify FPU commands. &amp;quot;i&amp;quot; stands for integer (WORD or DWORD), &amp;quot;p&amp;quot; means &amp;quot;pop stack afterwards&amp;quot;, so &amp;lt;code&amp;gt;FST&amp;lt;/code&amp;gt; means just &amp;quot;store&amp;quot; while &amp;lt;code&amp;gt;FISTP&amp;lt;/code&amp;gt; means &amp;quot;store as integer, then pop the stack&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Stack addressing, &amp;quot;Rrrolas Trick&amp;quot;, alignment optimization, Aspect Ratio ===&lt;br /&gt;
&lt;br /&gt;
Now let's unleash the state-of-the-art sizecoding arsenal onto this, to bring it down to 37 bytes (40 bytes with aspect correction).&lt;br /&gt;
&lt;br /&gt;
[[File:Distant to center optimized.png|thumb|Distance to center : optimized]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 	0a000h - 70		; modified to center to 160,100&lt;br /&gt;
aas						; aspect ratio constant part&lt;br /&gt;
pop 	es				; get start of video memory in ES&lt;br /&gt;
mov 	al,0x13			; switch to video mode 13h&lt;br /&gt;
int 	0x10			; 320 * 200 in 256 colors&lt;br /&gt;
X:&lt;br /&gt;
mov 	ax,0xCCCD		; perform the famous...&lt;br /&gt;
mul		di				; ... Rrrola trick =)&lt;br /&gt;
sub 	dh,[si]			; align vertically&lt;br /&gt;
pusha 					; push all registers on stack&lt;br /&gt;
fild 	word [bx-8]		; X&lt;br /&gt;
fmul 	st0				; X²&lt;br /&gt;
fild 	word [bx-9]		; Y X²&lt;br /&gt;
fmul	dword [bx+si]	; aspect ratio correction&lt;br /&gt;
fmul 	st0				; Y² X²&lt;br /&gt;
faddp 	st1,st0			; Y²+X² (add and pop, so the FPU stack stays balanced)&lt;br /&gt;
fsqrt					; R&lt;br /&gt;
fistp 	dword [bx-5]	; -&lt;br /&gt;
popa					; pop all registers from stack&lt;br /&gt;
stosb					; write to screen (DI) and increment DI&lt;br /&gt;
jmp short X				; next pixel&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The resulting image is almost identical to the former. Let's go through this step by step:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;push 	0a000h - 70&amp;lt;/code&amp;gt;&lt;br /&gt;
Instead of aligning horizontally with &amp;lt;code&amp;gt;sub	dx,160&amp;lt;/code&amp;gt;, we can code this implicitly by moving our segment register ten units (10 * 16 = 160 pixels) to the left (see [https://en.wikipedia.org/wiki/X86_memory_segmentation#Real_mode Real Mode Addressing]). By subtracting further multiples of 20 units (320 pixels, i.e. one line each), we can shift the visible screen towards the top to fine-tune the vertical alignment; here 70 = 10 + 3 * 20. As long as this shift is no more than 4 lines ''( 65536 / 320 - 200 = 4.8 )'' there is no further visual impact.&lt;br /&gt;
* &amp;lt;code&amp;gt;aas&amp;lt;/code&amp;gt;&lt;br /&gt;
This is the high byte of a constant, placed so that &amp;lt;code&amp;gt;[SI]&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;[BX+SI]&amp;lt;/code&amp;gt; resolves to ~1.24 when read as a 32-bit float. The lower bytes come from the preceding &amp;lt;code&amp;gt;push&amp;lt;/code&amp;gt; instruction, so the high byte of the segment value pushed for &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt; also matters. Check for yourself with the [http://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE 754 Converter].&lt;br /&gt;
* &amp;lt;code&amp;gt;mov ax,0xCCCD&amp;lt;/code&amp;gt; &amp;amp; &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt; (The &amp;quot;Rrrola trick&amp;quot;)&lt;br /&gt;
Instead of constructing X and Y from the screen pointer &amp;lt;code&amp;gt;DI&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;DIV&amp;lt;/code&amp;gt;, you can get a decent estimate by multiplying the screen pointer by &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt; and reading Y and X from the 8-bit registers &amp;lt;code&amp;gt;DH&amp;lt;/code&amp;gt; (DH:DL as a 16-bit value) and &amp;lt;code&amp;gt;DL&amp;lt;/code&amp;gt; (DL:AH as a 16-bit value). The idea is to interpret &amp;lt;code&amp;gt;DI&amp;lt;/code&amp;gt; as a kind of 16-bit fixed-point number in the range &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt;, from start to end. Multiplying this number by 65536 / 320 = 204.8 yields the row in the integer part and, again as a kind of fixed-point number, the column in the fractional part. The constant &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt; is 204.8 * 256 rounded to the nearest integer (52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
* &amp;lt;code&amp;gt;sub dh,[si]&amp;lt;/code&amp;gt;&lt;br /&gt;
The instruction at &amp;lt;code&amp;gt;[SI]&amp;lt;/code&amp;gt; is &amp;lt;code&amp;gt;push &amp;lt;word&amp;gt;&amp;lt;/code&amp;gt; and has the opcode &amp;lt;code&amp;gt;0x68&amp;lt;/code&amp;gt; which is 104 in decimal. Combined with the fine tuned vertical alignment above ( ~4 lines) this results in (virtually) subtracting 100 for perfect vertical alignment. This is one byte shorter than &amp;lt;code&amp;gt;sub dh,100&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;pusha / popa&amp;lt;/code&amp;gt;&lt;br /&gt;
Instead of going the ''classical'' way of communicating with the FPU, we push all the registers, read/write values with memory addressing to/from the FPU, then pop all registers again. This works when &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SP&amp;lt;/code&amp;gt; is &amp;quot;close enough&amp;quot; to &amp;lt;code&amp;gt;BX&amp;lt;/code&amp;gt; (initially zero and kept that way) to allow &amp;lt;code&amp;gt;[BX+&amp;lt;signed byte&amp;gt;]&amp;lt;/code&amp;gt; addressing. It comes with the special benefit of implicit 8-bit shifts. One serious drawback is loss of precision, since the registers &amp;lt;code&amp;gt;DL&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;AH&amp;lt;/code&amp;gt; &amp;quot;lose connection&amp;quot; when using &amp;lt;code&amp;gt;PUSHA&amp;lt;/code&amp;gt; (see the order of registers: [http://x86.renejeschke.de/html/file_module_x86_id_270.html PUSHA/PUSHAD documentation]).&lt;br /&gt;
* &amp;lt;code&amp;gt;fild word [bx+&amp;lt;signed byte&amp;gt;]&amp;lt;/code&amp;gt; &amp;amp; &amp;lt;code&amp;gt;fistp dword [bx+&amp;lt;signed byte&amp;gt;]&amp;lt;/code&amp;gt;&lt;br /&gt;
This is the so-called &amp;quot;stack addressing&amp;quot;. We assume that &amp;lt;code&amp;gt;BX=0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SP=0xFFFE&amp;lt;/code&amp;gt; at start, so we know where the registers are in memory after &amp;lt;code&amp;gt;pusha&amp;lt;/code&amp;gt; (AX at [BX-4], CX at [BX-6] etc.). It's important to realize that we work with signed 16-bit values now, in the full range of [-32768,32767]. That is also why we need &amp;lt;code&amp;gt;DWORD&amp;lt;/code&amp;gt; when storing the result: &amp;lt;code&amp;gt;sqrt(x²+y²)&amp;lt;/code&amp;gt; exceeds the signed 16-bit range for quite a few value pairs. Note that there are already implicit 8-bit shifts (bx-9, bx-5).&lt;br /&gt;
* &amp;lt;code&amp;gt;fmul dword [bx+si]&amp;lt;/code&amp;gt;&lt;br /&gt;
With the &amp;quot;Rrrola&amp;quot; trick above, the row number is 204 at maximum, and the column can't be greater than 255. This results in a wrong aspect ratio, but it can be fixed almost completely with this two-byte instruction (plus one byte for the &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; instruction): 256 * 1.24 = 317.44, which is quite close to 320. If the aspect ratio does not matter for the desired effect, these three bytes can be shaved off. &lt;br /&gt;
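&lt;br /&gt;
The stack addressing described above can be sketched as a memory map plus a small round trip. This is only an illustration under the stated assumptions (&amp;lt;code&amp;gt;BX=0&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SP=0xFFFE&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; before &amp;lt;code&amp;gt;pusha&amp;lt;/code&amp;gt;); the calculation itself is a made-up example that turns &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; into its absolute value:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;pusha					; pushes AX, CX, DX, BX, SP, BP, SI, DI in this order&lt;br /&gt;
						; [bx-2]  word that was already on the stack&lt;br /&gt;
						; [bx-4]  AX (AL at -4, AH at -3)&lt;br /&gt;
						; [bx-6]  CX (CL at -6, CH at -5)&lt;br /&gt;
						; [bx-8]  DX (DL at -8, DH at -7)&lt;br /&gt;
						; [bx-10] BX (BL at -10, BH at -9)&lt;br /&gt;
fild 	word [bx-8]		; DX as signed 16 bit value&lt;br /&gt;
fmul 	st0				; DX²&lt;br /&gt;
fsqrt					; |DX|, never negative, at most 32768&lt;br /&gt;
fistp 	dword [bx-5]	; - (bits 0..7 go to CH, 8..15 to AL, 16..23 to AH, 24..31 to [bx-2])&lt;br /&gt;
popa					; AL = |DX| shifted right by 8, CH = low byte of |DX|, AH = 0&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Storing a dword one byte below AX is what produces the implicit 8-bit shift. Note that this also overwrites CH, and that a negative result would put FFh into the byte at [bx-2], so the stored value should stay non-negative.&lt;br /&gt;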
&lt;br /&gt;
=== FPU Constants ===&lt;br /&gt;
Here is a list of FPU constants available on the x87 FPU as well as constants accessible from code or memory:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
#define FLD0 0.0f&lt;br /&gt;
#define FLD1 1.0f&lt;br /&gt;
#define FLDPI M_PI&lt;br /&gt;
#define FLDL2T 3.32192809489f&lt;br /&gt;
#define FLDLN2 0.69314718056f&lt;br /&gt;
#define FLDLG2 0.301029995664f&lt;br /&gt;
#define FP_DSPREFIX 0.125f&lt;br /&gt;
#define FP_AAS 0.5f&lt;br /&gt;
#define FP_MOVAX 2.0f&lt;br /&gt;
#define FP_MOVBX 8.0f&lt;br /&gt;
#define FP_MOVCX 32.0f&lt;br /&gt;
#define FP_MOVDX 128.0f&lt;br /&gt;
#define FP_ZEROADDR 3.48975066878&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== The &amp;quot;Tunnel&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
Now let's add some features:&lt;br /&gt;
* extract angle as opposed to the distance and combine both&lt;br /&gt;
* reverse divide the distance to create the &amp;quot;tunnel&amp;quot; effect&lt;br /&gt;
* animate with smooth steps along the distance&lt;br /&gt;
* improve on the colors with subselecting from the standard palette&lt;br /&gt;
* quit the program on ESC&lt;br /&gt;
&lt;br /&gt;
This results in the following program with a size of 63 bytes:&lt;br /&gt;
&lt;br /&gt;
[[File:Tunnel effect neontube.png|thumb|tunnel effect neontube]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 	0xa000 - 10 - 3 * 20	; video base - 3.5 lines&lt;br /&gt;
or 		al, 0x13				; mode 13h = 320 x 200 in 256 colors&lt;br /&gt;
pop 	es						; get aligned video memory base&lt;br /&gt;
int 	0x10					; switch videomode&lt;br /&gt;
X: &lt;br /&gt;
sub		dh, [si]				; vertical alignment&lt;br /&gt;
pusha							; push all registers on stack&lt;br /&gt;
fild 	word	[bx-9]			; fpustack :  x&lt;br /&gt;
fild 	word	[bx-8]			; fpustack :  y  x&lt;br /&gt;
fpatan							; fpustack :  arc&lt;br /&gt;
fst 	st1						; fpustack :  arc  arc&lt;br /&gt;
fcos							; fpustack :  cos(arc)  arc&lt;br /&gt;
fimul	dword	[si]			; fpustack :  l*cos(arc)  arc&lt;br /&gt;
fidiv	word	[bx-8]			; fpustack :  l*cos(arc)/x  arc&lt;br /&gt;
fiadd	word	[bp+si]			; fpustack :  l*cos(arc)/x+offset  arc&lt;br /&gt;
fistp	dword	[bx-7]			; fpustack :  arc&lt;br /&gt;
fimul	word	[byte si+val]	; fpustack :  scaled_arc&lt;br /&gt;
fistp	word	[bx-5]			; fpustack :  -&lt;br /&gt;
popa							; pop all registers from stack&lt;br /&gt;
xor 	al, cl					; XOR scaled_arc with distance&lt;br /&gt;
and 	al, 16 + 8 + 2			; sub selecting palette part&lt;br /&gt;
stosb							; writing to screen&lt;br /&gt;
mov 	ax, 0xCCCD				; Performing the famous&lt;br /&gt;
mul 	di						; Rrrola trick&lt;br /&gt;
jo 		X						; next frame check&lt;br /&gt;
add 	word [bp+si], byte 23	; change offset smoothly&lt;br /&gt;
in 		al, 0x60				; check for ...&lt;br /&gt;
dec 	ax						; ...ESC key&lt;br /&gt;
jnz 	X						; otherwise continue&lt;br /&gt;
ret								; quit program&lt;br /&gt;
val:	dw 6519 				; n = 160 * 256 / pi / 2 ; 0x1977&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Many other tiny tunnel effects have been coded, so it is highly recommended to check out the documented source code of [http://web.archive.org/web/20050216133244/http://www.farb-rausch.de/ryg/tunnel.asm &amp;quot;Constant Evolution&amp;quot; by ryg/Farbrausch] and the [http://www.pouet.net/prod.php?which=29412 &amp;quot;Heart shaped tunnel&amp;quot; from Lord Kelvin], both with a size of 64 bytes. While &amp;quot;Constant Evolution&amp;quot; takes a slightly different route than the example here (''classic'' FPU communication, ''classic'' X Y construction, sqrt(x²+y²) instead of using &amp;lt;code&amp;gt;fcos&amp;lt;/code&amp;gt;+&amp;lt;code&amp;gt;fimul&amp;lt;/code&amp;gt;), &amp;quot;Heart shaped tunnel&amp;quot; uses no FPU at all. &lt;br /&gt;
&lt;br /&gt;
The takeaways from this example are:&lt;br /&gt;
&lt;br /&gt;
* Loading a constant from the code with some degrees of freedom&lt;br /&gt;
&amp;lt;code&amp;gt;fimul dword [si]&amp;lt;/code&amp;gt; multiplies by a 32-bit integer that acts as the dividend for the tunnel effect. The highest byte of this constant comes from our own code: it is the opcode of &amp;lt;code&amp;gt;or al, 0x13&amp;lt;/code&amp;gt;. This instruction puts &amp;lt;code&amp;gt;0x13&amp;lt;/code&amp;gt; into &amp;lt;code&amp;gt;al&amp;lt;/code&amp;gt;, and since there are many ways to achieve this, the appearance of the tunnel can be changed directly by replacing this instruction with one of the following: &amp;lt;code&amp;gt;mov al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;add al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;xor al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;adc al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sbb al,0xED&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sub al,0xED&amp;lt;/code&amp;gt;. In this special case, the instruction can also be swapped with &amp;lt;code&amp;gt;pop es&amp;lt;/code&amp;gt; to gain a further degree of freedom.&lt;br /&gt;
&lt;br /&gt;
* normal loading of a constant which can't be reused as opcode&lt;br /&gt;
Although the ultimate goal is to not spend a single extra byte on constants, sometimes the required byte sequence simply does not appear in the code. In this case, a constant is needed to convert the angle from the range [-pi,pi] to the color space in such a way that no gap appears when stepping from 359° to 0°. In the last line at &amp;lt;code&amp;gt;val:&amp;lt;/code&amp;gt; the value &amp;lt;code&amp;gt;160&amp;lt;/code&amp;gt; in the comment is 32 * 5, where 5 is the number of &amp;quot;spiral arms&amp;quot; of the tunnel effect. The 8-bit shift (*256) increases precision. It turns out that 16-bit precision is enough to get a decent &amp;quot;gap closer&amp;quot; for values obtained by &amp;lt;code&amp;gt;fpatan&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* operating directly on an indirect memory location without offset&lt;br /&gt;
The location &amp;lt;code&amp;gt;[bp+si]&amp;lt;/code&amp;gt; is used as the animation variable, while both registers involved are kept fixed (the resulting address is &amp;lt;code&amp;gt;0x0A??&amp;lt;/code&amp;gt; and therefore way above our code). Since we work with 16-bit values whose top 8 bits count whole pixels, the instruction &amp;lt;code&amp;gt;add word [bp+si], byte 23&amp;lt;/code&amp;gt; allows for sub-pixel precision in the animation while occupying only 3 bytes. Depending on the target hardware, the value 23 can be increased or decreased to achieve faster or smoother animation.&lt;br /&gt;
&lt;br /&gt;
* Optimizing the check for the next frame&lt;br /&gt;
Normally, a check like &amp;lt;code&amp;gt;test di,di&amp;lt;/code&amp;gt; followed directly by a conditional branch is necessary. The approach used here allows branching with &amp;lt;code&amp;gt;jo&amp;lt;/code&amp;gt; directly after &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt;, since the overflow flag is set for every pixel of a frame except two (DI = 0 and DI = 1, where the product still fits into &amp;lt;code&amp;gt;AX&amp;lt;/code&amp;gt;). This saves two bytes, but also requires adjusting the animation constant, because it is now added twice per frame. A further benefit is that in one of these two cases &amp;lt;code&amp;gt;AX&amp;lt;/code&amp;gt; is zero, which saves another byte on the following ESC check (&amp;lt;code&amp;gt;dec ax&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;dec al&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
=== Size optimizing the &amp;quot;Tunnel&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
Now if we abandon all the comfort, alignment, smoothness and convenience, and optimize purely for size, we end up with a 52 byte version. This does not include the possible removal of the color tuning (2 bytes); after all, the effect is supposed to look at least somewhat appealing ;)&lt;br /&gt;
&lt;br /&gt;
[[File:Neontube 52b optimized.png|thumb|Neontube 52b optimized]]&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov		al,0x13				; mode 13h = 320 x 200 in 256 colors&lt;br /&gt;
int		0x10				; switch videomode&lt;br /&gt;
X: &lt;br /&gt;
or		al, [bp+si]			; *illusion* - executed ONCE&lt;br /&gt;
xor		al, 0x68			; *illusion* - executed ONCE&lt;br /&gt;
mov		dx, 0x79F			; *illusion* - executed ONCE&lt;br /&gt;
pusha						; push all registers on stack&lt;br /&gt;
fild 	word	[bx-9]		; x&lt;br /&gt;
fild 	word	[bx-8]		; y x&lt;br /&gt;
fpatan						; arc&lt;br /&gt;
fst 	st1					; arc arc&lt;br /&gt;
fcos						; cos(arc) arc&lt;br /&gt;
fimul	dword	[si]		; l*cos(arc) arc&lt;br /&gt;
fidiv	word	[bx-8]		; l*cos(arc)/x arc&lt;br /&gt;
fistp	dword	[bx-4]		; arc&lt;br /&gt;
fimul	word	[bx]		; scaled_arc&lt;br /&gt;
fistp	word	[bx-5]		; -&lt;br /&gt;
popa						; pop all registers from stack&lt;br /&gt;
sub		ah, [bp+si]			; animation along distance&lt;br /&gt;
xor		al, ah				; XOR scaled_arc with distance&lt;br /&gt;
and		al, 16 + 8 + 4		; sub palette selection&lt;br /&gt;
stosb						; write to screen, advance DI&lt;br /&gt;
mov		ax, 0xCCCD			; the famous&lt;br /&gt;
mul		di					; Rrrola trick&lt;br /&gt;
jmp 	short X-1			; *ODD* jump into &amp;quot;int 0x10&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since this code contains an &amp;quot;odd jump&amp;quot; into the middle of the instruction &amp;lt;code&amp;gt;int 0x10&amp;lt;/code&amp;gt;, it helps to disassemble the code from address &amp;lt;code&amp;gt;0x103&amp;lt;/code&amp;gt; onwards, until both code paths realign at &amp;lt;code&amp;gt;pusha&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;adc 	[bp+si],cl		; decrement framecounter&lt;br /&gt;
add 	dh,[si]			; vertical alignment&lt;br /&gt;
push 	word 0x9FBA		; video base - 3.5 lines&lt;br /&gt;
pop 	es				; get aligned video memory&lt;br /&gt;
						; continues with &amp;quot;pusha&amp;quot;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As before, the &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt; instruction sets the overflow flag, and with it the carry flag, for all but two pixels per frame. &amp;lt;code&amp;gt;CL&amp;lt;/code&amp;gt; keeps its start value &amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt;, so with the carry set &amp;lt;code&amp;gt;adc [bp+si],cl&amp;lt;/code&amp;gt; adds 0xFF + 1 = 0x100 and leaves the byte unchanged. Only in the two cases where the carry is clear does it add 0xFF, which effectively decrements the framecounter twice per frame. &lt;br /&gt;
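&lt;br /&gt;
The effect of the carry flag on this instruction can be tried in isolation. The following is only a minimal sketch and not part of the intro: it sets the carry flag by hand with &amp;lt;code&amp;gt;stc&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;clc&amp;lt;/code&amp;gt; (in the intro it comes from &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt;) and assumes that &amp;lt;code&amp;gt;[bp+si]&amp;lt;/code&amp;gt; points to a free byte:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov 	cl, 0xFF		; same value CL has at program start&lt;br /&gt;
stc						; carry set, as for most pixels&lt;br /&gt;
adc 	[bp+si], cl		; adds 0xFF + 1 = 0x100: byte unchanged&lt;br /&gt;
clc						; carry clear, as for DI = 0 and DI = 1&lt;br /&gt;
adc 	[bp+si], cl		; adds 0xFF + 0: byte decremented by one&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;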
&lt;br /&gt;
The factor for scaling the arc is now taken from the location &amp;lt;code&amp;gt;[bx]&amp;lt;/code&amp;gt;, which is &amp;lt;code&amp;gt;0x20CD&amp;lt;/code&amp;gt; (8397). Calculating the number of &amp;quot;spirals&amp;quot; backwards from this number (8397 * pi * 2 / 256 / 32) gives about 6.44, which is close enough to 6.5 that the visual gap at the top is almost unnoticeable, apart from the miscoloring, which can be fixed by removing the 16 from &amp;lt;code&amp;gt;and al, 16 + 8 + 4&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
It's noteworthy that the location &amp;lt;code&amp;gt;[si]&amp;lt;/code&amp;gt; does not contain a good offset anymore, since &amp;lt;code&amp;gt;push &amp;lt;word&amp;gt;&amp;lt;/code&amp;gt; moved away from the top to help create *illusion* code that not only realigns quickly with the second code path but also does not modify registers or memory in a hindering way (only &amp;lt;code&amp;gt;AL&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; are modified).&lt;br /&gt;
&lt;br /&gt;
The takeaway here is to study the ModR/M bytes of instructions that are very likely to appear in tiny intros, as well as using their constants as code. &amp;lt;code&amp;gt;int 0x10&amp;lt;/code&amp;gt; will almost always be in your code, and since &amp;lt;code&amp;gt;0x10&amp;lt;/code&amp;gt; is the opcode of the 8-bit register-to-memory form of &amp;lt;code&amp;gt;adc&amp;lt;/code&amp;gt;, there is always an &amp;lt;code&amp;gt;adc byte[??], 8bit_reg&amp;lt;/code&amp;gt; hidden in it too.&lt;br /&gt;
&lt;br /&gt;
Finally, you might have noticed that these tunnels don't &amp;quot;spin&amp;quot;. For the cost of two bytes this can be easily added. &amp;lt;code&amp;gt;add		al, [bp+si]&amp;lt;/code&amp;gt; before applying XOR in the last example would offset the &amp;lt;code&amp;gt;arc&amp;lt;/code&amp;gt; with the framecounter and produce a spinning effect. For a coherent visual experience the sub palette selection must be changed to &amp;lt;code&amp;gt;and al, 8 + 4&amp;lt;/code&amp;gt;, too.&lt;br /&gt;
&lt;br /&gt;
== FPU Tricks ==&lt;br /&gt;
&lt;br /&gt;
=== Need a constant? ===&lt;br /&gt;
If you need an integer constant for your FPU operation, also see: [[General_Coding_Tricks#Need_a_constant.3F]]&lt;br /&gt;
&lt;br /&gt;
=== Comparing 2 float numbers on the FPU ===&lt;br /&gt;
&lt;br /&gt;
Sometimes when doing fpu algebra, you need to compare 2 float numbers using the FPU (for example when calculating an intersection between a point and an object). &lt;br /&gt;
&lt;br /&gt;
This takes a few steps:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fcomp      ; compare the contents of ST0 and ST1&lt;br /&gt;
fstsw ax   ; copy the fpu flags to ax (ah)&lt;br /&gt;
sahf       ; copy the contents of the AH register to the CPU flags&lt;br /&gt;
jb inside  ; you can now check the cpu flags and jump accordingly&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
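&lt;br /&gt;
After &amp;lt;code&amp;gt;sahf&amp;lt;/code&amp;gt; the FPU condition bits C0, C2 and C3 end up in the carry, parity and zero flag, so the unsigned jump conditions apply. A sketch of all four outcomes, with purely illustrative label names that are not part of any real production:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fcomp          ; compare ST0 with ST1, then pop ST0&lt;br /&gt;
fstsw ax       ; C0, C2, C3 land in bits 0, 2, 6 of AH&lt;br /&gt;
sahf           ; CF = C0, PF = C2, ZF = C3&lt;br /&gt;
jp unordered   ; at least one operand was a NaN (must be tested first)&lt;br /&gt;
jb below       ; ST0 was less than ST1&lt;br /&gt;
je equal       ; ST0 was equal to ST1&lt;br /&gt;
above:         ; fall through: ST0 was greater than ST1&lt;br /&gt;
unordered:     ; hypothetical labels, all ending here for brevity&lt;br /&gt;
below:&lt;br /&gt;
equal:&lt;br /&gt;
ret&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;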
&lt;br /&gt;
If the Pentium Pro instruction set is available or emulated (real DOS setups or DOSBox-X), you can use an instruction of the FCOMI family instead (here FCOMIP, which also pops ST0) to do the same thing, which saves a few bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fcomip&lt;br /&gt;
jb inside &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Truncate float numbers / get fractional part ===&lt;br /&gt;
You may want the truncated value of your floating point number for some purpose, or its fractional part. In that case an instruction comes in handy that was added quite late to the FPU: it came with the SSE3 instruction set and is called &amp;lt;code&amp;gt;fisttp&amp;lt;/code&amp;gt; (Store Integer with Truncation). &lt;br /&gt;
&lt;br /&gt;
To get the fractional part you can do it like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fld st0               ;duplicate your number =&amp;gt; st0=f, st1=f &lt;br /&gt;
fisttp dword[si]      ;INT(f) - store truncated value somewhere in memory&lt;br /&gt;
fild dword[si]        ;load truncated value&lt;br /&gt;
fsubp st1,st0         ;calculate f-INT(f) &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Generally it's shorter and faster than fiddling around with the rounding mode and &amp;lt;code&amp;gt;frndint&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Floor Casting ===&lt;br /&gt;
Another classic effect that works well with the FPU is a simple perspective floor caster. &lt;br /&gt;
While this effect can also be achieved with just CPU code, using the FPU gives somewhat finer control over parameters such as the camera height.&lt;br /&gt;
&lt;br /&gt;
Here is the basic code for the effect as used in e.g. [https://www.pouet.net/prod.php?which=85681 Rush, by: Marquee Design]. &lt;br /&gt;
&lt;br /&gt;
[[File:Fpu_floor.png|thumb]]&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fninit&lt;br /&gt;
fild word [bx-8]	; load y-value &lt;br /&gt;
fidiv word [viewheight] ; scalingvalue/height to plane&lt;br /&gt;
fabs 			; abs(y)&lt;br /&gt;
fild word [bx-9]        ; x abs(y)&lt;br /&gt;
fdiv st1                ; x/abs(y) abs(y)&lt;br /&gt;
fistp word [bx-4]       ; store U (plane x) to ax&lt;br /&gt;
fidivr word [floorval]  ; 32767/abs(y) &lt;br /&gt;
fist word [bx-6]        ; store V (plane y) to cx		&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following constants are used in the above calculation:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
floorval dw 32767&lt;br /&gt;
viewheight dw 10 &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Needless to say, the code above is simplified and unoptimised for educational purposes. &lt;br /&gt;
It can be optimised further using various tricks mentioned above and integrated easily into already existing FPU calculations for example to save space.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1297</id>
		<title>General Coding Tricks</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1297"/>
				<updated>2024-02-13T17:04:05Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Need a constant? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Data is code, code is data ==&lt;br /&gt;
&lt;br /&gt;
Code is nothing more than data that the CPU interprets.  For example, consider this multi-byte instruction:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This assembles to &amp;lt;code&amp;gt;B4 37&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;B4&amp;lt;/code&amp;gt; by itself isn't interesting, but &amp;lt;code&amp;gt;37&amp;lt;/code&amp;gt; happens to be the opcode for &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt;.  Let's say you had this code before a loop, and you needed to perform &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt; at the top of the loop.  Rather than put AAA at the top of the loop, you can reuse the opcode that will already be there as part of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt; that comes before it.  Just jump directly into the middle of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt;, which will get interpreted and executed as &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
&lt;br /&gt;
label:&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        loop label+1&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;+1&amp;lt;/code&amp;gt; specifies that the jump should go to 1 byte past the label.&lt;br /&gt;
&lt;br /&gt;
=== Reuse ===&lt;br /&gt;
&lt;br /&gt;
You can use opcodes hidden in your existing data.  For example, .COM files can end with &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, which is opcode &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt;.  If you already have a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; somewhere else in your code, even as part of data, just &amp;lt;code&amp;gt;JMP&amp;lt;/code&amp;gt; to that pre-existing &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; instead of adding a &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;.&lt;br /&gt;
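&lt;br /&gt;
As a small sketch of this idea (the constant 0C300h and the ESC polling are assumptions chosen only for illustration), the high byte of a 16-bit immediate can double as the &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        org 100h&lt;br /&gt;
start:&lt;br /&gt;
        mov di,0C300h   ;assembles to BF 00 C3, so start+2 holds a C3 byte&lt;br /&gt;
poll:&lt;br /&gt;
        in al,60h       ;read the last keyboard scancode&lt;br /&gt;
        dec al          ;ESC has scancode 1&lt;br /&gt;
        jz start+2      ;ESC pressed: execute the embedded C3 as RET&lt;br /&gt;
        jmp short poll  ;otherwise keep waiting&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In a real production the &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; would of course come from a constant that is needed anyway, not from one added just to provide it.&lt;br /&gt;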
&lt;br /&gt;
== If your environment holds you back, change it ==&lt;br /&gt;
&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it:  While setting a new palette costs bytes, the new palette arrangement could save you headaches down the road.  For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want.  (This assumes you have very small ways of redefining the palette, of course.)&lt;br /&gt;
&lt;br /&gt;
The above is maybe not the best example.  Rewrites to this section are welcome.&lt;br /&gt;
&lt;br /&gt;
== Need a constant? ==&lt;br /&gt;
&lt;br /&gt;
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.&lt;br /&gt;
&lt;br /&gt;
If you need more than a byte, another method is to create a '''literal pool''' in memory which can be addressed for constants. This technique was first used in &lt;br /&gt;
[https://www.pouet.net/prod.php?which=94080 TERRA256] (please update if there are earlier examples) and is especially useful if FPU instructions are used, since those cannot address byte integer constants in memory but only word and dword. Unlike byte constants, word or dword constants are very unlikely to be found inside the opcodes.&lt;br /&gt;
&lt;br /&gt;
The construction of the literal pool can look like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov cx, 255 ;very likely this can be replaced by a shorter version or even skipped; value of CX can also be larger than 255&lt;br /&gt;
initlp: ;very likely there are other things to initialize here, e.g. setting the DAC color palette&lt;br /&gt;
        push cx ;push 16-bit word to the stack with the current counter value&lt;br /&gt;
        loop initlp&lt;br /&gt;
        push cx ;optionally push final zero constant, if required&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This example constructs a literal pool of 256 words on the stack, starting with 0x00FF and ending with 0x0000. The additional &amp;quot;push cx&amp;quot; after the loop adds the final zero constant, if required. The final zero also makes it possible to exit from a COM executable using the &amp;quot;ret&amp;quot; instruction. &lt;br /&gt;
&lt;br /&gt;
Practically any existing initialization loop can be used to set up such a pool, which is why this construction in the best case costs only a single additional byte for the &amp;quot;push cx&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
To use the literal pool, some index register like SI, DI, BX or BP should be initialized to point to the literal pool. The init value can be the content of the stack pointer (SP) but also a fixed offset, since the position of the literal pool and the current stack position is typically well known. Also the initial value of DI (0xFFFE) or a zero offset may work. Accessing the literal pool can look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov bx, sp ;point index register to literal pool&lt;br /&gt;
        fild word [bx+0x13*2] ;load integer value 0x0013 into FPU register&lt;br /&gt;
        fild word [bx+0x31*2] ;load integer value 0x0031 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To address larger values, some tricks can be used to keep size low:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        fild word [bx+0x087*2]  ;take care that this instruction is 1 byte larger than the others&lt;br /&gt;
        fild word [bx+si+0x087*2-0x100] ;but this trick could help: load integer value 0x0087 into FPU register, assuming SI is 0x0100&lt;br /&gt;
        fild word [bx+0x03*2-1] ;load integer value 0x0300 into FPU register&lt;br /&gt;
        fidiv dword [bx+0x05*2-3] ;divide by integer value 0x05000400&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== A smaller way to point to Mode 13's screen segment ==&lt;br /&gt;
&lt;br /&gt;
Rather than &amp;lt;code&amp;gt;mov ah,0a0h; mov es,ax&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;push word 0a000h; pop es&amp;lt;/code&amp;gt;, try this 2-byte wonder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
les bp,[bx]&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This sets ES=9FFF, only one away from A000.  You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.&lt;br /&gt;
&lt;br /&gt;
How does this work?   At the start of execution of a .COM file, BX=0 and DS=CS. The contents of the COM file get loaded to offset 0x100 in that segment, but loaded before that is the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with &amp;lt;code&amp;gt;CD 20&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;INT 20h&amp;lt;/code&amp;gt;, which exits the program), so that's what gets loaded into BP.  The next word is the number of the last free conventional memory segment, typically 0x9fff (but it can be something different if parts of the upper memory range are either not installed or allocated).&lt;br /&gt;
&lt;br /&gt;
'''Warning:''' This trick doesn't always work.  On FreeDOS, this can set ES=9FE0, and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.&lt;br /&gt;
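&lt;br /&gt;
A minimal sketch of how the 16-byte displacement can be handled, assuming ES really ends up as 9FFF (see the warning above) and that BX is still 0 after the mode switch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        org 100h&lt;br /&gt;
        mov al,13h&lt;br /&gt;
        int 10h         ;switch to mode 13h&lt;br /&gt;
        les bp,[bx]     ;BP = 20CDh, ES = word at PSP offset 2 (assumed 9FFFh)&lt;br /&gt;
        mov di,16       ;9FFF:0010h is the same address as A000:0000h&lt;br /&gt;
        mov al,15       ;color 15 (white in the default palette)&lt;br /&gt;
        stosb           ;plot the top left pixel&lt;br /&gt;
        ret&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Many effects do not need exact placement, in which case the extra &amp;lt;code&amp;gt;mov di,16&amp;lt;/code&amp;gt; can simply be dropped.&lt;br /&gt;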
&lt;br /&gt;
== Accessing the timer tick for free ==&lt;br /&gt;
&lt;br /&gt;
If using a 386+, FS=0 at .COM start.  So, &amp;lt;code&amp;gt;FS:[046C]&amp;lt;/code&amp;gt; gets you the BIOS timer tick counter, which you could use for timing/pacing, or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; right after the start and accessing &amp;lt;code&amp;gt;[046C]&amp;lt;/code&amp;gt; does the trick with equal size.&lt;br /&gt;
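&lt;br /&gt;
Both variants side by side, as a sketch. The encodings in the comments show that they take 4 bytes each; the second variant assumes that the stack still holds the zero word from program start:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        ;variant 1: relies on FS=0 (386+)&lt;br /&gt;
        mov ax,[fs:046Ch]   ;64 A1 6C 04&lt;br /&gt;
&lt;br /&gt;
        ;variant 2: sets DS=0 first&lt;br /&gt;
        pop ds              ;1F&lt;br /&gt;
        mov ax,[046Ch]      ;A1 6C 04&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Keep in mind that after the second variant DS no longer points to your own segment, so other memory accesses through DS are affected as well.&lt;br /&gt;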
&lt;br /&gt;
== Looping twice ==&lt;br /&gt;
&lt;br /&gt;
If you need to run a section of code twice, the section doesn't modify the carry flag, and you know the carry flag is clear on entry, you can loop back once in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        cmc&lt;br /&gt;
        jc      looping&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Looping three times ==&lt;br /&gt;
&lt;br /&gt;
If you need to run a section of code three times and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can loop back twice in only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        inc     bx ;when decrementing from -1 instead, the same check works: FEh and FDh have odd parity, FCh has even parity&lt;br /&gt;
        jpo     looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Obtaining X and Y without DIV (The Rrrola Trick) ==&lt;br /&gt;
&lt;br /&gt;
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading Y from the 8-bit register DH (or DH:DL as a 16-bit value) and X from DL (or DL:AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], from start to end of the screen. Multiplying this number by 65536 / 320 = 204.8 yields the row before the decimal point and, again as a kind of fixed-point number, the column after it. The representation 0xCCCD is the nearest rounding of 204.8 * 256 (= 52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
&lt;br /&gt;
This is adapted from [http://www.pouet.net/prod.php?which=53816 &amp;quot;Puls&amp;quot; by Rrrola] where X and Y are directly modified on the stack by performing &amp;lt;code&amp;gt;add dword[di],0000CCCDh&amp;lt;/code&amp;gt; on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value in said DWORD on the stack before each frame, which requires 2 additional bytes. The two approaches are too different to compare directly, but share the core idea of [http://www.pouet.net/topic.php?which=8791&amp;amp;page=8#c411796 multiplying with &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt;], so &amp;quot;Rrrola's trick&amp;quot; is an appropriate term to use.&lt;br /&gt;
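&lt;br /&gt;
A minimal pixel loop built around the multiplication could look like the following sketch (the XOR pattern and the segment setup are just assumptions for the sake of a complete example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        org 100h&lt;br /&gt;
        mov al,13h&lt;br /&gt;
        int 10h         ;mode 13h, 320x200 in 256 colors&lt;br /&gt;
        push 0A000h&lt;br /&gt;
        pop es          ;ES = video segment&lt;br /&gt;
pixel:&lt;br /&gt;
        mov ax,0CCCDh&lt;br /&gt;
        mul di          ;DX:AX = DI * 0CCCDh, so DH = Y and DL = X*256/320&lt;br /&gt;
        mov al,dh&lt;br /&gt;
        xor al,dl       ;simple XOR pattern from both coordinates&lt;br /&gt;
        stosb           ;write pixel, advance DI&lt;br /&gt;
        jmp short pixel ;endless loop, DI wraps around after 65536 pixels&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;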
&lt;br /&gt;
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===&lt;br /&gt;
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]&lt;br /&gt;
More clearly: DI = (y * 320) + x&lt;br /&gt;
&lt;br /&gt;
Multiply by 0xCCCD =&amp;gt; (y * 0x1000040) + (x * 0xcccd)&lt;br /&gt;
&lt;br /&gt;
Taking the top byte is equivalent to dividing by 0x1000000, so that gives you Y.&lt;br /&gt;
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).&lt;br /&gt;
And the lower two bytes are noise.&lt;br /&gt;
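&lt;br /&gt;
A worked example for the pixel in the middle of the screen (x = 160, y = 100), written as a short sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov di,100*320+160  ;DI = 32160&lt;br /&gt;
        mov ax,0CCCDh&lt;br /&gt;
        mul di              ;DX:AX = 64801920h&lt;br /&gt;
                            ;DH = 64h = 100 = y&lt;br /&gt;
                            ;DL = 80h = 128 = 160*256/320&lt;br /&gt;
                            ;AX = 1920h, the two lower bytes&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;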
&lt;br /&gt;
== Use the entire register for a smaller opcode form ==&lt;br /&gt;
&lt;br /&gt;
As you know, e.g. &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; produces 3 bytes of code while &amp;lt;code&amp;gt;inc cl&amp;lt;/code&amp;gt; assembles to 2 bytes. If ch does not matter (or you know that it won't be affected), use &amp;lt;code&amp;gt;inc cx&amp;lt;/code&amp;gt; instead, which takes only 1 byte. This is no real trick, but such things are easily overlooked, and the 2 saved bytes could be invested wisely.&lt;br /&gt;
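&lt;br /&gt;
For reference, these are the encodings of the three variants in 16-bit code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        add cl,1        ;80 C1 01 (3 bytes)&lt;br /&gt;
        inc cl          ;FE C1    (2 bytes)&lt;br /&gt;
        inc cx          ;41       (1 byte)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;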
&lt;br /&gt;
== Use the carry flag in your calculations ==&lt;br /&gt;
&lt;br /&gt;
Let's say you have to &amp;lt;code&amp;gt;add si,128&amp;lt;/code&amp;gt;. Unfortunately this takes 1 Byte more than &amp;lt;code&amp;gt;add si,127&amp;lt;/code&amp;gt;. But you can add 128 without that extra Byte. If your previous code sets the carry flag simply include it into your calculation and &amp;lt;code&amp;gt;adc si,127&amp;lt;/code&amp;gt;. The same goes for &amp;lt;code&amp;gt;sub si,128&amp;lt;/code&amp;gt; vs &amp;lt;code&amp;gt;sbb si,127&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1296</id>
		<title>General Coding Tricks</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=General_Coding_Tricks&amp;diff=1296"/>
				<updated>2024-02-13T17:01:28Z</updated>
		
		<summary type="html">&lt;p&gt;WiRe: /* Need a constant? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Data is code, code is data ==&lt;br /&gt;
&lt;br /&gt;
Code is nothing more than data that the CPU interprets.  For example, consider this multi-byte instruction:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This assembles to &amp;lt;code&amp;gt;B4 37&amp;lt;/code&amp;gt;.  &amp;lt;code&amp;gt;B4&amp;lt;/code&amp;gt; by itself isn't interesting, but &amp;lt;code&amp;gt;37&amp;lt;/code&amp;gt; happens to be the opcode for &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt;.  Let's say you had this code before a loop, and you needed to perform &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt; at the top of the loop.  Rather than put AAA at the top of the loop, you can reuse the opcode that will already be there as part of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt; that comes before it.  Just jump directly into the middle of the &amp;lt;code&amp;gt;mov ah,37h&amp;lt;/code&amp;gt;, which will get interpreted and executed as &amp;lt;code&amp;gt;AAA&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
&lt;br /&gt;
label:&lt;br /&gt;
        mov ah,37h&lt;br /&gt;
        ;misc. stuff&lt;br /&gt;
        loop label+1&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;+1&amp;lt;/code&amp;gt; specifies that the jump should go to 1 byte past the label.&lt;br /&gt;
&lt;br /&gt;
=== Reuse ===&lt;br /&gt;
&lt;br /&gt;
You can use opcodes hidden in your existing data.  For example, .COM files can end with &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;, which is opcode &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt;.  If you already have a &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; somewhere else in your code, even as part of data, just &amp;lt;code&amp;gt;JMP&amp;lt;/code&amp;gt; to that pre-existing &amp;lt;code&amp;gt;C3&amp;lt;/code&amp;gt; instead of adding a &amp;lt;code&amp;gt;RET&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== If your environment holds you back, change it ==&lt;br /&gt;
&lt;br /&gt;
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it:  While setting a new palette costs bytes, the new palette arrangement could save you headaches down the road.  For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want.  (This assumes you have very small ways of redefining the palette, of course.)&lt;br /&gt;
&lt;br /&gt;
The above is maybe not the best example.  Rewrites to this section are welcome.&lt;br /&gt;
&lt;br /&gt;
== Need a constant? ==&lt;br /&gt;
&lt;br /&gt;
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.&lt;br /&gt;
&lt;br /&gt;
If you need more than a byte, another method is to create a '''literal pool''' in memory which can be addressed for constants. This technique was first used in &lt;br /&gt;
[https://www.pouet.net/prod.php?which=94080 TERRA256] (please update if there are earlier examples) and is especially useful if FPU instructions are used, since those cannot address byte integer constants in memory but only word and dword. Unlike byte constants, word or dword constants are very unlikely to be found inside the opcodes.&lt;br /&gt;
&lt;br /&gt;
The construction of the literal pool can look like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov cx, 255 ;very likely this can be replaced by a shorter version or even skipped; value of CX can also be larger than 255&lt;br /&gt;
initlp: ;very likely there are other things to initialize here, e.g. setting the DAC color palette&lt;br /&gt;
        push cx ;push 16-bit word to the stack with the current counter value&lt;br /&gt;
        loop initlp&lt;br /&gt;
        push cx ;optionally push final zero constant, if required&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This example constructs a literal pool of 256 words on the stack, starting with 0x00FF and ending with 0x0000. The additional &amp;quot;push cx&amp;quot; after the loop adds the final zero constant, if required. The final zero also makes it possible to exit from a COM executable using the &amp;quot;ret&amp;quot; instruction. &lt;br /&gt;
&lt;br /&gt;
Practically any existing initialization loop can be used to set up such a pool, which is why this construction in the best case costs only a single additional byte for the &amp;quot;push cx&amp;quot; instruction.&lt;br /&gt;
&lt;br /&gt;
To use the literal pool, some index register like SI, DI, BX or BP should be initialized to point to the literal pool. The init value can be the content of the stack pointer (SP) but also a fixed offset, since the position of the literal pool and the current stack position is typically well known. Also the initial value of DI (0xFFFE) or a zero offset may work. Accessing the literal pool can look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        mov bx, sp ;point index register to literal pool&lt;br /&gt;
        fild word [bx+0x13*2] ;load integer value 0x0013 into FPU register&lt;br /&gt;
        fild word [bx+0x31*2] ;load integer value 0x0031 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To address larger values, some tricks can be used to keep size low:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
        fild word [bx+0x087*2]  ;take care that this instruction is 1 byte larger than the others&lt;br /&gt;
        fild word [bx+si+0x087*2-0x100] ;but this trick could help: load integer value 0x0087 into FPU register, assuming SI is 0x0100&lt;br /&gt;
        fild word [bx+0x03*2-1] ;load integer value 0x0300 into FPU register&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
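&lt;br /&gt;
The addressing above can be checked with a small model. The following Python sketch is not part of the original player; it assumes the pool is laid out exactly as described (words 0x00FF down to 0x0000, with BX pointing at the final zero), and the helper names build_pool and load_word are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
def build_pool():
    # Mimic the init loop described above: words 0x00FF down to 0x0001 are
    # pushed, followed by the final zero constant.
    pushed = list(range(0x00FF, 0, -1)) + [0]
    # The stack grows downwards, so the last push ends up at the lowest
    # address, which is where SP (and therefore BX) points.
    return b''.join(w.to_bytes(2, 'little') for w in reversed(pushed))


def load_word(pool, offset):
    # Little-endian 16-bit read, like 'fild word [bx+offset]' would fetch.
    return int.from_bytes(pool[offset:offset + 2], 'little')


pool = build_pool()
print(hex(load_word(pool, 0x13 * 2)))      # word-aligned read: 0x13
print(hex(load_word(pool, 0x03 * 2 - 1)))  # read shifted by one byte: 0x300
```
&lt;br /&gt;
The unaligned read picks up the high byte of the word 0x0002 and the low byte of the word 0x0003, which is why it yields 0x0300.&lt;br /&gt;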
&lt;br /&gt;
== A smaller way to point to Mode 13's screen segment ==&lt;br /&gt;
&lt;br /&gt;
Rather than &amp;lt;code&amp;gt;mov ah,0a0h; mov es,ax&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;push word 0a000h; pop es&amp;lt;/code&amp;gt;, try this 2-byte wonder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
les bp,[bx]&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This sets ES=9FFF, only one away from A000. You can write to the screen through ES this way, as long as you keep in mind that the segment starts one paragraph (16 bytes) early; just increase your offset by 16 if you need exact placement.&lt;br /&gt;
&lt;br /&gt;
How does this work? At the start of execution of a .COM file, BX=0 and DS=CS. The contents of the COM file are loaded at offset 0x100 in that segment; the area before that holds the PSP (program segment prefix), which DOS populates with information about the loaded program and other data. The PSP starts with &amp;lt;code&amp;gt;CD 20&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;INT 20h&amp;lt;/code&amp;gt;, which exits the program), so that's what gets loaded into BP. The next word, which ends up in ES, is the number of the last free conventional memory segment, typically 0x9fff (but it can be something different if parts of the upper memory range are either not installed or allocated).&lt;br /&gt;
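&lt;br /&gt;
As a rough illustration of what the instruction reads, here is a Python model of the first four PSP bytes. It is only a sketch: the value 0x9FFF is the typical case mentioned above and not guaranteed, and the helper names les and linear are made up for this example:&lt;br /&gt;
&lt;br /&gt;
```python
def les(memory, offset):
    # LES reads a far pointer: the first word goes to the general register,
    # the second word goes to ES.
    reg = int.from_bytes(memory[offset:offset + 2], 'little')
    es = int.from_bytes(memory[offset + 2:offset + 4], 'little')
    return reg, es


def linear(segment, offset):
    # Real-mode address calculation: segment * 16 + offset.
    return segment * 16 + offset


# First four PSP bytes: INT 20h (CD 20), then the top-of-memory segment,
# assumed here to be the typical 0x9FFF.
psp = bytes([0xCD, 0x20, 0xFF, 0x9F])
bp, es = les(psp, 0)  # BX = 0 at start, so [bx] is PSP offset 0

# ES sits one paragraph below 0xA000, so the first pixel is at ES:0x0010.
print(hex(bp), hex(es), hex(linear(es, 16)))
```
&lt;br /&gt;
With these assumed bytes the model gives BP=0x20CD and ES=0x9FFF, and ES:0x0010 resolves to the linear address 0xA0000, the start of the Mode 13h frame buffer.&lt;br /&gt;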
&lt;br /&gt;
'''Warning:''' This trick doesn't always work. On FreeDOS, it can set ES=9FE0, and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.&lt;br /&gt;
&lt;br /&gt;
== Accessing the timer tick for free ==&lt;br /&gt;
&lt;br /&gt;
On a 386+, FS=0 at .COM start, so &amp;lt;code&amp;gt;FS:[046C]&amp;lt;/code&amp;gt; gets you the BIOS timer tick counter, which you can use for timing/pacing or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, a &amp;lt;code&amp;gt;POP DS&amp;lt;/code&amp;gt; right after the start (the word on top of the stack is 0000h) followed by an access to &amp;lt;code&amp;gt;[046C]&amp;lt;/code&amp;gt; does the trick at equal size.&lt;br /&gt;
&lt;br /&gt;
== Looping twice ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear on entry, you can run it a second time for only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        cmc&lt;br /&gt;
        jc      looping&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Looping three times ==&lt;br /&gt;
&lt;br /&gt;
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can run it three times in total for only 3 bytes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=nasm&amp;gt;&lt;br /&gt;
looping:&lt;br /&gt;
        ;do stuff here&lt;br /&gt;
        inc     bx ;when decrementing from -1 instead, jpo works unchanged: low bytes 0xFE and 0xFD have odd parity, 0xFC has even parity
        jpo     looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
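&lt;br /&gt;
The number of passes can be checked with a small Python model of the parity flag. This is only a sketch for illustration; the function names are made up, and the model relies on the fact that the x86 parity flag reflects only the low 8 bits of a result:&lt;br /&gt;
&lt;br /&gt;
```python
def parity_is_odd(value):
    # The x86 parity flag only looks at the low 8 bits of the result.
    return bin(value % 256).count('1') % 2 == 1


def count_passes(start, step):
    # Run the loop body, apply inc (step 1) or dec (step -1) to a 16-bit
    # register, and repeat for as long as jpo would jump.
    reg, passes = start, 0
    while True:
        passes += 1                    # the loop body runs once per pass
        reg = (reg + step) % 0x10000   # 16-bit wrap-around
        if not parity_is_odd(reg):     # jpo falls through on even parity
            return passes


print(count_passes(0, 1))  # inc bx starting from 0: prints 3
```
&lt;br /&gt;
Starting from zero, the register takes the values 1, 2 and 3, and only the last one has even parity, so the body runs three times.&lt;br /&gt;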
&lt;br /&gt;
== Obtaining X and Y without DIV (The Rrrola Trick) ==&lt;br /&gt;
&lt;br /&gt;
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading Y from the 8-bit register DH (together with DL as a 16-bit value) and X from DL (together with AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], running from the start to the end of the segment. Multiplying this number by 65536 / 320 = 204.8 yields the row in the integer part and, again as a kind of fixed-point number, the column in the fractional part. The constant 0xCCCD is 204.8 * 256 rounded to the nearest integer (52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
&lt;br /&gt;
This is adapted from [http://www.pouet.net/prod.php?which=53816 &amp;quot;Puls&amp;quot; by Rrrola] where X and Y are directly modified on the stack by performing &amp;lt;code&amp;gt;add dword[di],0000CCCDh&amp;lt;/code&amp;gt; on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value for said DWORD on the stack before each frame, which requires 2 additional bytes. The two approaches are too different to compare directly, but they share the core idea of [http://www.pouet.net/topic.php?which=8791&amp;amp;page=8#c411796 multiplying by &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt;], so &amp;quot;Rrrola's trick&amp;quot; is an appropriate term to use.&lt;br /&gt;
&lt;br /&gt;
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===&lt;br /&gt;
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]&lt;br /&gt;
More clearly: DI = (y * 320) + x&lt;br /&gt;
&lt;br /&gt;
Multiply by 0xCCCD =&amp;gt; (y * 0x1000040) + (x * 0xcccd)&lt;br /&gt;
&lt;br /&gt;
Taking the top byte is equivalent to dividing by 0x1000000, so that gives you Y.
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).
The lower two bytes are noise.&lt;br /&gt;
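&lt;br /&gt;
The decomposition above can be verified by brute force. The following Python sketch is not taken from either source; it models the 32-bit product that a 16-bit MUL leaves in DX:AX, and the function name rrrola_xy is made up for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
def rrrola_xy(di):
    # Multiplying DI by 0xCCCD leaves a 32-bit product in DX:AX.
    product = di * 0xCCCD
    dx = product // 0x10000      # upper 16 bits, i.e. register DX
    dh, dl = divmod(dx, 0x100)   # DH = top byte, DL = third byte
    return dh, dl


# Compare every visible pixel of the 320x200 screen with plain division.
mismatches = 0
for di in range(320 * 200):
    y, x = divmod(di, 320)
    if rrrola_xy(di) != (y, x * 256 // 320):
        mismatches += 1
print(mismatches)  # prints 0
```
&lt;br /&gt;
In this model DH equals the row for every visible pixel, and DL equals the column scaled to the range 0..255 (x * 256 / 320, rounded down).&lt;br /&gt;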
&lt;br /&gt;
== Use the entire register for a smaller opcode form ==&lt;br /&gt;
&lt;br /&gt;
As you know, &amp;lt;code&amp;gt;add cl,1&amp;lt;/code&amp;gt; produces 3 bytes of code, while &amp;lt;code&amp;gt;inc cl&amp;lt;/code&amp;gt; assembles to 2 bytes. If CH does not matter (or you know that it won't be affected), use &amp;lt;code&amp;gt;inc cx&amp;lt;/code&amp;gt; instead, which takes only 1 byte. This is not a real trick, but such things are easily overlooked, and the 2 saved bytes can be invested wisely elsewhere.&lt;br /&gt;
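&lt;br /&gt;
For reference, the byte counts can be read off the 16-bit machine code of the three instructions. The Python snippet below is only a lookup table written out by hand for illustration, not the output of an assembler run:&lt;br /&gt;
&lt;br /&gt;
```python
# 16-bit mode machine code of the three variants, written out by hand.
encodings = {
    'add cl,1': bytes([0x80, 0xC1, 0x01]),  # opcode, ModRM, 8-bit immediate
    'inc cl': bytes([0xFE, 0xC1]),          # opcode, ModRM
    'inc cx': bytes([0x41]),                # single-byte opcode 0x40 + register
}

for mnemonic, code in encodings.items():
    print(mnemonic, code.hex(), len(code))
```
&lt;br /&gt;
The single-byte form exists only for the full 16-bit registers, which is why switching from CL to CX saves a byte.&lt;br /&gt;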
&lt;br /&gt;
== Use the carry flag in your calculations ==&lt;br /&gt;
&lt;br /&gt;
Let's say you have to &amp;lt;code&amp;gt;add si,128&amp;lt;/code&amp;gt;. Unfortunately this takes 1 byte more than &amp;lt;code&amp;gt;add si,127&amp;lt;/code&amp;gt;, because 128 no longer fits into a signed 8-bit immediate. But you can add 128 without that extra byte: if your previous code leaves the carry flag set, simply include it in your calculation and use &amp;lt;code&amp;gt;adc si,127&amp;lt;/code&amp;gt;. The same goes for &amp;lt;code&amp;gt;sub si,128&amp;lt;/code&amp;gt; vs &amp;lt;code&amp;gt;sbb si,127&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>WiRe</name></author>	</entry>

	</feed>