Difference between revisions of "Output"

From SizeCoding
 
=== Outputting in Textmode (80x25) ===
 
==== Hello World / High Level function ====
Here's the obligatory "Hello World" program in text mode, using a [http://www.ctyme.com/intr/rb-2562.htm "high level" MS-DOS function]. With a small optimization already included (using <code>XCHG BP,AX</code> instead of <code>MOV AH,09h</code>, which works because <code>BP</code> usually holds a value of the form <code>09xxh</code> at program start), this snippet is 20 bytes in size.
  
 
[[File:Hello world.png|thumb|Hello World!]]
 
  
  
Of course, this gets shorter with each byte you remove from the text itself. Now let's look into arbitrary screen access. Right after the start of your program you are in mode 3, that is 80x25 characters in 16 colors (see the [http://www.columbia.edu/~em36/wpdos/videomodes.txt Video Modes List]). [[File:Drawchar example.png|thumb|draw char example]] So, to show something on the screen, you need to set a segment register to 0xB800 and then write values into this segment.
 
 
  
==== Low level access ====
  
 
The following three snippets showcase how to draw a red smiley in three different ways. All example snippets are meant to be standalone programs, starting with the first instruction and nothing before it. The target coordinate (40,12) is about the middle of the screen. We need a multiplier of 2 since one character takes two bytes in memory (character and color are one byte each). The high byte 0x04 means red (4) on black (0), while the low byte 0x01 is character 1, which code page 437 displays as a smiley.
 
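To double-check the numbers above, here is a small Python sketch. It is not part of any intro, just the arithmetic. It assumes the standard 80x25 layout, with the character in the low byte and the color attribute in the high byte of each cell, and computes the byte offset of a text cell and the word written to it:

<syntaxhighlight lang="python">
# Arithmetic sketch for 80x25 text mode (segment 0xB800): each cell is two
# bytes, the character in the low byte and the color attribute in the high byte.
COLUMNS = 80
BYTES_PER_CELL = 2


def cell_offset(x, y):
    """Byte offset of text cell (x, y) inside segment 0xB800."""
    return (y * COLUMNS + x) * BYTES_PER_CELL


def cell_word(char, foreground, background=0):
    """16-bit value for one cell: attribute (background << 4 | foreground) in
    the high byte, character code in the low byte."""
    attribute = (background << 4) | foreground
    return (attribute << 8) | char


offset = cell_offset(40, 12)
word = cell_word(0x01, foreground=4)  # red smiley on black
print(hex(offset), hex(word))         # 0x7d0 0x401
</syntaxhighlight>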
 
 
  
 
<syntaxhighlight lang="nasm">push 0xb800
; ...
</syntaxhighlight>
  
 
That's also four bytes, but it already has the <code>stosb</code> opcode (for putting something onto the screen) integrated, and even one slot free for another one-byte instruction. It works because <code>SI</code> initially points to the start of our code, and <code>stosb</code> has the hexadecimal representation <code>0AAh</code>. After the first instruction, the segment register <code>ES</code> contains the value <code>0AA90h</code>. If you repeatedly write something to the screen with <code>stosb</code>, you will eventually reach the <code>0B800h</code> segment and characters will appear on the screen. With a careful selection of the free one-byte opcode you can also reintroduce some alignment. This also works with the <code>stosw</code> opcode <code>0ABh</code>.
 
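How far does <code>stosb</code> have to travel before it reaches the text screen? The Python sketch below computes it. It assumes the free slot holds <code>90h</code> (NOP), which matches the <code>0AA90h</code> mentioned above; the helper names are made up for illustration. The sketch ignores the starting value of <code>DI</code>, which also matters in practice:

<syntaxhighlight lang="python">
# Sketch: where the text screen starts when ES is loaded from our own code.
# Assumption: the two bytes after the LES instruction are a free one-byte
# opcode followed by STOSB (0AAh), so LES loads ES = 0AAxxh (little-endian).
TEXT_SEGMENT = 0xB800
ROW_BYTES = 80 * 2  # one 80-column text row


def es_from_code(free_opcode, stos_opcode=0xAA):
    """ES value picked up by LES from the bytes <free_opcode> <stos_opcode>."""
    return (stos_opcode << 8) | free_opcode


def screen_start_di(es):
    """Offset in ES at which the 0B800h text segment begins."""
    return (TEXT_SEGMENT - es) * 16


es = es_from_code(0x90)        # NOP in the free slot gives ES = 0AA90h
di = screen_start_di(es)
print(hex(es), hex(di), di % ROW_BYTES)   # 0xaa90 0xd700 0
</syntaxhighlight>

With NOP in the free slot, the screen starts exactly on a row boundary (the remainder is 0). Choosing, for example, <code>91h</code> instead would not.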
==== Alternative high level functions  ====
  
 
Besides accessing the video memory directly, there are also other ways of bringing characters to the screen (f.e)
  
 
=== Outputting in mode 13h (320x200) ===
 
==== Basic pixel output ====
  
 
The video memory for mode 13h is located at segment 0xA000, so you need to assign this value to a segment register. Also, after the start of your program you are normally still in text mode, so you need to switch to the video mode first. The following snippet does both:
  
<syntaxhighlight lang="nasm">mov al,0x13
int 0x10    ; AH = 0 at program start selects "set video mode": AL = 0x13 (320 x 200 pixels in 256 colors)
push 0xA000  ; put value on the stack
pop es      ; pop the top stack value into segment register ES</syntaxhighlight>
 
  
 
You're free to use any of the segment register / opcode combinations to write to the screen
 
 
[[File:Mode13h-example-xor.png|left|bottom|thumb|mode13h-example-xor]]
 
  
<syntaxhighlight lang="nasm">mov al,0x13
int 0x10
push 0xa000
; ...
and al,32+8 ; a more interesting variation of it
stosb ; finally, draw to the screen
jmp short X ; rinse and repeat</syntaxhighlight>
 
  
  
 
Note that there is a different way of preparing the segment register. Instead of:
<syntaxhighlight lang="nasm">push 0xa000
pop es</syntaxhighlight>
 
 
you can also do:
<syntaxhighlight lang="nasm">mov ah,0xA0
mov es,ax</syntaxhighlight>
 
 
Both variations occupy 4 bytes, but the latter also runs on processors where ''push <word>'' (pushing an immediate value) is not available, such as the 8086/8088. Keep in mind that it only yields exactly 0xA000 if <code>AL</code> is 0 at that point; otherwise <code>ES</code> ends up as <code>0xA0xx</code> and the image is shifted by <code>AL</code> * 16 bytes.
==== Alternative way of pixel plotting and optimization ====
  
 
Now let's optimize the snippet. First, we can adapt the "LES" trick from the text mode section. We just exchange
<syntaxhighlight lang="nasm">push 0xa000
pop es</syntaxhighlight>
 
 
with:
 
<syntaxhighlight lang="nasm">les bx,[bx]</syntaxhighlight>
 
 
to save two bytes. This works because BX is 0x0000 at start, so <code>[bx]</code> accesses the region ''before'' our code, which is called the [https://en.wikipedia.org/wiki/Program_Segment_Prefix Program Segment Prefix]. The two bytes that are put into the segment register ES are bytes 2 and 3 = ''"Segment of the first byte beyond the memory allocated to the program"'', which is usually 0x9FFF. That is just off by one from our desired 0xA000. Unfortunately that means a 16 pixel offset, so if screen alignment matters to you, you can't use this optimization. Also, these two bytes are not always 0x9FFF; for example, if resident programs are located above the ''"memory allocated to the program"'' (FreeDos), their content is overwritten when we use that value as our video memory base.
 
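The following Python sketch shows what <code>les bx,[bx]</code> picks up. It assumes a typical PSP that starts with the <code>INT 20h</code> instruction (bytes <code>CD 20h</code>) followed by the top-of-memory segment <code>9FFFh</code>; your DOS may store a different value there. <code>BX</code> receives the first word, which we simply don't need:

<syntaxhighlight lang="python">
import struct

# Sketch: what "les bx,[bx]" loads when BX = 0 and DS points to the PSP.
# Assumption: a typical PSP that starts with INT 20h (bytes CD 20) followed by
# the top-of-memory segment 9FFFh; other setups may store a different value.
psp_start = bytes([0xCD, 0x20, 0xFF, 0x9F])

bx, es = struct.unpack("<HH", psp_start)  # LES: offset word first, then segment
shift = (0xA000 - es) * 16                 # bytes between ES:0000 and A000:0000


def pixel_offset(x, y, es):
    """Offset relative to ES of mode 13h pixel (x, y), 320 bytes per row."""
    return (0xA000 - es) * 16 + y * 320 + x


print(hex(bx), hex(es), shift)   # 0x20cd 0x9fff 16
</syntaxhighlight>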
  
 
Second, we can use an alternative way of putting pixels on the screen: subfunction AH = 0x0C of int 0x10. Also, instead of constructing row and column from the screen pointer, we can exploit some properties of the screen width with respect to logical operations. This results in the following 16 byte program:
 
  
<syntaxhighlight lang="nasm">cwd            ; "clear" DX for perfect alignment
mov al,0x13
X: int 0x10 ; set video mode AND draw pixel
; ...
mov ah,0x0C ; set subfunction "set pixel" for int 0x10
and al,32+8 ; a more interesting variation of it
jmp short X ; rinse and repeat</syntaxhighlight>
 
  
 
The first optimization is using the same "int 0x10" both for setting the video mode and for drawing the pixel. Subfunction AH = 0x0C expects the row in DX and the column in CX. Since the screen width is 320, which is 5 * 64, we can ignore the row and just work with the column, as long as we use logical operations and only rely on the lowest six bits (bits 0-5) of the result. Subfunction AH = 0x0C accepts unbounded column values in CX (up to 65535) and correctly "wraps" them internally without an error.
 
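The "320 = 5 * 64" argument can be checked exhaustively. In this Python sketch, for every pixel index p = y * 320 + x of the screen, the lowest six bits of p equal those of x, while a seventh bit would already depend on the row. The mask <code>32+8</code> only uses bits 3 and 5, so it stays within the safe range:

<syntaxhighlight lang="python">
# Sketch: why a width of 320 = 5 * 64 lets us ignore the row.
# y * 320 is always a multiple of 64, so it never touches bits 0-5.
WIDTH = 320


def low_bits_match(bits):
    """True if (p mod 2**bits) == (x mod 2**bits) for every pixel of 320x200."""
    mask = (1 << bits) - 1
    return all(((y * WIDTH + x) & mask) == (x & mask)
               for y in range(200) for x in range(WIDTH))


print(low_bits_match(6), low_bits_match(7))   # True False
</syntaxhighlight>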
 
The major drawback of the "subfunction AH = 0x0C" approach is the loss of performance. While DosBox and many emulators perform just fine, real hardware will draw much, much slower, depending on the Video BIOS.
 
  
==== Basic animation and user interaction ====
  
Now let's add the convenient check for the ESC key and also a simple animation. The <code>DI</code> register is used as a frame counter and is incremented after the pixel counter <code>CX</code> has run through all 65536 values via <code>LOOP</code>. This frame counter is then added to the column. The resulting program is now 25 bytes in size:
  
 
[[File:Xor anim example.gif|thumb]]
 
  
<syntaxhighlight lang="nasm">cwd            ; "clear" DX for perfect alignment
mov al,0x13
X: int 0x10 ; set video mode AND draw pixel
mov ax,cx ; get column in AX
add ax,di ; offset by framecounter
xor al,ah ; the famous XOR pattern
and al,32+8 ; a more interesting variation of it
mov ah,0x0C ; set subfunction "set pixel" for int 0x10
loop X ; loop 65536 times
inc di ; increment framecounter
in al,0x60 ; check keyboard ...
dec al ; ... for ESC
jnz X ; rinse and repeat
ret ; quit program</syntaxhighlight>
( ↑ This example is the blueprint in the [[Floating-point_Opcodes#FPU_Basics| FPU Basics Section]].)
 
=== Using Custom Colors ===
 
==== Shades of Hue within the Default VGA palette ====
 
You might have noticed there is a bit of structure to [https://i.stack.imgur.com/OSZ6D.png the default VGA palette], which you can exploit for some interesting results. Looking at the palette, there is a rainbow of different hue values starting at index 32, which is repeated with a slightly different luma at intervals of 72 indices. If you are okay with limiting the number of shades, you can get a small color ramp for all kinds of hue values by simply calculating your color index like this:
 
<code>color=((shade%3)*72)+32+huevalue</code>
 
For an example of how this looks for all kinds of hue values, see [https://www.pouet.net/prod.php?which=63520 Popcast] by Hellmood/Desire.
 
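The formula above can be written as a tiny Python function. This sketch assumes that <code>huevalue</code> is an offset into the 72-entry block starting at index 32 (0..71), so every result stays within indices 32..247. The function name is illustrative:

<syntaxhighlight lang="python">
# Sketch of the color index formula above for the default VGA palette.
# Assumption: huevalue is an offset into the 72-entry block that starts at
# index 32 (0..71), so every result stays within indices 32..247.
def vga_shade_index(shade, hue):
    if not 0 <= hue < 72:
        raise ValueError("hue must be in 0..71")
    return (shade % 3) * 72 + 32 + hue


ramp = [vga_shade_index(s, 5) for s in range(3)]
print(ramp)  # [37, 109, 181]
</syntaxhighlight>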
==== Setting a Custom Palette ====
 
Sometimes, when the [https://i.stack.imgur.com/OSZ6D.png Default VGA Palette] doesn't quite match the look you are going for, it can be useful to set your own palette using the VGA registers. The basic setup loop looks like this:
 
<syntaxhighlight lang="nasm">
palloop:
mov ax,cx    ; use the loop counter as color index and intensity
mov dx,0x3c8 ; DAC write index port
out dx,al    ; select palette color
inc dx       ; 0x3c9 = DAC data port
out dx,al    ; write red value (0..63)
out dx,al    ; write green value (0..63)
out dx,al    ; write blue value (0..63)
loop palloop
</syntaxhighlight>
The above code sets a simple grayscale palette (assuming the CX register is 0 at the start) and is compatible with all DOS platforms.
In some cases you can omit the <code>mov dx,0x3c8</code>, <code>out dx,al</code> and <code>inc dx</code> instructions and directly access the data register by just using <code>mov dx,0x3c9</code> instead. This works because the DAC advances to the next color index by itself after every three data writes, so the colors are then assigned starting from whatever index the DAC currently points to.

To get more interesting colors than just grayscale, you can alter the value of the AL register in between setting the red, green and blue values, for example by shifting, adding, subtracting or performing logical operations. Just get creative and check whether the result is sufficient for your use case.

TomCat presents the most common color palettes, grouped by functionality, in his article [https://abaddon.hu/256b/colors.html Colors (in tiny intros)].
  
 
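To see what the palette loop above actually writes, here is a toy Python model of the DAC. It is not an emulator; the class and function names are made up. It assumes a standard 6-bit DAC that ignores the upper two bits of each value, an index that advances after every third data write, and a hypothetical initial write index of 0:

<syntaxhighlight lang="python">
# Toy model of the VGA DAC write path used by palloop above.
# Assumptions: a standard 6-bit DAC that ignores the upper two bits of each
# value, an index that advances after every third data write, and a
# (hypothetical) initial write index of 0.
class ToyVgaDac:
    def __init__(self):
        self.palette = [[0, 0, 0] for _ in range(256)]
        self.index = 0
        self.component = 0  # 0 = red, 1 = green, 2 = blue

    def out(self, port, value):
        value &= 0xFF
        if port == 0x3C8:                 # select palette color
            self.index = value
            self.component = 0
        elif port == 0x3C9:               # write red, green, blue in turn
            self.palette[self.index][self.component] = value & 0x3F
            self.component += 1
            if self.component == 3:       # auto-advance to the next color
                self.component = 0
                self.index = (self.index + 1) & 0xFF


def run_palloop(dac, cx=0, select_index=True):
    """Mirror of palloop; select_index=False models the 'mov dx,0x3c9 only' variant."""
    while True:
        al = cx & 0xFF                    # mov ax,cx
        if select_index:
            dac.out(0x3C8, al)            # out dx,al with DX = 0x3c8
        for _ in range(3):
            dac.out(0x3C9, al)            # red, green, blue
        cx = (cx - 1) & 0xFFFF            # LOOP decrements CX first ...
        if cx == 0:                       # ... and falls through at zero
            break


dac = ToyVgaDac()
run_palloop(dac)
print(dac.palette[10], dac.palette[100])  # [10, 10, 10] [36, 36, 36]
</syntaxhighlight>

Under these assumptions, entry i ends up as gray level i AND 3Fh, so the ramp repeats four times. With <code>select_index=False</code>, the same model gives entry j the value (256 - j) AND 3Fh, a descending ramp. That illustrates why the data-port-only shortcut only fits "in some cases".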
== Producing sound ==
 
 
=== MIDI notes ===
 
  
Creating sounds with MIDI requires a bit more preparation, but once you're familiar with it, it's even simpler than PC Speaker sound, because you basically don't have to ''create'' the sound, you just have to ''trigger'' it. To start with, you need to know that there are a lot of [https://en.wikipedia.org/wiki/General_MIDI different instruments] and a [https://www.midi.org/specifications/item/table-1-summary-of-midi-message defined way of communication]. Imagine the MIDI interface as a keyboard: you tell it which button/key you want to press, which knob to twist, and sometimes, how hard. By default, the active instrument is the [https://en.wikipedia.org/wiki/Piano Acoustic Grand Piano].
 
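The "defined way of communication" boils down to short byte sequences. As a hedged illustration (Python, not part of any intro; the helper names are made up), a ''Note On'' message combines the message type <code>9Xh</code> with the channel in one status byte. Two 7-bit data bytes follow, for the note number and the volume:

<syntaxhighlight lang="python">
# Sketch: raw MIDI channel messages as the bytes written to the data port.
# Status byte = message type in the high nibble | channel (0..15) in the low
# nibble, followed by 7-bit data bytes. Helper names are illustrative only.
def status(kind, channel):
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    return kind | channel


def note_on(channel, note, velocity):
    return bytes([status(0x90, channel), note & 0x7F, velocity & 0x7F])


def note_off(channel, note, velocity=0):
    return bytes([status(0x80, channel), note & 0x7F, velocity & 0x7F])


msg = note_on(3, 35, 127)   # note 35 on channel 3 at the maximum volume
print(msg.hex(" "))          # 93 23 7f
</syntaxhighlight>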
==== Single piano note ====
  
 
Let's start off with a simple example, playing a single note on the piano:
 
  
 
In short: you turn your keyboard on (switching to UART mode), then press a KEY with a certain VOLUME on channel ZERO, then exit. Apart from switching to UART mode, all this communication uses port <code>330h</code>. This example works in DosBox but not in the Windows XP NTVDM: for [http://www.pouet.net/topic.php?which=10720&page=1 still unclear reasons], the NTVDM emulation delays the note until it receives a second one. The simplest way of at least hearing something is to play notes repeatedly, as in the following example:
 
==== Repeated piano notes ====
  
 
<syntaxhighlight lang="nasm">mov al, 3Fh ; set UART mode - command
; ...
mov al, 67h ; data byte 2 : VOLUME = 67h
out dx, al ; send !
mov cl,5 ; wait about five timer ticks (assumes CH = 0)
_wait:
hlt ; sleep until the next interrupt (usually the timer)
loop _wait
in al, 0x60 ; check for ESC
dec al ;
; ...
</syntaxhighlight>
 
↑ This is the previous example, enriched with synchronizing against the timer and checking for the ESC key. It works on both DosBox and Windows XP NTVDM and plays a note on the Piano repeatedly.
 
  
==== Repeated notes of other instruments ====
While hitting one key repeatedly is not really interesting in general, it can produce decent results with the right instrument activated, as was done with the "French Horn" in [https://www.youtube.com/watch?v=tsbxdjP9l50 Timelord (by Baudsurfer)]. Apart from just changing the instrument, let's also optimize the size a little:
  
 
<syntaxhighlight lang="nasm">org 100h
; ...
_wait:
mov al,[fs:0x46c] ; read timer value (BIOS tick counter, assumes FS = 0)
cmp al,bl ; wait until...
jz _wait ; ...timer value changed
xchg bx,ax ; remember the new timer value in BL
in al,0x60 ; check for ...
dec al ; ... ESC key
; ...
db 93h ; play note on channel 3
db 35 ; deep "b" = note number 35
db 127 ; play with volume = 127
db 3fh ; change mode to "UART"
db 0b3h ; control change on channel 3
; ...
</syntaxhighlight>
 
↑ This is the previous example, with a changed instrument, the MIDI data structured into a data section, the output optimized by using <code>outsb</code> instead of <code>out dx,al</code>, and a special command at the end to turn ''All Notes Off''. This is necessary for all instruments which don't stop by themselves: in all the previous examples, we sent the "NOTE ON" command (<code>9Xh</code>), but never the corresponding "NOTE OFF" command (<code>8Xh</code>). Also, the note is now played on channel <code>03h</code>, since the command byte for changing the instrument on channel 3 is <code>0C3h</code>, which is also <code>RET</code> and can be reused. If this looks complicated at first, always remember: it's just sending [https://www.midi.org/specifications/item/table-1-summary-of-midi-message defined commands] to a single port.
 
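The two remaining message types from this example can be sketched the same way. This Python sketch uses made-up helper names. A program change is <code>0CXh</code> followed by a 0-based General MIDI program number. In the MIDI specification, ''All Notes Off'' is Control Change (<code>0BXh</code>) number 123 (<code>7Bh</code>) with value 0. The sketch also confirms that channel 3 is the only one whose program change status byte equals <code>RET</code>:

<syntaxhighlight lang="python">
# Sketch: Program Change (0CXh) and the "All Notes Off" controller.
# All Notes Off is Control Change (0BXh) number 123 (7Bh) with value 0.
RET_OPCODE = 0xC3  # x86 near RET


def program_change(channel, program):
    """Select an instrument (0-based General MIDI program number)."""
    return bytes([0xC0 | channel, program & 0x7F])


def all_notes_off(channel):
    return bytes([0xB0 | channel, 123, 0])


# Only on channel 3 does the program change status byte equal RET:
print([ch for ch in range(16) if program_change(ch, 0)[0] == RET_OPCODE])  # [3]
print(all_notes_off(3).hex(" "))  # b3 7b 00
</syntaxhighlight>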
  
==== The drum channel ====
Now that you're aware that there are different channels (16 overall) to play notes on, how would you like a channel <code>09h</code> specifically for drums (channel 10 when counting from 1)? [http://www.voidaudio.net/percussion.html Ten different drumsets] with dozens of samples are available out of the box. By default, the "Standard Kit" is active. The following example plays a track of drum notes repeatedly, while further optimizing for size:
  
 
<syntaxhighlight lang="nasm">org 100h
; ...
</syntaxhighlight>
 
In contrast to the previous example, the data section is now at the start. That means it's executed as code! This is dangerous, of course, but it also saves the bytes needed to assign the <code>DATA</code> offset to <code>SI</code>. Once <code>outsb</code> has incremented <code>SI</code> twice, <code>SI</code> stays fixed, and further reading from the drum data is done with <code>[BX+SI]</code>. Unless you know exactly what you are doing, don't use this kind of "executing data" optimization! In this special case, <code>AAS</code> and <code>CWD</code> do no harm, and the drum notes <code>42,38,42,35</code> are carefully chosen and arranged to resemble the instruction <code>SUB AH,[232Ah]</code>, which does no harm either.
 
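The "harmless code" claim can be verified by decoding the bytes. The opcodes of <code>AAS</code> (<code>3Fh</code>) and <code>CWD</code> (<code>99h</code>) coincide with the MIDI "UART mode" command and the "note on, channel 9" status byte. This Python sketch assumes the data section starts with exactly these two bytes followed by the drum notes. It is a hypothetical mini-decoder for just these opcodes, not a general disassembler:

<syntaxhighlight lang="python">
# Sketch: checking that the start of the drum data doubles as harmless code.
# Only the few opcodes needed here are decoded (hypothetical helper, not a
# general disassembler). Assumed data start: 3Fh (UART mode), 99h (note on,
# channel 9), then the drum notes 42, 38, 42, 35.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode(data):
    out, i = [], 0
    while i < len(data):
        op = data[i]
        if op == 0x3F:
            out.append("aas")
            i += 1
        elif op == 0x99:
            out.append("cwd")
            i += 1
        elif op == 0x2A:                       # SUB r8, r/m8
            modrm = data[i + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 0 or rm != 6:
                raise ValueError("addressing mode not handled in this sketch")
            disp = data[i + 2] | (data[i + 3] << 8)   # little-endian disp16
            out.append(f"sub {REG8[reg]},[{disp:04X}h]")
            i += 4
        else:
            raise ValueError(f"opcode {op:02X}h not handled in this sketch")
    return out


print(decode(bytes([0x3F, 0x99, 42, 38, 42, 35])))
# ['aas', 'cwd', 'sub ah,[232Ah]']
</syntaxhighlight>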
  
==== Further Midi instrument tuning by controllers and pitch ====

If you are familiar with hardware synthesizers, you'll remember the typical pitch bend and modulation wheels beside the keys, usually two of them. They are commonly assigned to a vibrato/tremolo effect and to a +/- pitch shift that tunes the played note. You can also use these functions in your intro code to affect the currently played MIDI note.

Accessing these parameters follows the usual MIDI programming pattern, as you can see here:

<syntaxhighlight lang="nasm">mov al,10110000b ;Controller command on Midi channel 0
out dx,al
mov al,00000001b ;0...127 data byte 1 => '1' is the code for the modulation wheel typically assigned to vibrato/tremolo
out dx,al
mov al,01111111b ;0...127 data byte 2 => e.g. '01111111' => Maximum vibrato level
out dx,al</syntaxhighlight>
In that example, the maximum vibrato level is applied to any instrument played on MIDI channel 0. This effect was used in the [http://www.mikusite.de/x86/cryscom.zip Crystal Comet 128 Byte intro by Kuemmel].

For pitch bend, the code would look like this (here with the example value 0x3000 explained below):

<syntaxhighlight lang="nasm">mov al,11100000b ;Pitch bend command on Midi channel 0
out dx,al
mov al,00000000b ;0...127 data byte 1 => LSB (lower 7 bits) of the 14 bit pitch value, here 0x00 for 0x3000
out dx,al
mov al,01100000b ;0...127 data byte 2 => MSB (upper 7 bits) of the 14 bit pitch value, here 0x60 for 0x3000
out dx,al</syntaxhighlight>
Pitch bend uses a 14 bit value. The center is at 0x2000 (meaning no pitch change). Values from 0x2000 up to 0x3FFF raise the pitch, and values from 0x2000 down to 0x0000 lower it. With the usual default setting, a distance of 0x2000 corresponds to 2 semitones, so you can bend by +/- 2 semitones. Be aware that the value must be split into two 7 bit values: the LSB holds the lower 7 bits and the MSB the upper 7 bits. Therefore, e.g., 0x3000 becomes 0x60 (MSB) and 0x00 (LSB).

Of course there are more MIDI controller options; e.g., you could change the stereo pan level. As a reference and for more detailed information, please have a look at this [http://www.music-software-development.com/midi-tutorial.html Midi tutorial page].
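The 14-bit pitch bend conversion described above can be sketched in Python as follows. The helper names are made up, and the sketch assumes the receiver's default range of +/- 2 semitones. One semitone up then lands exactly on the example value 0x3000:

<syntaxhighlight lang="python">
# Sketch: converting a bend in semitones into the two pitch bend data bytes.
# Assumption: the receiver uses the common default range of +/- 2 semitones,
# i.e. a distance of 0x2000 from the center equals 2 semitones.
CENTER = 0x2000
MAX_VALUE = 0x3FFF
BEND_RANGE = 2.0


def bend_value(semitones):
    value = CENTER + round(semitones / BEND_RANGE * CENTER)
    return max(0, min(MAX_VALUE, value))   # clamp to the 14-bit range


def split_14bit(value):
    """Return (LSB, MSB), i.e. data byte 1 and data byte 2 of the message."""
    return value & 0x7F, (value >> 7) & 0x7F


lsb, msb = split_14bit(bend_value(1.0))   # one semitone up
print(hex(bend_value(1.0)), hex(lsb), hex(msb))  # 0x3000 0x0 0x60
</syntaxhighlight>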
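==== Creating basic sound effects in 16 bytes ====

In the [https://en.wikipedia.org/wiki/General_MIDI#Sound_Effects MIDI repertoire], there are already some sound effects available. Using the "data execution" optimization from above, let's fire a gunshot in 16 bytes:

<syntaxhighlight lang="nasm">aas              ; 3Fh: MIDI command "switch to UART mode"
les di,[bx-0x6C] ; C4h 7Fh 94h: instrument on channel 4 = "Gunshot", then "play note on channel 4"
xor al,127       ; 34h 7Fh: note number 52, volume 127
mov dx,0x331     ; MIDI command port
outsb            ; send 3Fh (SI points to the start of our code)
dec dx           ; MIDI data port 0x330
mov cl,5         ; five data bytes follow (assumes CH = 0)
rep outsb        ; send C4h 7Fh 94h 34h 7Fh
ret</syntaxhighlight>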
The first three instructions don't do anything useful (they do something, but we don't care); they are just MIDI data:
* <code>aas</code> assembles to <code>0x3F</code>, the command for switching to "UART" mode, sent to port <code>0x331</code>
* <code>les di,[bx-0x6C]</code> assembles to <code>0xC4</code> (change instrument on channel 4), <code>0x7F</code> (change it to "Gunshot"), <code>0x94</code> (play note on channel 4)
* <code>xor al,127</code> assembles to <code>0x34</code> (play THIS note), <code>0x7F</code> (play it THAT loud; 127 is also the allowed maximum)
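The second byte of <code>les di,[bx-0x6C]</code> is its ModR/M byte, and it doubles as the program number. With mod = 01 (8-bit displacement) and reg = 111 (DI) fixed, only the three rm bits change when you swap the base register. The byte therefore always lands in <code>78h</code>..<code>7Fh</code>, the General MIDI sound effects group (0-based programs 120 to 127). This Python sketch (illustrative names) lists a few choices:

<syntaxhighlight lang="python">
# Sketch: the ModR/M byte of "les di,[reg-0x6C]" is also the MIDI program number.
# mod = 01 (8-bit displacement), reg = 111 (DI), rm selects the base register(s).
RM_ENCODING = {"bx+si": 0, "bx+di": 1, "bp+si": 2, "bp+di": 3,
               "si": 4, "di": 5, "bp": 6, "bx": 7}

# 0-based General MIDI program numbers 120..127 (the "sound effects" group)
GM_EFFECTS = {120: "Guitar Fret Noise", 121: "Breath Noise", 122: "Seashore",
              123: "Bird Tweet", 124: "Telephone Ring", 125: "Helicopter",
              126: "Applause", 127: "Gunshot"}


def les_di_modrm(base):
    mod, reg = 0b01, 0b111
    return (mod << 6) | (reg << 3) | RM_ENCODING[base]


for base in ("bx", "bp", "si"):
    program = les_di_modrm(base)
    print(f"[{base}-0x6C] -> {program:#04x} = {GM_EFFECTS[program]}")
</syntaxhighlight>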
The rest of the code basically just sends the MIDI data to the interface and exits. You can change the kind of sound effect by modifying the ModR/M byte of the second instruction (change BX to BP or SI etc.). Changing the volume is even simpler: change the byte value of <code>xor al,127</code> to any value between 0 and 127.
==== Procedural MIDI music generation in 64 bytes ====
  
 
With all the above you should now be able to follow the next snippet [http://www.pouet.net/prod.php?which=66313 Descent OST], a small framework for procedural MIDI sound generation in 64 bytes :
 
<syntaxhighlight lang="nasm">; ...
iret</syntaxhighlight>
 
  
=== PC Speaker ===
Producing sound with the PC speaker is incredibly easy. Basically, you set a system timer to a desired frequency, then connect this timer to the speaker. [http://wiki.osdev.org/PC_Speaker The PC Speaker Article] from the OSDEV Wiki has the details. An example of a tiny intro that uses PC speaker music is [http://www.pouet.net/prod.php?which=67833 SpeaCore].
  
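"Setting the timer to a desired frequency" means writing a divisor of the roughly 1.193182 MHz PIT input clock to port <code>42h</code>, low byte first. This Python sketch uses made-up helper names. It assumes channel 2 has already been put into low/high byte access mode via port <code>43h</code>, which is commonly done by writing <code>0B6h</code> there:

<syntaxhighlight lang="python">
# Sketch: PIT channel 2 divisor for a given tone. The PIT input clock is about
# 1.193182 MHz; the divisor goes to port 42h as low byte, then high byte.
# Assumption: channel 2 was put into low/high byte access mode via port 43h
# beforehand (commonly by writing 0B6h there).
PIT_HZ = 1_193_182


def pit_divisor(frequency_hz):
    divisor = round(PIT_HZ / frequency_hz)
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("frequency out of range for a 16-bit divisor")
    return divisor


def divisor_bytes(divisor):
    """(low byte, high byte) in the order they are written to port 42h."""
    return divisor & 0xFF, divisor >> 8


d = pit_divisor(440)                          # concert A
print(d, [hex(b) for b in divisor_bytes(d)])  # 2712 ['0x98', '0xa']
</syntaxhighlight>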
==== Basic example with melody pattern ====
  
A very optimized and dirty variant of producing sound with the speaker is this 12 byte snippet (sound routine from [http://www.pouet.net/prod.php?which=67829 the tiny intro "darkweb"]):
  
<syntaxhighlight lang="nasm">hlt ; sync to the next interrupt (usually the system timer)
inc bx ; increment our counter
mov ax,bx ; work with a copy
; ...
out 0x42,al ; set new countdown for timer2 (two passes)
out 0x61,al ; link timer2 to PC speaker (2 LSBs are 1)
jmp si ; rinse and repeat</syntaxhighlight>
 
  
 
Instead of sending the low and high byte of our divisor directly in succession, we do it the "two pass" way. That reduces the number of possible frequencies to 255, which is still good enough for some rough sounds. Linking the timer to the PC speaker might not be obvious: normally you would read the value of port 0x61, set the two least significant bits and write the value back. You can skip all of this if you just send the "two pass" value you just used for the timer, provided that value already has the two least significant bits set (''or al,0x4B'' does this). Be aware that port 0x61 does many things apart from just connecting the timer to the speaker. A useful resource for ports in general is the [http://bochs.sourceforge.net/techspec/PORTS.LST Bochs Ports List]; for port 0x61 it displays:
 
Instead of sending low and high byte of our divisor directly in succession, we do it the "two path" way. That reduces the amount of possible frequencies to 255, which is still good enough for some rough sounds. Linking the timer to the PC speaker might not be obvious : Normally you would read the value of port 0x61, set the two least significant bits to TRUE and write the value again. You can save on all of this, if you just send the "two path" value which you just used for the timer if that value has the two least significant bits already set (''or al,0x4B'' does this). Be aware that port 0x61 does many things apart from just connecting the timer to the speaker. A useful resource for ports in general is the [http://bochs.sourceforge.net/techspec/PORTS.LST Bochs Ports List], for port 0x61 it displays:
Line 410: Line 495:
  
 
So if you experience strange things with highly optimized pc speaker output, revert to the safe way. The described way works with real hardware and DosBox. Unfortunately, both Orcacle Virtual Box with MsDos 6.22 and Windows XP NTVDM seem not to properly emulate PC speakers (Investigation and citation needed here!)
 
So if you experience strange things with highly optimized pc speaker output, revert to the safe way. The described way works with real hardware and DosBox. Unfortunately, both Orcacle Virtual Box with MsDos 6.22 and Windows XP NTVDM seem not to properly emulate PC speakers (Investigation and citation needed here!)
 +
 +
==== Simple deep sound in 8 bytes ====
  
 
One of the smallest possible ways to produce PC speaker sound might be this 8 byte snippet:

<syntaxhighlight lang="nasm">dec ax ; AX initially 0000h -> AL = 0xFF
out 42h,al ; change divisor of timer2 to 0xFFFF
out 42h,al ; resulting in a very low frequency
out 61h,al ; 2 LSBs are set, connect timer to speaker
ret ; quit</syntaxhighlight>

(Note: This may fail on actual hardware, as there might not be time for the bus to settle between the consecutive <code>out 42h,al</code> statements.)

An example of a tiny intro that uses PC speaker music is [http://www.pouet.net/prod.php?which=67833 SpeaCore].
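
To get a feeling for which tones these tricks can reach, here is a small Python sketch (plain arithmetic, not DOS code). It assumes the usual PIT input clock of about 1193182 Hz and timer 2 in square wave mode. It also assumes that the same byte ends up as both the low and the high byte of the divisor, as in the 8 byte snippet, so the divisor is byte * 257. The helper names are made up for this example.

```python
PIT_CLOCK_HZ = 1_193_182  # approximate PIT input clock (about 1.193182 MHz)


def divisor_from_byte(b: int) -> int:
    """PIT divisor when the same byte b is written as low and high byte."""
    d = (b & 0xFF) * 257  # e.g. 0xFF -> 0xFFFF
    return d or 65536     # a divisor of 0 counts as 65536


def tone_hz(b: int) -> float:
    """Speaker frequency produced by that divisor."""
    return PIT_CLOCK_HZ / divisor_from_byte(b)


# 0xFF written twice, as in the 8 byte snippet: divisor 0xFFFF
print(round(tone_hz(0xFF), 1))  # 18.2

# every tone reachable this way (byte values 1..255)
tones = [round(tone_hz(b)) for b in range(1, 256)]
print(tones[:3], tones[-1])
```

The list makes the limitation visible: the steps between the reachable tones are huge at the high end and tiny at the low end. That is fine for rough noises but not for tuned melodies.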
  
 
=== COVOX output (aka LPT DAC) ===

...

This method follows the [http://countercomplex.blogspot.com/2011/10/algorithmic-symphonies-from-one-line-of.html "audio from one line of C code"] style of sound generation. For more background information, see [http://www.pouet.net/topic.php?which=8357&page=1 a pouet discussion].

Here are some detailed instructions on how a very small bytebeat player can be realized: [[MUZAK58]]

=== Advanced PC Speaker and COVOX sound via interrupt ===

For more advanced use of PC speaker or COVOX sound output in tiny intros, especially when samples have to be played back at a specific sample rate, an interrupt timer is recommended. To illustrate this, we take a so-called bytebeat and turn it into a working code example for both PC speaker and COVOX.

The major difference between the two is that COVOX offers 8 bits of precision, while the PC speaker usually only gets 6 bits. Furthermore, the setup and access differ, as shown in the sections before. Regarding code size and sound quality, COVOX is preferable.

The bytebeat we are using as an example can be seen and heard [http://wurstcaptures.untergrund.net/music/?oneliner=((t%264096)%3F((t*(t%5Et%25255)%7C(t%3E%3E4))%3E%3E1)%3A(t%3E%3E3)%7C((t%268192)%3Ft%20%3C%3C2%3At))&rate=11025 here.] It's based on JavaScript syntax: ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t <<2:t))

Bytebeat code like this can be ported directly to assembler by evaluating the single expressions step by step, as you can see in the implementations here. These examples work within DOSBox and should also run on real hardware with FreeDOS. They don't show any graphical output; they just play the bytebeat until a key is pressed. Your graphics routine should be placed right after the 'main' label.
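
Before porting a bytebeat to assembler, it can help to have a reference model to compare against. The following Python sketch is not part of the original examples. It evaluates the example formula with <code>t</code> kept at 16 bits and keeps only the low byte, mirroring the register widths used by the assembler versions (AX for <code>t</code>, AL sent to the port). Function names are illustrative.

```python
def bytebeat(t: int) -> int:
    """((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t<<2:t)), low byte."""
    t &= 0xFFFF                                    # t lives in a 16 bit register
    if t & 4096:
        v = (t * (t ^ (t % 255)) | (t >> 4)) >> 1
    else:
        v = (t >> 3) | ((t << 2) if t & 8192 else t)
    return v & 0xFF                                # only AL is written out


def pc_speaker_sample(t: int) -> int:
    """6 bit value, as produced by 'shr al,2' in the PC speaker variant."""
    return bytebeat(t) >> 2


print([bytebeat(t) for t in range(0, 64, 8)])      # [0, 9, 18, 27, 36, 45, 54, 63]
```

Because only the low byte is used, the truncated 16 bit product from <code>mul dx</code> gives the same sample as the full product. This means the model can be compared byte for byte with values dumped from the DOS program.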

==== PC Speaker variant ====
<syntaxhighlight lang="nasm">org 100h

mov    ax,3508h         ;int 21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)
int    21h              ;return: es:bx
push   es
push   bx               ;backup current interrupt handler
mov    cx,63 + 108*256  ;ch = 108: PIT counter 0 divisor, cl = 63: speaker enable bits for init
mov    bl,90h           ;10010000b => PIT command on "init"
;Bit0   =   0 16 bit binary counter (not BCD)
;Bit3-1 = 000 mode 0 select
;Bit5-4 =  01 read/write counter bits 0-7 only
;Bit7-6 =  10 counter 2 select

mov    dx,irq           ;new handler address
call   init

main:
mov    ah,0
int    16h              ;ah = 0, int 16h => wait for a keypress

pop    dx
pop    ds               ;restore old handler address at exit
xor    cx,cx            ;PIT counter divisor = 0 and speaker disable for exit
mov    bl,0b6h          ;bl = 10110110b => PIT command at exit
init:
xchg   ax,cx
out    61h,al           ;al = 0 or 63 => Bit0 = 1 timer 2 gate to speaker enable,
mov    al,ah            ;Bit1 = 1 speaker data enable ...or disable both at al = 0
out    40h,al           ;al = 0 or 108 => write PIT counter 0 divisor low byte
salc                    ;al = 0 (assumes the carry flag is clear)
out    40h,al           ;al = 0 => write PIT counter 0 divisor high byte = 0
;=> interrupt frequency of about 11048 Hz, close to the 11025 Hz target,
;as the clock is 1.19318181818 MHz => 1.19318181818 MHz / 108 = 11048 Hz
xchg   ax,bx            ;al = bl = 90h on init, 0b6h at exit
out    43h,al           ;at exit al = 10110110b:
;Bit0   =   0 16 bit binary counter (not BCD)
;Bit3-1 = 011 mode 3 select, square wave generator
;Bit5-4 =  11 read/write counter bits 0-7 first, then 8-15
;Bit7-6 =  10 counter 2 select
mov    ax,2508h         ;int 21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)
int    21h
retn

;bytebeat: ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t<<2:t))
irq:
pusha
mov    bp,255
mov    ax,0             ;ax: t (this immediate is incremented below)
.counter:
mov    cx,ax
shr    cx,3             ;cx: (t>>3)
test   ax,4096          ;(t&4096)?
jz     .1
mov    bx,ax            ;bx: t
sub    dx,dx            ;dx:ax = t
div    bp               ;dx: (t%255)
xor    dx,bx            ;dx: (t^(t%255))
shr    cx,1             ;cx: (t>>4)
xchg   ax,bx            ;ax: t
mul    dx               ;ax: t*(t^(t%255))
or     ax,cx            ;ax: t*(t^(t%255))|(t>>4)
shr    ax,1             ;ax: (t*(t^(t%255))|(t>>4))>>1
jmp    .3
.1:
test   ax,8192          ;(t&8192)?
jz     .2
shl    ax,2             ;ax: (t<<2)
.2:
or     ax,cx            ;ax: ax|(t>>3)
.3:
shr    al,2             ;downscale to 6 bits
jz     .4
out    42h,al           ;write 6 bit data to speaker (PIT counter 2)
.4:

inc    word [bp-255+irq.counter-2] ;self-modifying: increment t in "mov ax,0"
mov    al,20h           ;00100000b
out    20h,al           ;Bit 5 = 1 send End Of Interrupt (EOI) signal
popa
iret</syntaxhighlight>

==== COVOX variant ====
<syntaxhighlight lang="nasm">org 100h

mov    ax,3508h         ;int 21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)
int    21h              ;return: es:bx
push   es
push   bx               ;backup current interrupt handler
mov    al,108           ;PIT counter 0 divisor
mov    dx,irq           ;new handler address
call   init

main:
mov    ah,0
int    16h              ;ah = 0, int 16h => wait for a keypress
pop    dx
pop    ds               ;restore old handler address at exit
salc                    ;al = 0 at exit (assumes the carry flag is clear)

init:
out    40h,al           ;al = 0 or 108 => write PIT counter 0 divisor low byte
salc
out    40h,al           ;al = 0 => write PIT counter 0 divisor high byte = 0
;=> interrupt frequency of about 11048 Hz, close to the 11025 Hz target,
;as the clock is 1.19318181818 MHz => 1.19318181818 MHz / 108 = 11048 Hz
mov    ax,2508h         ;int 21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)
int    21h
retn

;bytebeat: ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t<<2:t))
irq:
pusha
mov    bp,255
mov    ax,0             ;ax: t (this immediate is incremented below)
.counter:
mov    cx,ax
shr    cx,3             ;cx: (t>>3)
test   ax,4096          ;(t&4096)?
jz     .1
mov    bx,ax            ;bx: t
sub    dx,dx            ;dx:ax = t
div    bp               ;dx: (t%255)
xor    dx,bx            ;dx: (t^(t%255))
shr    cx,1             ;cx: (t>>4)
xchg   ax,bx            ;ax: t
mul    dx               ;ax: t*(t^(t%255))
or     ax,cx            ;ax: t*(t^(t%255))|(t>>4)
shr    ax,1             ;ax: (t*(t^(t%255))|(t>>4))>>1
jmp    .3
.1:
test   ax,8192          ;(t&8192)?
jz     .2
shl    ax,2             ;ax: (t<<2)
.2:
or     ax,cx            ;ax: ax|(t>>3)
.3:
mov    dx,0378h         ;LPT1 parallel port address
out    dx,al            ;write 8 bit sample data
inc    word [bp-255+irq.counter-2] ;self-modifying: increment t in "mov ax,0"
mov    al,20h           ;00100000b
out    20h,al           ;Bit 5 = 1 send End Of Interrupt (EOI) signal
popa
iret</syntaxhighlight>

==== Further notes on the two variants ====

It's important to set and know the sample frequency you want. For example, if you want to port the sound from 11025 Hz to 18939 Hz, you need to change the following code parts (shown here for COVOX). Note that the bytebeat parameters were also adjusted to roughly fit the higher frequency (the time-based shifts and bit tests are scaled by a factor of two):

<syntaxhighlight lang="nasm">;...snip...
mov    al,63            ;PIT counter divisor instead of 108 => 1.19318181818 MHz / 63 = 18939 Hz
;...snip...
;bytebeat: ((t&8192)?((t*(t^t%255)|(t>>5))>>1):(t>>4)|((t&16384)?t<<2:t))
;...snip...
.counter:
mov    cx,ax
shr    cx,4             ;cx: (t>>4)
test   ax,8192          ;(t&8192)?
jz     .1
mov    bx,ax            ;bx: t
sub    dx,dx            ;dx:ax = t
div    bp               ;dx: (t%255)
xor    dx,bx            ;dx: (t^(t%255))
shr    cx,1             ;cx: (t>>5)
xchg   ax,bx            ;ax: t
mul    dx               ;ax: t*(t^(t%255))
or     ax,cx            ;ax: t*(t^(t%255))|(t>>5)
shr    ax,1             ;ax: (t*(t^(t%255))|(t>>5))>>1
jmp    .3
.1:
test   ax,16384         ;(t&16384)?
jz     .2
shl    ax,2             ;ax: (t<<2)
.2:
or     ax,cx            ;ax: ax|(t>>4)
;...snip...</syntaxhighlight>

The routine here uses a frequency of 18939 Hz. With the 16 bit counter used here, this results in a maximum length of 65536/18939 = 3.46 seconds before everything loops. Usually that is enough for a drumbeat, but not for a complete song or melody. In that case you have to use another register as a 'top' timer to trigger your changes to the sound.
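
The divisor, sample rate and loop length numbers can be checked with a few lines of Python (plain arithmetic, not DOS code). The sketch assumes the usual PIT input clock of about 1193182 Hz, and the helper names are made up. It also shows that divisor 108 actually gives about 11048 Hz. That is slightly above 11025 Hz, because 108 is only the nearest integer divisor.

```python
PIT_CLOCK_HZ = 1_193_182  # approximate PIT input clock


def irq_rate(divisor: int) -> float:
    """Interrupt rate for a PIT channel 0 divisor (0 means 65536)."""
    return PIT_CLOCK_HZ / (divisor or 65536)


def nearest_divisor(target_hz: float) -> int:
    """Integer divisor that comes closest to the wanted sample rate."""
    return round(PIT_CLOCK_HZ / target_hz)


def loop_seconds(divisor: int, counter_bits: int = 16) -> float:
    """How long a counter of the given width runs before it wraps."""
    return (1 << counter_bits) / irq_rate(divisor)


print(nearest_divisor(11025), round(irq_rate(108)))  # 108 11048
print(nearest_divisor(18939), round(irq_rate(63)))   # 63 18939
print(round(loop_seconds(63), 2))                    # 3.46
```

With <code>counter_bits=32</code> you can also estimate how long a second 'top' register combined with <code>t</code> would last.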

One more thing to check, and maybe modify if you hear an imperfect sound, is the timing of when a sample value is actually "played". Preferably you would want to play each sample value at exactly the same point in time. But as your sample generation routine might need a different number of CPU cycles each time the interrupt is called, this timing varies all the time when code like the above is used.

One solution is to play the sample calculated in the previous interrupt call right away when the interrupt is called the next time. You can do that via self-modifying code, as shown here. It takes 5 bytes more:
<syntaxhighlight lang="nasm">;...snip...
irq:
pusha
mov    dx,0378h
mov    al,0             ;sample from the previous call (immediate patched below)
.sample:
out    dx,al            ;play it right away, always at the same point in time
mov    bp,255
mov    ax,0             ;ax: t
.counter:
;...snip...
inc    word [bp-255+irq.counter-2]
mov    byte [bp-255+irq.sample-1],al ;store the new sample for the next call
mov    al,20h
out    20h,al
;...snip...
</syntaxhighlight>

Some remarks: for educational reasons, none of the code above is fully optimized for size. Depending on your code and on what the interrupt subroutine depends on, several size optimizations are possible.

Instead of interrupt <code>08</code>, the user-defined interrupt <code>1c</code> could theoretically be used as well. So far, however, this seems to work only in DOSBox and not on a real system with FreeDOS. Further tests are needed to find out what the problem is here. Using interrupt <code>1c</code> would save 4 bytes, as the following code lines that finalize the interrupt could be omitted in the examples above:
<syntaxhighlight lang="nasm">;...snip...
mov al,20h
out 20h,al
;...snip...
</syntaxhighlight>

What can be done with bytebeats? Basically everything :-) You can look at this [http://www.pouet.net/topic.php?which=8357&page=1 thread] on pouet to get an idea, and check out the first PC speaker intro with bytebeat: [http://www.pouet.net/prod.php?which=71766 TCTRONIC by TomCat/Abaddon].

Some basic waveforms can be encoded like this:
<syntaxhighlight lang="nasm">;sawtooth wave
t & 127
</syntaxhighlight>
<syntaxhighlight lang="nasm">;square wave
t & 128
</syntaxhighlight>
<syntaxhighlight lang="nasm">;triangle wave
(t & 128 ? ~t : t) & 127
</syntaxhighlight>
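
To see what these waveforms look like sample by sample, here is a small Python sketch (function names are illustrative). The triangle is written as one possible 8 bit form that stays within 0..127: it counts up while bit 7 of <code>t</code> is clear and counts down (inverted) while it is set.

```python
def saw(t: int) -> int:
    return t & 127                     # ramps 0..127, period 128 samples


def square(t: int) -> int:
    return t & 128                     # 0 or 128, period 256 samples


def triangle(t: int) -> int:
    # up while bit 7 is clear, down (inverted low bits) while it is set
    return (~t if t & 128 else t) & 127


print([triangle(t) for t in (0, 64, 127, 128, 192, 255, 256)])
# [0, 64, 127, 127, 63, 0, 0]
```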

How would you go from here to create a specific tone, e.g. an "A4", which has a frequency of 440 Hz (check this [https://en.wikipedia.org/wiki/Piano_key_frequencies link] for a list of note frequencies)? For that, you have to relate the frequency of the interrupt to the tone frequency and to the length of one wave of your wave generator. A sawtooth of <code>t&127</code> at 22050 Hz results in a tone of 22050/128 = 172.3 Hz. To reach 440 Hz, we can simply stretch/multiply the timer by 440/172.3 = 2.554 to hear the desired note:
<syntaxhighlight lang="nasm">
(t*2.554) & 127
</syntaxhighlight>
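
The same calculation works for any note. The Python sketch below uses the standard equal-temperament formula from the linked list, in which A4 is key 49 at 440 Hz. The helper names are made up for this example. Since assembly has no cheap fractional multiply, the last helper shows one possible workaround: a fixed point factor, so that <code>(t*factor)>>8</code> approximates <code>t*2.554</code>.

```python
def note_hz(key: int) -> float:
    """Equal temperament frequency of piano key n (A4 = key 49 = 440 Hz)."""
    return 440.0 * 2 ** ((key - 49) / 12)


def t_multiplier(tone_hz: float, sample_rate: float, period: int = 128) -> float:
    """Factor for t so that a waveform with the given period plays tone_hz."""
    return tone_hz * period / sample_rate


def fixed_point_factor(factor: float, shift: int = 8) -> int:
    """Integer factor for (t * factor) >> shift."""
    return round(factor * (1 << shift))


print(round(t_multiplier(440, 22050), 3))                # 2.554
print(fixed_point_factor(t_multiplier(440, 22050)))      # 654
print(round(t_multiplier(note_hz(40), 11025), 3))        # C4 at 11025 Hz
```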

[https://naivesound.com/glitch Glitch] is another online tool to create bytebeat sounds with some enhanced syntax. There are several good reads on this tool, on the theory behind it and on bytebeat in general [https://medium.com/@naive_sound here].

Of course there are endless possibilities, and the whole world of real time sound calculation/generation is open to you. Here are some tiny intros which already use these techniques: [http://www.pouet.net/prod.php?which=77741 Plasmifier cover 256B], [http://www.pouet.net/prod.php?which=70599 2(56)unlimited], [http://www.pouet.net/prod.php?which=78505 somehow].

Latest revision as of 12:43, 15 February 2024

Outputting to the screen

First, be aware of the MSDOS memory layout

Outputting in Textmode (80x25)

Hello World / High Level function

Here's an obligatory "Hello World" program in text mode, using a "high level" MS-DOS function. With a small optimization already included (using XCHG BP,AX instead of MOV AH,09h), this snippet is 20 bytes in size.

Hello World!
 
org 100h			; we start at CS:100h
xchg 	bp,ax		; already a trick, puts 09h into AH
mov		dx,text		; DX expects the address of a $ terminated string
int 	21h			; call the DOS function (AH = 09h)
ret					; quit
text:
db 'Hello World!$'


Of course, this gets shorter with each byte you remove from the text itself. Now let's look into arbitrary screen access. Right after the start of your program you are in mode 3, that is 80x25 in 16 colors. See the Video Modes List.
draw char example
So, to show something on the screen, you would need to set a segment register to 0xB800, then write values into this segment.

Low level access

The following three snippets showcase how to draw a red smiley in three different ways. All example snippets are meant to be standalone programs, starting with the first instruction and nothing before it. The target coordinate (40,12) is about the middle of the screen. We need a multiplier of 2 since one char needs two bytes in memory (char and color are a byte each). The high byte 0x04 means red (4) on black (0), while the 0x01 is the first ASCII char, a smiley.

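push 0xb800
pop ds               ; DS = text mode screen segment
mov bx,(80*12+40)*2  ; offset of cell (40,12), two bytes per cell
mov ax, 0x0401       ; color 0x04 (red on black), char 0x01 (smiley)
mov [bx],ax          ; write char and color
ret

push 0xb800
pop es
mov di,(80*12+40)*2
mov ax, 0x0401
stosw                ; writes AX to ES:DI
ret

push ss
push 0xb800
pop ss
mov sp,(80*12+40)*2
mov ax, 0x0401
push ax              ; push decrements SP first, so this lands one cell left of (40,12)
pop ss
int 0x20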
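
To double-check the numbers used above, a tiny Python sketch (helper names are made up for this example) computes the byte offset of a character cell and the combined char/attribute word that a single 16 bit store writes. It assumes the 80x25 layout, with the character in the low byte and the attribute (background in the high nibble, foreground in the low nibble) in the high byte.

```python
COLUMNS = 80


def cell_offset(x: int, y: int) -> int:
    """Byte offset of character cell (x, y) in segment 0xB800."""
    return (y * COLUMNS + x) * 2


def cell_word(char: int, fg: int, bg: int = 0) -> int:
    """Word as stored by one 16 bit write: attribute high, char low."""
    return ((bg << 4 | fg) << 8) | char


print(cell_offset(40, 12), hex(cell_word(0x01, 4)))  # 2000 0x401
```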

You might notice that the push <word> + pop seg_reg combination is always the same and occupies four bytes altogether. If correct alignment is not important to you and you really just want any pointer to the screen, there is another way to get a valid one:

 
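les bx,[si]     ; BX = 1CC4h, ES = 0AA90h (code bytes C4 1C 90 AA at CS:100h)
nop             ; free slot for another one-byte instruction (90h)
stosb           ; 0AAh, writes AL to ES:DI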

That's also four bytes, but it already has the stosb opcode (for putting something onto the screen) integrated and even one slot free for another one-byte-instruction. It works because SI initially points to the start of our code, and stosb has the hexadecimal representation of 0AAh. After the first command, the segment register ES contains the value 0AA90h. If you repeatedly write something to the screen with stosb you will eventually reach the 0B800h segment and chars will appear on the screen. With a careful selection of the free one-byte-opcode you can also reintroduce some alignment. This works also with the stosw opcode 0ABh.
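
The segment arithmetic behind this trick can be checked with a few lines of Python (an illustration, not part of the original). Real mode addresses are segment * 16 + offset. With ES = 0AA90h, the text screen at 0B800h:0000h therefore starts at offset 0D700h of that segment. The same helper also shows the 16 byte offset of the les bx,[bx] trick for mode 13h, where ES ends up as 9FFFh instead of A000h.

```python
def linear(segment: int, offset: int) -> int:
    """Real mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset


es = 0xAA90                                     # loaded by "les bx,[si]"
screen_start_di = linear(0xB800, 0) - linear(es, 0)
print(hex(screen_start_di))                     # 0xd700
print(linear(0xA000, 0) - linear(0x9FFF, 0))    # 16
```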

Alternative high level functions

Besides directly accessing memory, there are also other ways of bringing characters to the screen (f.e.)

Outputting in mode 13h (320x200)

Basic pixel output

The video memory for mode 13h is located at segment 0xA000, so you need to assign this value to a segment register. Also, after the start of your program you are normally still in text mode, so you need to switch to the video mode. The following snippet does both:

mov al,0x13 
int 0x10     ; AH = 0 means : set video mode to AL = 0x13 (320 x 200 pixels in 256 colors)
push 0xA000  ; put value on the stack
pop es       ; pop the top stack value into segment register ES

You're free to use any of the segment register / opcode combinations to write to the screen

  • ES (stosb)
  • DS (mov)
  • SS (push)

Let's add some code that actually draws something on the screen. The following program occupies 23 bytes and draws a fullscreen XOR texture:

mode13h-example-xor
mov al,0x13
int 0x10
push 0xa000
pop es
X: cwd			; "clear" DX (if AH < 0x80)
mov ax,di		; get screen position into AX
mov bx,320		; get screen width into BX
div bx			; divide, to get row and column
xor ax,dx		; the famous XOR pattern
and al,32+8		; a more interesting variation of it
stosb			; finally, draw to the screen
jmp short X		; rinse and repeat
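
To see what the loop computes for each screen offset DI, here is a short Python model (an illustration, not part of the original program). It mirrors <code>div bx</code> with BX = 320, followed by <code>xor</code> and <code>and al,32+8</code>. Offsets from 64000 upwards are off-screen; the real program simply keeps going and lets DI wrap around.

```python
WIDTH = 320


def offset(x: int, y: int) -> int:
    """Offset of pixel (x, y) in segment 0xA000 (mode 13h)."""
    return y * WIDTH + x


def xor_pixel(di: int) -> int:
    """Color the 23 byte program writes at offset di."""
    row, col = divmod(di & 0xFFFF, WIDTH)  # div bx: AX = row, DX = column
    return (row ^ col) & (32 + 8)          # xor ax,dx / and al,32+8


print(offset(160, 100), xor_pixel(offset(160, 100)))  # 32160 0
```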


Note that there is a different way of preparing the segment register, instead of :

push 0xa000
pop es

you can also do :

mov ah,0xA0
mov es,ax

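Both variations occupy 4 bytes, but the latter also runs on processors where push <word> is not available (the 8086/8088). Keep in mind that mov ah,0xA0 leaves AL untouched, so ES only becomes exactly 0xA000 if AL is 0; any other value in AL moves the screen start by AL*16 bytes.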

Alternative way of pixel plotting and optimization

Now let's optimize on the snippet. First, we can adapt the "LES" trick from the textmode section. We just exchange

push 0xa000
pop es

with:

les bx,[bx]

to save two bytes. This works because BX is 0x0000 at start and thus, accesses the region before our code, which is called Program Segment Prefix. The two bytes that are put into the segment register ES are bytes 2 and 3 = "Segment of the first byte beyond the memory allocated to the program" which is usually 0x9FFF. That is just off by one to our desired 0xA000. Unfortunately that means a 16 pixel offset, so if screen alignment means something to you, you can't use this optimization. Also, said two bytes are not always 0x9FFF; for example, if resident programs are above the "memory allocated to the program" (FreeDos), their content is overwritten if we take their base as our video memory base.

Second, we can use an alternative way of putting pixels to the screen, subfunction AH = 0x0C of int 0x10. Also, instead of constructing row and column from the screen pointer, we can use some interesting properties of the screenwidth regarding logical operations. This results in the following 16 byte program:

cwd             ; "clear" DX for perfect alignment
mov al,0x13
X: int 0x10		; set video mode AND draw pixel
inc cx			; increment column
mov ax,cx		; get column in AH
xor al,ah		; the famous XOR pattern
mov ah,0x0C		; set subfunction "set pixel" for int 0x10
and al,32+8		; a more interesting variation of it
jmp short X		; rinse and repeat

The first optimization is the double usage of the same "int 0x10" for setting the video mode and drawing the pixel. The subfunction AH = 0x0C expects row and column in DX and CX. Since the screen width is 320, which is 5 * 64, we can ignore the row and just work with the column if we use logical operations and only use bits 0-6 of the result. The subfunction AH = 0x0C allows for unbounded column values in CX (up to 65535) and correctly "wraps" them internally without an error.

The major drawback of the "subfunction AH = 0x0C" approach is performance loss. While DosBox and many emulators perform just fine, real hardware will draw much slower, depending on the Video BIOS.

Basic animation and user interaction

Now let's add the convenient check for the ESC key and also add a simple animation. The DI register is used as frame counter and incremented after the pixel counter CX ran through all 65536 values via LOOP. This frame counter is then added to the column. The resulting program is now 25 bytes in size :

Xor anim example.gif
cwd             	; "clear" DX for perfect alignment
mov 	al,0x13
X: 		int 0x10	; set video mode AND draw pixel
mov 	ax,cx		; get column in AH
add		ax,di		; offset by framecounter
xor 	al,ah		; the famous XOR pattern
and 	al,32+8		; a more interesting variation of it
mov 	ah,0x0C		; set subfunction "set pixel" for int 0x10
loop 	X			; loop 65536 times
inc 	di			; increment framecounter
in 		al,0x60		; check keyboard ...
dec 	al			; ... for ESC
jnz 	X			; rinse and repeat
ret					; quit program

( ↑ This example is the blueprint in the FPU Basics Section.)

Using Custom Colors

Shades of Hue within the Default VGA palette

You might have noticed that there is a bit of structure to the default VGA palette, which you can exploit for some interesting results. Looking at the palette, there is a rainbow of different hue values starting at index 32, and it is repeated with slightly different brightness every 72 indices. If you are okay with limiting the number of shades you need, you can get a small color ramp for all kinds of hue values by simply calculating your color index like this:

color=((shade%3)*72)+32+huevalue

For an example of how this looks for all kinds of hue values, see Popcast by Hellmood/Desire.
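
The formula is easy to explore in Python before putting it into an intro (function name is illustrative). The sketch assumes the hue value stays within 0..71, so the result stays inside the repeated rainbow area from index 32 to 247 of the default palette.

```python
def hue_shade_index(hue: int, shade: int) -> int:
    """Default VGA palette index: same hue, brightness picked by shade % 3."""
    return (shade % 3) * 72 + 32 + hue


print([hue_shade_index(0, s) for s in range(3)])  # [32, 104, 176]
```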

Setting a Custom Palette

Sometimes, when the default VGA palette doesn't quite match the look you are going for, it can be useful to set your own palette using the VGA registers. The basic setup loop looks like this:

palloop:
mov ax,cx
mov dx,0x3c8
out dx,al    ; select palette color
inc dx
out dx,al    ; write red value (0..63)
out dx,al    ; write green value (0..63)
out dx,al    ; write blue value (0..63)
loop palloop

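The above code sets a simple grayscale palette (it assumes the CX register to be 0 at the start) and is compatible with all DOS platforms. In some cases you can omit the mov dx,0x3c8, out dx,al, inc dx and directly access the data register by just using mov dx,0x3c9 instead: the DAC write index advances automatically after every third data byte, so consecutive writes keep filling the following palette entries.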

To get more interesting colors than just grayscale, you can alter the value of the AL register between setting the red, green and blue values, for example by shifting, adding, subtracting or performing logical operations. Just get creative and check whether the result is sufficient for your use case.

TomCat presents the most common color palettes grouped by functionality. Check his article: Colors (in tiny intros)

Producing sound

MIDI notes

Creating sounds with MIDI requires a bit more preparation, but once you're familiar with it, it's even simpler than PC speaker sound, because you basically don't have to create the sound; you just have to trigger it. To start with, you have to know that there are a lot of different instruments and a defined way of communication. Imagine the MIDI interface as a keyboard: you tell it which button/key you want to press, which knob to twist, and sometimes how hard. Per default, the active instrument is the Acoustic Grand Piano.

Single piano note

Let's start off with a simple example, playing a single note on the piano:

mov 	al,	3Fh		;	set UART mode - command
mov 	dx,	331h	;	MIDI Control Port
out 	dx,	al 		;	send !
dec 	dx			;	MIDI Data Port ( = 330h )
mov 	al,	90h		;	send note on channel ZERO - command
out 	dx,	al		;	send !
mov 	al,	56h		;	data byte 1 : KEY = 56h
out 	dx,	al		; 	send !
mov 	al,	67h		;	data byte 2 : VOLUME = 67h
out 	dx,	al		;	send !
ret					; 	quit


In short: you turn your keyboard on (switching to UART mode), then press a KEY with a certain VOLUME on channel ZERO, then exit. Besides switching to UART mode, all this communication uses the port 330h. This example will work on DosBox but not on Windows XP NTVDM: for still unclear reasons, the NTVDM emulation delays the note until it receives a second one. The simplest way of at least hearing something is to repeatedly play notes, like in the following example :

Repeated piano notes

mov 	al,	3Fh			;	set UART mode - command
mov 	dx,	331h		;	MIDI Control Port
out 	dx,	al 			;	send !
dec 	dx				;	MIDI Data Port ( = 330h )
main:
mov 	al,	90h			;	send note on channel 0 - command
out 	dx,	al			;	send !
mov 	al,	56h			;	data byte 1 : KEY = 56h
out 	dx,	al			; 	send !
mov 	al,	67h			;	data byte 2 : VOLUME = 67h
out 	dx,	al			;	send !
mov 	cl,5
_wait:
hlt
loop _wait
in 		al,	0x60		;	check for ESC
dec 	al				;
jnz 	main			;	no? repeat
ret						;	quit


↑ This is the previous example, enriched with synchronizing against the timer and checking for the ESC key. It works on both DosBox and Windows XP NTVDM and plays a note on the Piano repeatedly.

Repeated notes of other instruments

While hitting one key repeatedly is not really interesting in general, it can produce decent results when doing it with the right instrument activated, like it was done with the "French Horn" in Timelord (by Baudsurfer). Apart from just changing the instrument, let's also optimize a little bit on the size:

org 100h
start:	
mov 	si,data			; init pointer for outsb
mov 	dx,330h			; change to data port
mov 	cl,5			; play our music data
rep 	outsb			; (see below at "data" label)
inc 	dx				; switch to control port
outsb					; change to mode "UART"
_wait:
mov 	al,[fs:0x46c]	; read timer value
cmp	 	al,bl			; wait until...
jz 		_wait			; ...timer value changed
xchg 	bx,ax			; save old timer value
in 		al,0x60			; check for ...
dec 	al				; ... ESC key
jnz 	start			; otherwise : repeat 
dec 	dx				; switch to data port again
outsb					; stop all ... 
outsb					; ... notes played ...
outsb					; ... on channel 3
data: 	
db		0c3h			; change instrument on channel 3
						; (is also "RET" for program quit)
db		60				; to "French Horn"
db		93h				; play note on channel 3
db		35				; deep "b" = note number 35
db		127				; play with volume = 127
db		3fh				; change mode to "UART"
db		0b3h			; control change on channel 3
db		123				; Channel Mode Message "All Notes Off"


↑ This is the previous example with a changed instrument, the MIDI data structured into a data section, the output optimized by using outsb instead of out dx,al, and the program finalized with a special command to turn All Notes Off. This is necessary for all instruments which don't stop by themselves. In all the previous examples, we sent the "NOTE ON" command (9Xh), but not the corresponding "NOTE OFF" command (8Xh). Also, the note is now played on channel 03h, since the command byte for changing an instrument on channel 3 is 0C3h, which is also RET and can be reused. If this looks complicated at first, always remember: it's just sending defined commands to a single port.

The drum channel

Now, that you're aware that there are different channels (overall: 16) to play notes on, how would you like a channel 09h specifically for 'Drums' ? Ten different drumsets with dozens of samples are available out of the box. Per default, the "Standard Kit" is active. The following example plays a track of drum notes repeatedly, while further optimizing for size :

org 100h
aas						; 3fh = "set UART mode"
cwd						; 99h = "play note on drum channel" command
db 		42,38,42,35		; the drum notes (kick, snare, hihat)
mov 	dx,0x331		; MIDI Control Port
outsb					; send "set UART mode"
dec 	dx				; switch to MIDI data port
outsb					; send "play note on drum channel" command

main:
mov		al,[fs:0x46c]	; read timer
test	al,3
jnz 	main			; skip 3 values
inc 	byte [fs:0x46c] ; inc manually to prevent retrigger

inc 	bx				; increment note counter
and 	bl,3			; truncate to 4 notes
mov 	al,[bx+si]		; read the drumnote (see above)

out 	dx,al			; send the drum
mov 	al,127			; set volume to maximum
out 	dx,al			; send volume

in 		al,0x60			; check for ESC
dec 	al				;
jnz 	main			; no? repeat
ret						; otherwise quit


In contrast to the previous example, the data section is now at the start. That means it's executed as code! This is dangerous of course, but it also saves the bytes for assigning the DATA offset to SI. Once outsb has incremented SI two times, SI stays fixed, and further reading from the drum data is done with [BX+SI]. Unless you know exactly what you are doing, don't use that kind of "executing data" optimization! In this special case, AAS and CWD do no harm, and the drum notes 42,38,42,35 are carefully crafted and arranged to resemble the instruction SUB AH,[232Ah], which does no harm either.

Further Midi instrument tuning by controllers and pitch

If you are familiar with hardware synthesizers you'll definitely remember the typical pitch bend or modulation wheels beside the keys, usually two of them. Those are commonly assigned to a vibrato/tremolo effect and a +/-pitch to tune the played note. You can also use those functions in your intro code to affect the currently played midi instrument note.

To access these parameters, the code follows the usual MIDI programming pattern, as you can see here:

mov al,10110000b	;Controller command on Midi channel 0
out dx,al
mov al,00000001b	;0...127 data byte 1 => '1' is the code for the modulation wheel typically assigned to vibrato/tremolo
out dx,al
mov al,01111111b	;0...127 data byte 2 => e.g. '01111111' => Maximum vibrato level
out dx,al

In that example the maximum vibrato level is assigned to any instrument played on midi channel 0. This effect was used in the Crystal Comet 128 Byte intro by Kuemmel.

For pitch bend the code would be like:

mov al,11100000b	;Pitch bend command on Midi channel 0
out dx,al
mov al,0x00	;0...127 data byte 1 => LSB (lower 7 bits) of the pitch value, here for 0x3000
out dx,al
mov al,0x60	;0...127 data byte 2 => MSB (upper 7 bits) of the pitch value, here for 0x3000
out dx,al

Pitch bend uses a 14 bit value. The center is at 0x2000 (meaning no pitch change). Values from 0x2000 up to 0x3fff increase the pitch, and values from 0x2000 down to 0x0000 decrease it. By default, a range of 0x2000 corresponds to 2 semitones, so you can bend by +/- 2 semitones. Please be aware that the value must be split into two 7 bit values. Therefore, e.g., 0x3000 would be 0x60 (MSB) and 0x00 (LSB).
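
The split into two 7 bit data bytes is easy to get wrong by hand, so here is a small Python sketch of it. It assumes the default bend range of +/- 2 semitones, and the function names are made up for this example.

```python
def pitch_bend_bytes(value: int) -> tuple:
    """Split a 14 bit pitch bend value into (LSB, MSB), 7 bits each."""
    if not 0 <= value <= 0x3FFF:
        raise ValueError("pitch bend value must be 0..0x3FFF")
    return value & 0x7F, value >> 7


def semitones_to_bend(semitones: float, bend_range: float = 2.0) -> int:
    """Map -range..+range semitones onto 0..0x3FFF around the center 0x2000."""
    value = round(0x2000 + semitones / bend_range * 0x2000)
    return max(0, min(value, 0x3FFF))


print(pitch_bend_bytes(0x3000))  # (0, 96), i.e. LSB 0x00, MSB 0x60
```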

Of course there are more midi controller options, e.g. you could change the stereo pan level. As a reference and for more detailed information please have a look at this Midi tutorial page.

Creating basic sound effects in 16 bytes

In the MIDI repertoire, there are already some sound effects available. With the "data execution" optimization above, let's fire a gunshot in 16 bytes :

aas
les 	di,[bx-0x6C]
xor 	al,127
mov 	dx,0x331
outsb
dec 	dx
mov 	cl,5
rep 	outsb
ret

The first three instructions don't do anything (they do, but we don't care), it's just MIDI data.

  • aas

the command for switching to "UART" mode, for sending to port 0x331

  • les di,[bx-0x6C]

assembles to 0xc4 (change instrument on channel 4), 0x7F (change it to "Gunshot"), 0x94 (play note on channel 4)

  • xor al,127

assembles to 0x34 (play THIS note), 0x7f (play it THAT loud, 127 is also the allowed maximum)

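The rest of the code sends this data to the interface and exits: OUTSB sends the first byte (0x3F) to port 0x331, DEC DX switches to the data port 0x330, and REP OUTSB sends the remaining five bytes. This relies on SI pointing to the start of the program (0x100) and CH being 0 at program start. You can change the kind of sound effect by modifying the ModR/M byte of the second instruction (change BX to BP or SI etc.), which changes the instrument number. Changing the volume is simpler: change the immediate value of xor al,127 to any value between 0 and 127.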
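To see why the instructions double as MIDI data, here is a host-side Python sketch that lists the 16 bytes with their standard 8086 encodings and reads the first six as MIDI. The helper names are hypothetical; the bytes are what NASM should emit for the listing above, but check against your own assembler output. The second function shows how swapping the base register in `les di,[bx-0x6C]` changes the ModR/M byte, which is the program number the interface receives.

```python
# The 16-byte program as machine code (standard 8086 encodings)
GUNSHOT_COM = bytes([
    0x3F,              # aas               -> MIDI 0x3F (UART mode)
    0xC4, 0x7F, 0x94,  # les di,[bx-0x6C]  -> program change ch 4, program 127, note on ch 4
    0x34, 0x7F,        # xor al,127        -> note 0x34, velocity 127
    0xBA, 0x31, 0x03,  # mov dx,0x331
    0x6E,              # outsb
    0x4A,              # dec dx
    0xB1, 0x05,        # mov cl,5
    0xF3, 0x6E,        # rep outsb
    0xC3,              # ret
])

# r/m field for [reg+disp8] memory operands (mod = 01)
RM_FIELD = {"si": 0b100, "di": 0b101, "bp": 0b110, "bx": 0b111}

def les_di_modrm(base_reg):
    """ModR/M byte of 'les di,[base_reg+disp8]': mod=01, reg=DI(111), r/m."""
    return (0b01 << 6) | (0b111 << 3) | RM_FIELD[base_reg]

def midi_view(code):
    """Interpret the first six code bytes the way the MIDI interface sees them."""
    uart, prog_status, program, note_status, note, velocity = code[:6]
    return {"uart_cmd": uart, "program_channel": prog_status & 0x0F,
            "program": program, "note_channel": note_status & 0x0F,
            "note": note, "velocity": velocity}

for reg in RM_FIELD:
    print(reg, hex(les_di_modrm(reg)))   # the resulting MIDI program number
```

The displacement -0x6C is stored as 0x94, which is why the "note on channel 4" status byte comes for free.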

Procedural MIDI music generation in 64 bytes

With all of the above you should now be able to follow the next snippet, "Descent OST", a small framework for procedural MIDI sound generation in 64 bytes:

; "Descent OST", a 62 byte MIDI music player for MSDOS
; created by HellMood/DESiRE (C)2015
; this is the extracted music routine used in "Descent" 
; it is a procedural MIDI algorithm which sticks a
; subroutine to the DOS timer (interrupt 0x1C)
; the registered routine is called ~18.2 times per second

; developed for use with "NASM", 
; see http://sourceforge.net/projects/nasm/files/

%define		 	rhythmPattern			0b11
; with "rhythmPattern", you define how often a note is played
; generally, higher values and values containing many "ones"
; in binary representation, will result in faster play
; for example "0b11" will play a note on every 4th timer tick
%define 		baseInstrument			9
; defines the number of the first instrument used.
; see http://www.midi.org/techspecs/gm1sound.php for a full list
; keep in mind, that there are only a few instrument blocks
; whose sounds stop after a while. You won't get good results
; from strings etc. just a mess of overlayed sounds
%define 		numInstruments			7
; defines how many instruments are used. keep in mind that "rhythm-
; Pattern" has influence on the picked instrument. the instruments
; from 9 to 9+6 are called "chromatic percussion"
%define 		noteStep 				5
; defines the basic difference from one note to the next. recommended
; values here are (mainly) 3,4 and 5 for music theoretic reasons
; but feel free to play around =)
%define 		noteRange				12
; after adding the noteStep, the note value is "mod"ded with
; the "noteRange". 12 means octave, which results in very harmonic
; scales
%define 		noteSpread				3
; the third step spreads the notes over the tonal spectrum, you may
; want to keep "noteSpread" * "noteRange" round about 30-60.
%define 		baseNote				40
; the general tone height of everything. some instruments don't play
; arbitrary deep notes correctly, and too high notes cause ear bleeding
; adjust with care ;)

; WARNING : after exiting the program, the timer interrupt is still active
; I strongly recommend rebooting or restarting DOSBox!

; ADVICE : Yes, there are music- and math-related things going on here.
; If you're not into music theory, the circle of fifths and the like, it may be
; better to just play around with the parameters rather than understand them.
; Just change stuff slowly, and eventually you will get "there",
; wherever that is ;)

org 0x100
xchg cx,ax					; set our second counter to zero
mov dx,music				; DS:DX = address of our timer handler
mov ax,0x251C				; mode "0x25" , "0x1C" = change address of timer interrupt
int 0x21					; see http://mprolab.teipir.gr/vivlio80X86/dosints.pdf
S:
in ax,0x60					; wait for "ESC" press, then exit
dec al						; music plays on anyway, this is just for
jnz S						; keeping the music exactly as in "Descent"
ret							; return to prompt
music:
inc bx						; increment our first counter (starts at zero)
test bl,byte rhythmPattern	; play a note every 4th time tick
jnz nomusic					; otherwise do nothing
mov dx,0x331				; MIDI command/status port
mov al,0x3F					; 0x3F = switch to UART mode
out dx,al
dec dx						; DX = 0x330, MIDI data port
mov al,0xC0					; change instrument on channel 0...
out dx,al
mov ax,bx					; AL = low byte of the first counter
aam byte numInstruments		; AL = AL mod numInstruments
add al,byte baseInstrument	; ...to this instrument
out dx,al
mov al,0x90					; play note on channel 0 ...
out dx,al
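add cl,byte noteStep		; advance the second counter
mov al,cl
aam byte noteRange			; AL = CL mod noteRange
imul ax,noteSpread			; AL = (CL mod noteRange) * noteSpread (AH only affects the high byte)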
add al,baseNote				; ... play THIS note
out dx,al
neg al						; (play deeper notes louder = add bass)
add al,127+39				; ... play it THAT loud
out dx,al
nomusic:
iret
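To get a feel for what the handler plays without running DOS, here is a Python sketch that mirrors it tick by tick. It is an approximation under stated assumptions: BX and CX are taken to start at zero (as the comments above say), the 8-bit wraparound of CL is kept, and the fact that the handler also runs in other programs' register contexts is ignored. The function name `descent_events` is hypothetical.

```python
# Parameters copied from the %define lines above
RHYTHM_PATTERN, BASE_INSTRUMENT, NUM_INSTRUMENTS = 0b11, 9, 7
NOTE_STEP, NOTE_RANGE, NOTE_SPREAD, BASE_NOTE = 5, 12, 3, 40

def descent_events(ticks):
    """Return (tick, instrument, note, velocity) for each note played."""
    bx = cl = 0          # both counters assumed to start at zero
    events = []
    for _ in range(ticks):
        bx = (bx + 1) & 0xFFFF                          # inc bx
        bl = bx & 0xFF
        if bl & RHYTHM_PATTERN:                          # test bl,rhythmPattern / jnz
            continue
        instrument = BASE_INSTRUMENT + bl % NUM_INSTRUMENTS    # aam + add
        cl = (cl + NOTE_STEP) & 0xFF                     # add cl,noteStep (8-bit)
        note = (cl % NOTE_RANGE) * NOTE_SPREAD + BASE_NOTE     # aam, imul, add
        velocity = (127 + 39 - note) & 0xFF              # neg al / add al,127+39
        events.append((bx, instrument, note, velocity))
    return events

for event in descent_events(16):
    print(event)
```

At about 18.2 ticks per second, a note on every 4th tick gives roughly 4.5 notes per second; deeper notes get higher velocities, which is the "add bass" trick from the listing.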

PC Speaker

Producing sound with the PC speaker is very easy. Basically, you program a system timer (PIT channel 2) with a divisor for the desired frequency, then connect this timer to the speaker. The PC Speaker article on the OSDev Wiki has the details. An example of a tiny intro that uses PC speaker music is SpeaCore.
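The divisor is the PIT input clock (about 1.193182 MHz) divided by the tone frequency. A small Python sketch of that arithmetic (host-side; helper names hypothetical), including the split into the low and high byte that are written to port 0x42:

```python
PIT_CLOCK_HZ = 1193182   # PIT input clock, about 1.193182 MHz

def pit_divisor(freq_hz):
    """16-bit divisor for timer 2 producing roughly freq_hz (0 encodes 65536)."""
    divisor = round(PIT_CLOCK_HZ / freq_hz)
    if not 1 <= divisor <= 65536:
        raise ValueError("frequency outside the PIT's range")
    return divisor & 0xFFFF

def divisor_bytes(divisor):
    """(low, high) bytes, written to port 0x42 in that order."""
    return divisor & 0xFF, divisor >> 8

def actual_freq(divisor):
    """Frequency the PIT really produces for a given divisor."""
    return PIT_CLOCK_HZ / (divisor or 65536)

d = pit_divisor(440)                  # A4
print(d, [hex(b) for b in divisor_bytes(d)], round(actual_freq(d), 2))
```

In the usual setup, control word 0xB6 goes to port 0x43 first (channel 2, low/high byte access, mode 3), then the two bytes go to port 0x42, and finally bits 0 and 1 of port 0x61 are set.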

Basic example with melody pattern

A very optimized and dirty variant of producing sound with the speaker is this 12 byte snippet (sound routine from the tiny intro "darkweb"):

hlt				; wait for the next interrupt (usually timer 0, ~18.2 Hz)
inc bx			; increment our counter
mov ax,bx		; work with a copy
or al,0x4B      ; melody pattern + 2 LSB for speaker link
out 0x42,al		; set new countdown for timer2 (two passes)
out 0x61,al		; link timer2 to PC speaker (2 LSBs are 1)
jmp si			; rinse and repeat

Instead of sending the low and high byte of the divisor directly in succession, we do it the "two pass" way: each pass of the loop writes a single byte to port 0x42, and timer 2 (assuming the BIOS left it in low/high byte access mode) alternately takes it as the low and the high byte of its divisor. That reduces the number of possible frequencies to 255, which is still good enough for some rough sounds. Linking the timer to the PC speaker might not be obvious: normally you would read the value of port 0x61, set the two least significant bits and write the value back. You can skip all of this if you send the same value you just used for the timer, provided it already has the two least significant bits set (or al,0x4B takes care of that). Be aware that port 0x61 does many things apart from connecting the timer to the speaker. A useful resource for ports in general is the Bochs Ports List; for port 0x61 it displays:


0061 w KB controller port B (ISA, EISA) (PS/2 port A is at 0092)
system control port for compatibility with 8255
bit 7 (1= IRQ 0 reset )
bit 6-4 reserved
bit 3 = 1 channel check enable
bit 2 = 1 parity check enable
bit 1 = 1 speaker data enable
bit 0 = 1 timer 2 gate to speaker enable


So if you experience strange things with highly optimized PC speaker output, revert to the safe way. The method described here works on real hardware and in DOSBox. Unfortunately, neither Oracle VirtualBox with MS-DOS 6.22 nor the Windows XP NTVDM seems to emulate the PC speaker properly (investigation and citation needed here!)
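To see which of the port 0x61 bits from the Bochs list the darkweb loop actually touches, here is a Python sketch (host-side; helper names hypothetical) that decodes the value produced by `or al,0x4B` for each counter value. Bits 0 and 1 are always set, bits 3 and 6 are always set too, and bit 7 ("IRQ 0 reset") follows bit 7 of the counter, which is one concrete reason the trick can misbehave on some machines.

```python
# Port 0x61 bits as listed in the Bochs ports list quoted above
PORT61_BITS = {0: "timer 2 gate to speaker enable", 1: "speaker data enable",
               2: "parity check enable", 3: "channel check enable",
               4: "reserved", 5: "reserved", 6: "reserved", 7: "IRQ 0 reset"}

def port61_flags(value):
    """Names of the port 0x61 bits that are set in value (bit 0 first)."""
    return [PORT61_BITS[bit] for bit in range(8) if (value >> bit) & 1]

def darkweb_value(counter):
    """Value the darkweb loop writes to ports 0x42 and 0x61 ('or al,0x4B')."""
    return (counter & 0xFF) | 0x4B

print(port61_flags(darkweb_value(0)))
print(port61_flags(darkweb_value(0x80)))
```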

Simple deep sound in 8 bytes

One of the smallest possible PC speaker sound generators might be this 8-byte snippet:

dec ax			; AX initially 0000h -> AL = 0xFF
out 42h,al		; change divisor of timer2 to 0xFFFF
out 42h,al		; resulting in a very low frequency (1193182 / 65535 = ~18.2 Hz)
out 61h,al		; 2 LSBs are set, connect timer to speaker
ret				; quit

(Note: This may fail on actual hardware, as there might not be time for the bus to settle between the consecutive out 42h,al statements.)

COVOX output (aka LPT DAC)

It is possible to output to an LPT-connected DAC ("COVOX") in a tinyprog. A proof-of-concept example is Express Train 125 which uses COVOX for sound generation.

This method follows the "audio from one line of C code" style of sound generation. A pouet discussion exists for more background information.

Here are some detailed instructions on how a very small bytebeat player can be realized: MUZAK58

Advanced PC Speaker and COVOX sound via interrupt

For more advanced PC Speaker or COVOX output in tiny intros, especially when samples must be played back at a specific sample rate, driving the output from a timer interrupt is recommended. To illustrate this, we take a so-called bytebeat and turn it into working code for both PC Speaker and COVOX.

The major difference between the two is that COVOX offers 8 bits of precision, while the PC Speaker usually offers only 6 bits. The setup and access also differ, as shown in the sections before. In terms of both code size and sound quality, COVOX is preferable.

The bytebeat we are using as an example can be seen and heard here. It's based on JavaScript syntax: ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t <<2:t))

Bytebeat code like this can be ported directly to assembly by evaluating the individual subexpressions step by step, as the implementations below show. They work in DOSBox and should also run on real hardware with FreeDOS. They show no graphical output; they just play the bytebeat until a key is pressed. Your graphics routine would go right after the 'main' label.
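Before porting a bytebeat to assembly, it can help to have a reference implementation that produces the expected sample bytes. Here is a Python sketch of the example formula (host-side; the function name is hypothetical). It wraps t at 16 bits like the word counter in the assembly versions and keeps only the low 8 bits, which is what ends up in AL; for t below 65536 these low bits should match the JavaScript version, since 16-bit and 32-bit arithmetic agree on them.

```python
def bytebeat(t):
    """Low byte of ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t<<2:t))."""
    t &= 0xFFFF                      # 16-bit counter, as in the assembly versions
    if t & 4096:
        value = ((t * (t ^ (t % 255))) | (t >> 4)) >> 1
    else:
        value = (t >> 3) | ((t << 2) if t & 8192 else t)
    return value & 0xFF              # 8-bit sample (COVOX writes AL)

first_second = bytes(bytebeat(t) for t in range(11025))   # ~1 s of 8-bit samples
```

Writing `first_second` to a raw 8-bit unsigned mono file lets you compare it by ear or byte by byte with what the DOS version produces.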

PC Speaker variant

org 100h

 mov    ax,3508h			;21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)
 int    21h					;return: es:bx
 push   es
 push   bx					;backup current interrupt handler
 mov    cx,63 + 108*256  	;PIT counter divisor = 108 and speaker enable for init   
 mov    bl,90h				;10010000b => on "init"  	
							;Bit0   =   0 binary (not BCD) counting
							;Bit3-1 = 000 mode 0 select
							;Bit5-4 =  01 read/write counter bits 0-7 only
							;Bit7-6 =  10 counter 2 select

 mov    dx,irq				;new handler address
 call   init

main:
 mov    ah,0
 int    16h					;ah = 0, int16h => read keypress

 pop    dx
 pop    ds					;restore handler address at exit
 xor    cx,cx				;PIT counter divisor = 0 and speaker disable for exit 
 mov    bl,0b6h				;bl = 10110110b => at exit
init:
 xchg   ax,cx 
 out    61h,al				;al = 0 or 63 => Bit0 = 1 timer 2 gate to speaker enable,
 mov    al,ah				;Bit1 = 1 speaker data enable ...or disable both at al = 0
 out    40h,al				;al = 0 or 108 => write PIT counter 0 divisor
 salc
 out    40h,al				;al = 0 => write PIT counter 0 divisor again = 0 high byte
							;=> this results in an interrupt rate of about 11025 Hz:
							;the PIT clock is 1.19318181818 MHz => 1.19318181818 MHz / 108 = ~11048 Hz
 xchg   ax,bx				;al = bl = 90h at init, 0b6h = 10110110b at exit (bits below)
 out    43h,al				;Bit0   =   0 binary (not BCD) counting
							;Bit3-1 = 011 mode 3 select, square wave generator
							;Bit5-4 =  11 read/write counter bits 0-7 first, then 8-15
							;Bit7-6 =  10 counter 2 select
 mov    ax,2508h			;21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)  
 int    21h
retn

;bytebeat: ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t<<2:t))
irq:
 pusha
 mov    bp,255
 mov    ax,0            	;ax: t
.counter:
 mov    cx,ax
 shr    cx,3            	;cx: (t>>3)
 test   ax,4096         	;(t&4096)?
 jz     .1
 mov    bx,ax           	;bx: t
 sub    dx,dx           	;dx:ax t
 div    bp              	;dx: (t%255)
 xor    dx,bx           	;dx: (t^(t%255))
 shr    cx,1            	;cx: (t>>4)
 xchg   ax,bx           	;ax: t
 mul    dx              	;ax: t*(t^(t%255))
 or     ax,cx           	;ax: t*(t^(t%255))|(t>>4)
 shr    ax,1            	;ax: (t*(t^(t%255))|(t>>4))>>1
 jmp    .3
.1:
 test   ax,8192         	;(t&8192)?
 jz     .2
 shl    ax,2            	;ax: (t<<2)
.2:
 or     ax,cx           	;ax: ax|(t>>3)
.3:
 shr    al,2            	;downscale to 6 bits
 jz     .4					;skip 0: a count of 0 would mean 65536 to the PIT
 out    42h,al				;write 6 Bit data to speaker (PIT counter 2)
.4:

 inc    word [bp-255+irq.counter-2]
 mov    al,20h				;00100000b
 out    20h,al				;Bit 5 = 1 send End Of Interrupt (EOI) signal
 popa
iret
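One way to see why the PC Speaker variant scales samples down to 6 bits: with a divisor of 108, each sample period lasts only 108 PIT clock ticks (63 ticks at 18939 Hz), and if we read the mode 0 count written to port 0x42 as a pulse length, it should expire before the next sample arrives. The Python sketch below (host-side arithmetic under that reading; helper names hypothetical) computes this tick budget and mirrors the `shr al,2` / `jz .4` steps.

```python
PIT_CLOCK_HZ = 1193182

def ticks_per_sample(sample_rate_hz):
    """PIT clock ticks available in one sample period."""
    return PIT_CLOCK_HZ // sample_rate_hz

def speaker_count(sample):
    """Mirror of 'shr al,2' + 'jz .4': 8-bit sample -> 6-bit count, or None."""
    count = (sample & 0xFF) >> 2     # 0..63
    return count if count else None  # a 0 count is skipped (PIT: 0 = 65536)

print(ticks_per_sample(11025), ticks_per_sample(18939), speaker_count(255))
```

The largest 6-bit count (63) fits in both budgets, while a full 8-bit count (up to 255) would not.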

COVOX variant

org 100h

 mov    ax,3508h			;21h, ah=35h get interrupt handler | al=08h interrupt number (PIT timer)
 int    21h					;return: es:bx
 push   es
 push   bx					;backup current interrupt handler
 mov    al,108				;PIT counter divisor
 mov    dx,irq				;new handler address
 call   init

main:
 mov    ah,0
 int    16h					;ah = 0, int16h => read keypress
 pop    dx
 pop    ds					;restore handler address at exit
 salc						;al = 0 at exit

init:
 out    40h,al				;al = 0 or 108 => write PIT counter 0 divisor = 108 low byte
 salc
 out    40h,al				;al = 0 => write PIT counter 0 divisor again = 0 high byte
							;=> this results in an interrupt rate of about 11025 Hz:
							;the PIT clock is 1.19318181818 MHz => 1.19318181818 MHz / 108 = ~11048 Hz
 mov    ax,2508h			;21h, ah=25h set interrupt handler | al=08h interrupt number (PIT timer)
 int    21h                                                             
 retn

;bytebeat: ((t&4096)?((t*(t^t%255)|(t>>4))>>1):(t>>3)|((t&8192)?t<<2:t))
irq:
 pusha
 mov    bp,255
 mov    ax,0				;ax: t
.counter:
 mov    cx,ax
 shr    cx,3				;cx: (t>>3)
 test   ax,4096				;(t&4096)?
 jz     .1
 mov    bx,ax				;bx: t
 sub    dx,dx				;dx:ax t
 div    bp					;dx: (t%255)
 xor    dx,bx				;dx: (t^(t%255))
 shr    cx,1				;cx: (t>>4)
 xchg   ax,bx				;ax: t
 mul    dx					;ax: t*(t^(t%255))
 or     ax,cx				;ax: t*(t^(t%255))|(t>>4)
 shr    ax,1				;ax: (t*(t^(t%255))|(t>>4))>>1
 jmp   .3
.1:
 test   ax,8192				;(t&8192)?
 jz    .2
 shl    ax,2				;ax: (t<<2)
.2:
 or     ax,cx				;ax: ax|(t>>3)
.3:
 mov    dx,0378h			;LPT1 parallel port address
 out    dx,al				;write 8 Bit sample data
 inc    word[bp-255+irq.counter-2]
 mov    al,20h              ;00100000b
 out    20h,al				;Bit 5 = 1 send End Of Interrupt (EOI) signal
 popa
 iret

Further notes on the two variants

It's important to choose and know your sample rate. For example, if you want to move from 11025 Hz to 18939 Hz while keeping the same sound, change the following code parts (shown here for COVOX). Note that the bytebeat parameters were also adjusted (time constants roughly doubled) to more or less fit the higher rate:

;...snip...
 mov    al,63				;PIT counter divisor instead of 108 => 1.19318181818 MHz / 63 = ~18939 Hz
;...snip...
;bytebeat: ((t&8192)?((t*(t^t%255)|(t>>5))>>1):(t>>4)|((t&16384)?t<<2:t))
;...snip...
.counter:
 mov    cx,ax
 shr    cx,4            	;cx: (t>>4)
 test   ax,8192         	;(t&8192)?
 jz     .1              	
 mov    bx,ax           	;bx: t
 sub    dx,dx           	;dx:ax t
 div    bp              	;dx: (t%255)
 xor    dx,bx           	;dx: (t^(t%255))
 shr    cx,1            	;cx: (t>>5)
 xchg   ax,bx           	;ax: t
 mul    dx              	;ax: t*(t^(t%255))
 or     ax,cx           	;ax: t*(t^(t%255))|(t>>5)
 shr    ax,1            	;ax: (t*(t^(t%255))|(t>>5))>>1
 jmp    .3
.1:
 test   ax,16384        	;(t&16384)?
 jz     .2
 shl    ax,2            	;ax: (t<<2)
.2:
 or     ax,cx           	;ax: ax|(t>>4)
;...snip...

The routine here uses a sample rate of 18939 Hz. Since t is a 16-bit counter, everything loops after at most 65536/18939 = 3.46 seconds. That is usually enough for a drumbeat, but not for a complete song or melody; in that case you have to use another register as a 'top' timer to trigger changes in the sound.
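The rate and loop-length arithmetic can be checked with a short Python sketch (host-side; helper names hypothetical). It uses the PIT clock of about 1.193182 MHz, so the divisor 108 gives about 11048 Hz rather than exactly 11025 Hz:

```python
PIT_CLOCK_HZ = 1193182  # ~1.193182 MHz PIT input clock

def irq_rate_hz(divisor):
    """Timer 0 interrupt rate for a 16-bit divisor (0 means 65536)."""
    return PIT_CLOCK_HZ / (divisor or 65536)

def loop_seconds(divisor, counter_bits=16):
    """How long a counter t of counter_bits bits runs before wrapping."""
    return (1 << counter_bits) / irq_rate_hz(divisor)

for divisor in (108, 63):
    print(divisor, round(irq_rate_hz(divisor), 1), round(loop_seconds(divisor), 2))
```

A wider 'top' counter simply raises `counter_bits`; for example, 24 bits at 18939 Hz lasts far longer than any tiny intro.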

One more thing to check, and maybe modify if you hear an imperfect sound, is the moment at which a sample value is actually "played". Ideally, each sample value is output at exactly the same point in time relative to the interrupt. But your sample generation routine may need a different number of CPU cycles on each call (in the example above, the branch with DIV and MUL is slower than the other one), so with code like the above the output moment varies from sample to sample.

One solution is to output the sample calculated in the previous interrupt call right at the start of the next call. You can do that with self-modifying code as shown here. It takes 5 bytes more:

;...snip...
irq:
pusha
mov    dx,0378h
mov    al,0           
.sample:
out    dx,al
mov    bp,255
mov    ax,0            ; ax: t
.counter:
;...snip...
inc    word [bp-255+irq.counter-2]
mov    byte [bp-255+irq.sample-1],al
mov    al,20h
out    20h,al
;...snip...

Some remarks: for educational reasons, none of the code above is fully optimized for size. Depending on your code and on what the interrupt routine depends on, several size optimizations are possible.

Instead of interrupt 08h, the user timer interrupt 1Ch could theoretically be used as well, but so far this seems to work only in DOSBox and not on a real system with FreeDOS. Further tests are needed to find out what the problem is. Using interrupt 1Ch would save 4 bytes, as the following lines that signal the end of the interrupt could be omitted from the examples above:

;...snip...
mov al,20h
out 20h,al
;...snip...

What can be done with bytebeats? Basically everything :-) Have a look at this thread on pouet to get an idea, and check out the first PC Speaker intro with a bytebeat: TCTRONIC by TomCat/Abaddon.

Some basic waveforms can be encoded like this:

;sawtooth wave
t & 127
;square wave
t & 128
;triangle wave
(t & 128 ? ~t : t) & 127
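A quick way to check basic waveforms is to evaluate them on the host. This Python sketch (helper names hypothetical) shows a sawtooth, a square and a triangle, where the triangle is computed by inverting the low 7 bits whenever bit 7 of t is set, so it rises from 0 to 127 and falls back to 0 over 256 samples:

```python
def saw(t):
    return t & 127                     # ramps 0..127, period 128

def square(t):
    return t & 128                     # 0 or 128, period 256

def triangle(t):
    # invert the low 7 bits during the second half of each 256-sample period
    return (~t if t & 128 else t) & 127

print([triangle(t) for t in (0, 64, 127, 128, 192, 255, 256)])
# [0, 64, 127, 127, 63, 0, 0]
```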

How would you go from here to a specific tone, e.g. an "A4" with a frequency of 440 Hz (check this link for a list of note frequencies)? You have to relate the interrupt frequency to the tone frequency and to the length of one period of your waveform. A sawtooth of t&127 repeats every 128 samples, so at 22050 Hz it produces a tone of 22050/128 = 172.3 Hz. To reach 440 Hz, stretch (multiply) the counter by 440/172.3 = 2.554 to hear the desired note:

(t*2.554) & 127
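The same calculation works for any note. A Python sketch (host-side; helper names hypothetical) assuming equal temperament with A4 = MIDI note 69 = 440 Hz and the 22050 Hz sample rate from the text:

```python
SAMPLE_RATE_HZ = 22050  # interrupt/sample rate assumed in the text

def note_freq_hz(midi_note):
    """Equal-tempered frequency with A4 = MIDI note 69 = 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

def time_multiplier(freq_hz, period=128, rate_hz=SAMPLE_RATE_HZ):
    """Factor to scale t by so a waveform of `period` samples plays freq_hz."""
    base_freq = rate_hz / period          # e.g. 22050 / 128 = 172.27 Hz
    return freq_hz / base_freq

def saw_sample(t, mult):
    """Stretched sawtooth: (t * mult) & 127."""
    return int(t * mult) & 127

m = time_multiplier(note_freq_hz(69))
print(round(m, 3))                        # 2.554
```

In assembly you would normally approximate the multiplier with a fixed-point value or a shift, since floating point is rarely worth the bytes in a tiny intro.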

Glitch is another online tool to create bytebeat sounds with some enhanced syntax. There are several good reads on this tool and on the theory behind it and bytebeat here.

Of course there are endless possibilities, and the whole world of real-time sound calculation/generation is open to you. Here are some tiny intros which already use these techniques: Plasmifier cover 256B, 2(56)unlimited, somehow.