SizeCoding - New pages [en]

Sega MegaDrive

2025-12-18T22:12:49Z

Leaq: fix typos

== SEGA MegaDrive / Genesis ==

The SEGA MegaDrive / Genesis is a 16-bit videogame console released by SEGA in 1988 in Japan (1989 in North America).

It features:
* 320x224 screen resolution by default; read "Screen Resolutions" for more information.
* Four 16-color palettes.
* Tile-based display chip with 2 scrolling tilemaps (planes).
* Full-plane or per-16-pixel-column vertical scroll.
* Full-plane, per-8-pixel-row or per-scanline horizontal scroll.
* 80 sprites per frame in H40 mode, 64 sprites in H32 mode. Read "Screen Resolutions" for more information.
* 20 sprites per scanline in H40 mode, 16 sprites in H32 mode.
* 64KB of dedicated video memory (VRAM).
* YM2612 FM sound chip + SN76489 PSG sound chip (the PSG is integrated in the VDP).

=== Setting up ===
* Assembler: VASM
* Emulator(s): BlastEm ( https://www.retrodev.com/blastem/ )
* Tool(s): MDTools ( https://github.com/sikthehedgehog/mdtools )
* Hardware: Sega Mega Drive / Genesis console with an EverDrive flash cartridge.

Cartridges have a ROM header that can normally be dropped entirely, since it is only used by emulators to display information about the cartridge or to choose the correct emulation settings and peripherals. This is not true for all console revisions, though: one part of the header is mandatory on systems equipped with the TradeMark Security System. Read "ROM Header" and "TMSS" for more information.

=== CPU Vector table ===
This must be the very first thing in the ROM. On power-up or reset, the CPU loads its initial stack pointer from address $000000 and jumps to the reset vector held at address $000004. The rest of the vectors can be dropped completely, since they are only used for exception handling and video interrupts (more information about interrupts can be found in the "VDP" section).
If a cartridge doesn't want to process a particular exception or interrupt, the standard procedure is to point that vector at a dummy handler that simply returns from the exception (RTE). For our purposes, if only the stack pointer and reset vector are used, the rest of the space normally taken by the vector table can hold executable code.

Below are examples that illustrate how the vector table is conventionally set up and how we might want to do it.

<syntaxhighlight lang="">
; Example 1: Standard procedure, where all vectors are initialized to something.
; If a particular vector is not to be used, a "null" or "dummy" handler is put.
dc.l 0x00FFE000 ; Initial stack pointer value
dc.l CPU_EntryPoint ; Start of program
dc.l CPU_Exception ; Bus error
dc.l CPU_Exception ; Address error
dc.l CPU_Exception ; Illegal instruction
dc.l CPU_Exception ; Division by zero
dc.l CPU_Exception ; CHK exception
dc.l CPU_Exception ; TRAPV exception
dc.l CPU_Exception ; Privilege violation
dc.l INT_Null ; TRACE exception
dc.l INT_Null ; Line-A emulator
dc.l INT_Null ; Line-F emulator
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Spurious exception
dc.l INT_Null ; IRQ level 1
dc.l INT_Null ; IRQ level 2
dc.l INT_Null ; IRQ level 3
dc.l INT_HInterrupt ; IRQ level 4 (horizontal retrace interrupt)
dc.l INT_Null ; IRQ level 5
dc.l INT_VInterrupt ; IRQ level 6 (vertical retrace interrupt)
dc.l INT_Null ; IRQ level 7
dc.l INT_Null ; TRAP #00 exception
dc.l INT_Null ; TRAP #01 exception
dc.l INT_Null ; TRAP #02 exception
dc.l INT_Null ; TRAP #03 exception
dc.l INT_Null ; TRAP #04 exception
dc.l INT_Null ; TRAP #05 exception
dc.l INT_Null ; TRAP #06 exception
dc.l INT_Null ; TRAP #07 exception
dc.l INT_Null ; TRAP #08 exception
dc.l INT_Null ; TRAP #09 exception
dc.l INT_Null ; TRAP #10 exception
dc.l INT_Null ; TRAP #11 exception
dc.l INT_Null ; TRAP #12 exception
dc.l INT_Null ; TRAP #13 exception
dc.l INT_Null ; TRAP #14 exception
dc.l INT_Null ; TRAP #15 exception
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)
dc.l INT_Null ; Unused (reserved)

; Example 2: For sizecoding purposes, we only specify the initial stack pointer
; and the reset vector, and use the rest of the space as executable code.
; In theory this works on non-TMSS systems. In practice, TMSS compatibility should
; be taken into account (read "TMSS" for more information) and some emulators might not like this.
dc.l 0x00FFE000 ; Initial stack pointer value
dc.l CPU_EntryPoint ; Start of program
CPU_EntryPoint:
<our demo>
bra.s CPU_EntryPoint

</syntaxhighlight>
This vector table (64 vectors of 4 bytes each, filling $000000-$0000FF) is then followed by the ROM header at $000100.
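To see why the vector table ends exactly where the ROM header starts, here is a small Python sketch of the table layout. It is a host-side model, not something that runs on the console, and the helper names are made up for illustration. The numbering follows the standard 68000 layout used in Example 1: vector n sits at byte offset 4*n, the IRQ autovectors start at vector 25 and TRAP #0 is vector 32.

```python
# Host-side model of the 68000 exception vector table (hypothetical helpers).
VECTOR_COUNT = 64  # initial SP, reset vector and 62 further vectors


def vector_offset(n):
    """Byte offset of exception vector n: each vector is a 32-bit address."""
    if not 0 <= n < VECTOR_COUNT:
        raise ValueError("vector number out of range")
    return 4 * n


def autovector(level):
    """Vector number used for IRQ level 1-7 (autovectors 25-31)."""
    return 24 + level


def trap_vector(k):
    """Vector number used by TRAP #k (k = 0..15)."""
    return 32 + k


# The table ends right where the ROM header begins.
TABLE_END = vector_offset(VECTOR_COUNT - 1) + 4
```

For example, `vector_offset(autovector(6))` is `$78`, the slot that Example 1 fills with `INT_VInterrupt`.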

=== ROM Header ===
The ROM header is a data structure located at $000100 that holds metadata about the ROM, such as the domestic and overseas titles of the game, author and copyright, serial number, region, which peripherals the game uses, save RAM and ROM address ranges and, most importantly, the system type.
If you want to read the full structure of the ROM header, head to https://plutiedev.com/rom-header

Mega Drive systems older than (probably) the Mega Drive 1 VA4 or VA5 revisions can boot cartridges that contain no ROM header at all, as these consoles simply ignore it. You could stop there and drop the ROM header entirely, but later Mega Drive 1s, Mega Drive 2s and Mega Drive 3s all come with a trademark-based protection scheme that makes this less simple. It is called the TradeMark Security System, or TMSS.

=== TMSS ===
TMSS is an onboard 16KB ROM that gets mapped into the cartridge's address space when the console first turns on. The protection scheme has two stages:

* First, skipping a few minor details, TMSS copies some code to RAM, jumps to it, unmaps itself and maps the cartridge into the address space. The RAM code then checks the cartridge's ROM header for the system type. All it wants is to find the string "SEGA" or " SEGA" (maybe to account for a typo). If the string is found at address $000100, TMSS proceeds to the second stage; if not, it unmaps the cartridge, maps itself back, leaves the RAM routine and locks up the system.

TMSS then maps itself back, uploads some palettes to CRAM, uploads tiles and a simple tilemap to VRAM and runs a timer. This is to show the following message on screen:
<syntaxhighlight lang="">
PRODUCED BY OR
UNDER LICENSE FROM
SEGA ENTERPRISES LTD.
</syntaxhighlight>

* In the second stage, when the timer runs out, TMSS disables the display by resetting the VDP's MODE2 register. It doesn't clear VRAM or CRAM, though, so a few colors and graphics remain in video memory after control has been passed to the cartridge.

It then locks the VDP, maps the cartridge back in, loads the cartridge's initial stack pointer and jumps to its reset vector. Because the VDP is locked, the cartridge has to write the string "SEGA" to $A14000 before accessing any video registers; otherwise the VDP won't assert /DTACK when the 68000 reads or writes its ports, effectively locking up the console.

So, all your demo needs for full TMSS compatibility is the string "SEGA" at $000100 and a write of "SEGA" to $A14000 before accessing any VDP port. Of course, this makes true 256-byte demos impossible, since the ROM would have to extend 4 bytes past the first 256 to hold the "SEGA" string. What we can do instead is leverage ROM mirroring and put the "SEGA" string where the initial stack pointer would be. If a simple EEPROM cartridge only wires up the address lines for a 256-byte range, the ROM repeats every 256 bytes and TMSS reads $000100 as $000000.
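The mirroring trick can be modelled in a few lines of Python. This is a sketch under the assumption that the cartridge decodes only the low address lines, so every read wraps modulo the ROM size; `build_rom` and `mirrored_read` are hypothetical helpers, and the two code bytes `$60FE` are simply `bra.s *`.

```python
ROM_SIZE = 256  # assumption: only the address lines for 256 bytes are wired


def build_rom(entry_point, code):
    """Hypothetical 256-byte image: "SEGA" as the initial SP, the reset
    vector (big-endian, as on the 68000), then code, zero-padded."""
    image = b"SEGA" + entry_point.to_bytes(4, "big") + code
    if len(image) > ROM_SIZE:
        raise ValueError("image does not fit in 256 bytes")
    return image.ljust(ROM_SIZE, b"\x00")


def mirrored_read(rom, address, length=4):
    """Bytes seen at `address` when the ROM repeats every len(rom) bytes."""
    return bytes(rom[(address + i) % len(rom)] for i in range(length))


rom = build_rom(8, bytes([0x60, 0xFE]))  # code starts at $000008: bra.s *
```

With this image, a read of $000100 returns "SEGA", which is exactly what the TMSS check looks for; the same 4 bytes also serve as the (unusable) initial stack pointer `$53454741`.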

Emulators are a problem, however. They usually decide how to load a ROM based on its system type, but since we don't have one (we only care about the "SEGA" string), they will probably treat it as a flat cartridge of up to 4MB. An emulator that loads the ROM into a flat 4MB buffer will not mirror it, so the parts of the address space we expect to be mirrors might contain all zeros or garbage. To avoid this, the safest option is to emulate a non-TMSS system when running the demo on an emulator. BlastEm, for example, emulates a non-TMSS system by default, but lets you choose.

=== Memory map ===

<syntaxhighlight lang="">
Start address  End address  Description
$000000        $3FFFFF      Cartridge ROM/RAM
$400000        $7FFFFF      Reserved (used by the Mega-CD and 32X)
$800000        $9FFFFF      Reserved (used by the 32X)
$840000        $85FFFF      32X frame buffer
$860000        $87FFFF      32X frame buffer overwrite mode
$880000        $8FFFFF      32X cartridge ROM (first 512kB bank only)
$900000        $9FFFFF      32X cartridge bankswitched ROM (any 512kB bank, controlled by 32X registers)
$A00000        $A0FFFF      Z80 memory space
$A10000        $A10001      Version register
$A10002        $A10003      Controller 1 data
$A10004        $A10005      Controller 2 data
$A10006        $A10007      Expansion port data
$A10008        $A10009      Controller 1 control
$A1000A        $A1000B      Controller 2 control
$A1000C        $A1000D      Expansion port control
$A1000E        $A1000F      Controller 1 serial transmit
$A10010        $A10011      Controller 1 serial receive
$A10012        $A10013      Controller 1 serial control
$A10014        $A10015      Controller 2 serial transmit
$A10016        $A10017      Controller 2 serial receive
$A10018        $A10019      Controller 2 serial control
$A1001A        $A1001B      Expansion port serial transmit
$A1001C        $A1001D      Expansion port serial receive
$A1001E        $A1001F      Expansion port serial control
$A10020        $A10FFF      Reserved
$A11000                     Memory mode register
$A11002        $A110FF      Reserved
$A11100        $A11101      Z80 bus request
$A11102        $A111FF      Reserved
$A11200        $A11201      Z80 reset
$A11202        $A12FFF      Reserved
$A13000        $A130FF      TIME registers; used to send signals to the cartridge
$A130EC        $A130EF      "MARS" when 32X is attached
$A130F1                     SRAM access register
$A130F3                     Bank register for address $80000-$FFFFF
$A130F5                     Bank register for address $100000-$17FFFF
$A130F7                     Bank register for address $180000-$1FFFFF
$A130F9                     Bank register for address $200000-$27FFFF
$A130FB                     Bank register for address $280000-$2FFFFF
$A130FD                     Bank register for address $300000-$37FFFF
$A130FF                     Bank register for address $380000-$3FFFFF
$A14000        $A14003      TMSS "SEGA"
$A14101                     TMSS/cartridge register
$A14102        $BFFFFF      Reserved
$C00000        $C00001      VDP data port
$C00002        $C00003      VDP data port (mirror)
$C00004        $C00005      VDP control port
$C00006        $C00007      VDP control port (mirror)
$C00008        $C00009      VDP H/V counter
$C0000A        $C0000F      VDP H/V counter (mirror)
$C00011                     PSG output
$C00013        $C00017      PSG output (mirror)
$C0001C        $C0001D      Debug register
$C0001E        $C0001F      Debug register (mirror)
$C00020        $FEFFFF      Reserved
$FF0000        $FFFFFF      68000 RAM
</syntaxhighlight>

=== Startup ===

This section describes what your ROM should do first, and what state to expect from the console when it first turns on.

On a TMSS system, when control is passed to the cartridge, both horizontal and vertical interrupts are disabled on the VDP side, DMA is disabled, leftover graphics and colors from the TMSS screen remain in video memory, and display rendering is turned off. On a non-TMSS system the state is pretty much random: main memory and video memory won't be initialized, and the VDP could be in virtually any state. Make sure to also read the "Interrupts" section under "VDP" below.

To ensure a predictable state on any console, you should first write "SEGA" to $A14000 to unlock the VDP, read the VDP control port to return it to a known state, and enable display rendering. Some of these steps (like the hardware version check before writing "SEGA", or the read from the VDP control port) could be removed, but bugs might arise on some console revisions.

A simple ROM that sets a background color and assumes ROM mirroring:
<syntaxhighlight lang="">
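dc.l "SEGA" ; Initial SP; TMSS also sees it at $000100 through mirroring
dc.l start ; Reset vector

start:
; Test hardware rev. Skipping this and blindly writing to $A14000
; regardless of the console revision could work, but is not guaranteed.
; MOVE to CCR takes the low byte of the word (the version register at
; $A10001), so this only tests bit 0 of the version number (carry flag).
move.w $A10000, ccr
bcc.s .skip_tmss
move.l sp, ($A14000) ; Unlock VDP with "SEGA". sp contains "SEGA"
.skip_tmss:
; Note: sp does not point to RAM here, so never use the stack
lea ($C00004), a0
; Stop VDP / Return VDP to known state. Skipping this could
; result in the VDP processing whatever was previously in its FIFO after a reset
tst.w (a0)

move.l #$80048F00, (a0) ; Set bit 2 of MODE1, and set Auto Increment to 0
move.l #$8144C000, (a0) ; Set bit 2 of MODE2, enable display, then send only the
                        ; first word of a CRAM write command to address 0
                        ; (relies on the VDP applying the first half right away)
move.w #$0F00, -4(a0) ; Write blue at color index 0 of CRAM (via the data port, $C00000)
bra.s *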
</syntaxhighlight>
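The magic numbers in the startup code come from the VDP's register-write and command-word formats. The Python sketch below (hypothetical helper names, host-side only) rebuilds them from their fields, assuming the commonly documented layouts: a register write is `$8000 | reg << 8 | value`, a port access command spreads the access code CD5-CD0 and the 16-bit address over two words, and CRAM colors use 3 bits per component.

```python
# Access codes (CD5..CD0) for the VDP command word.
VRAM_WRITE = 0b000001
CRAM_WRITE = 0b000011
VSRAM_WRITE = 0b000101


def reg_write(reg, value):
    """16-bit word that sets VDP register `reg` to `value`."""
    return 0x8000 | (reg << 8) | (value & 0xFF)


def access_command(code, address):
    """32-bit command word: CD1-CD0 and A13-A0 in the first word,
    CD5-CD2 and A15-A14 in the second word."""
    return (((code & 0x03) << 30)
            | ((address & 0x3FFF) << 16)
            | ((code & 0x3C) << 2)
            | ((address >> 14) & 0x03))


def cram_color(r, g, b):
    """Pack 3-bit components (0-7) into the CRAM format 0000BBB0GGG0RRR0."""
    return ((b & 7) << 9) | ((g & 7) << 5) | ((r & 7) << 1)


def pair(first, second):
    """Two 16-bit words sent with one move.l (the first word goes first)."""
    return (first << 16) | second
```

For instance, `pair(reg_write(0, 0x04), reg_write(15, 0x00))` is the `$80048F00` used above. Note that `cram_color(0, 0, 7)` gives `$0E00`; the `$0F00` in the example additionally sets bit 8, which CRAM does not store.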

=== VDP ===

The Video Display Processor (VDP) is at the heart of graphics (and some sound) features of the Mega Drive. At first glance it might seem like a modest tile based video controller but it has tricks up its sleeve. It is assumed that you are familiar with VDP registers. If not, this resource is very handy: https://plutiedev.com/vdp-registers

You should also know what ports to use and how to write to these various registers: https://plutiedev.com/vdp-setup

==== Screen Resolutions ====

The VDP always initializes to a 320x224 pixel resolution, but on PAL systems a higher vertical resolution of 240 pixels is selectable through the MODE2 register. Bit 3 selects either V28 (224px) or V30 (240px) vertical resolutions.

Available on all systems regardless of region is the possibility of choosing a narrower horizontal resolution of 256 pixels as well. This is done through the MODE4 register, where if bits 7 and 0 are both clear, H32 mode or 256 pixel horizontal resolution is selected. If both bits are set instead, the standard H40 mode or 320 pixel horizontal resolution is selected.
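As a sketch of how the mode bits combine, the following Python computes the register values for the four resolutions. The helpers are hypothetical, and the sketch assumes MODE2 is register 1 and MODE4 is register 12, with the bit meanings given above plus MODE2's display (bit 6), vertical interrupt (bit 5) and DMA (bit 4) enable bits.

```python
def mode2(display_on=True, vint=False, dma=False, v30=False):
    """Value for register 1 (MODE2)."""
    value = 0x04              # bit 2: Mega Drive display mode, always set
    if display_on:
        value |= 0x40         # bit 6: enable display
    if vint:
        value |= 0x20         # bit 5: vertical interrupt enable
    if dma:
        value |= 0x10         # bit 4: DMA enable
    if v30:
        value |= 0x08         # bit 3: V30 (240 lines), PAL only
    return value


def mode4(h40=True):
    """Value for register 12 (MODE4): bits 7 and 0 together select H40."""
    return 0x81 if h40 else 0x00


def resolution(h40=True, v30=False):
    """Visible resolution in pixels for a given mode."""
    return (320 if h40 else 256, 240 if v30 else 224)


def reg_word(reg, value):
    """Word written to the VDP control port to set a register."""
    return 0x8000 | (reg << 8) | value
```

`reg_word(1, mode2())` gives the `$8144` used in the startup example, and `reg_word(12, mode4())` gives `$8C81` for H40.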

===== Very important thing about H32 =====

As we've said, the native horizontal resolution of the VDP is 320 pixels. So how does it achieve a 256-pixel horizontal resolution in H32 mode? By running its pixel clock slower.

This means that when you select a horizontal resolution of 256 pixels the VDP runs slower, since a lot of its internal logic depends on the pixel clock. One very important consequence of this mode is that the maximum number of sprites per frame drops from 80 (in H40) to 64 (in H32). Likewise, the maximum number of sprites per scanline drops from 20 to 16. This is simply because the VDP runs at a slower pixel clock and isn't fast enough to fetch as many sprites during each blanking period.

==== Sprites ====

Sprites are freely moving objects independent from the 2 background planes. They can be configured to be any size ranging from 1x1 tile to 4x4 tiles (8x8 px tiles, so 8x8 to 32x32 px). Any combination of width and height as long as it's within the 1x1 to 4x4 limit is allowed.

Sprites reside in VRAM as a sort of display list, the Sprite Attribute Table (SAT). Each sprite entry holds the x,y position, the size, the index of the sprite's first tile in VRAM, horizontal and vertical flipping, which palette (PAL 0, 1, 2 or 3) to use, and finally the "link" value. The link value tells the VDP which sprite in the SAT to process next. The VDP always starts with the first slot in the SAT, then reads its link value and jumps to that slot. Sprite processing ends when the VDP finds a sprite with a link value of 0.
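Each SAT entry is 8 bytes. The Python sketch below packs one entry from its fields, assuming the commonly documented layout (see the Plutiedev sprites page linked in this section): Y in the first word, size and link in the second, tile attributes in the third and X in the fourth. Sprite coordinates are offset by 128, so (128, 128) is the top-left corner of the screen. `sat_entry` is a hypothetical helper for generating tables on the host.

```python
def sat_entry(x, y, width, height, tile, link, palette=0,
              priority=False, hflip=False, vflip=False):
    """Pack one 8-byte sprite entry; x/y are screen pixels, width/height
    are in tiles (1-4), link is the next sprite slot (0 ends the list)."""
    if not (1 <= width <= 4 and 1 <= height <= 4):
        raise ValueError("sprites are 1-4 tiles wide and tall")
    word0 = (y + 128) & 0x3FF                                  # Y position
    word1 = ((width - 1) << 10) | ((height - 1) << 8) | (link & 0x7F)
    word2 = ((int(priority) << 15) | ((palette & 3) << 13)
             | (int(vflip) << 12) | (int(hflip) << 11) | (tile & 0x7FF))
    word3 = (x + 128) & 0x1FF                                  # X position
    return b"".join(w.to_bytes(2, "big") for w in (word0, word1, word2, word3))
```

A 2x3-tile sprite in the top-left corner using palette 2, flipped horizontally and linking to slot 1, packs to `00 80 06 01 48 10 00 80`.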

Sprites can be multiplexed (i.e. their position and attributes changed mid-frame) to exceed the maximum number of sprites per frame. Beware, however, that the VDP caches some of the sprite attributes: for every sprite it caches the Y coordinate and size. The cache is updated by writing to the Sprite Attribute Table in VRAM, but not when the table's address is changed mid-frame. So if you keep two Sprite Attribute Tables and just switch from one to the other mid-frame to change sprite behavior on the fly, you'll end up with half the data from the previous table affecting your new sprites.

More information about sprite definition and caching can be found here: https://plutiedev.com/sprites

==== Background planes ====

The VDP can generate 2 planes of tile-based backgrounds. Each tile in the tilemap(s) can reference a different 16-color palette, and each background plane can be scrolled independently in a few different ways:

* Each plane can be scrolled as a whole ("full plane" scrolling).
* Each plane can be scrolled vertically per column, where each column is 16 pixels wide. This is done by writing to VSRAM (vertical scroll RAM), where each word holds the scroll position of one column on one plane.
* Each plane can be scrolled horizontally per row of 8 pixels in height. Think of it as scrolling "one row of tiles". This is done by writing to HSCROLL, a horizontal scroll table similar to VSRAM but held in VRAM.
* Each plane can be scrolled horizontally per individual scanline. Yes, you can do scanline-independent scrolling without interrupts. This also uses the HSCROLL table.

Both planes can be 32x32, 64x32, 128x32, 32x64, 64x64 or 32x128 tiles in size (width x height); larger combinations such as 128x64 or 64x128 are not valid. You cannot select different sizes for each plane: if you select 32x32, both planes will be 32x32 tiles in size.
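A Python sketch of the data involved (hypothetical helper names, host-side only): a plane's tilemap is a row-major array of 16-bit entries holding priority, palette, flip bits and an 11-bit tile index, and VSRAM interleaves the words for plane A and plane B per 16-pixel column. The list of valid sizes is the one given above; the `$C000` base address used in the test is just an example.

```python
# Plane sizes in tiles (width, height); both planes share the same size.
VALID_PLANE_SIZES = {(32, 32), (64, 32), (128, 32), (32, 64), (64, 64), (32, 128)}


def tile_entry(tile, palette=0, priority=False, hflip=False, vflip=False):
    """16-bit tilemap entry: priority, palette (0-3), flips, tile index."""
    return ((int(priority) << 15) | ((palette & 3) << 13) | (int(vflip) << 12)
            | (int(hflip) << 11) | (tile & 0x7FF))


def cell_address(plane_base, width, height, col, row):
    """VRAM address of the entry at (col, row); the plane wraps around."""
    if (width, height) not in VALID_PLANE_SIZES:
        raise ValueError("unsupported plane size")
    return plane_base + 2 * ((row % height) * width + (col % width))


def vsram_offset(column, plane_b=False):
    """VSRAM byte offset for a 16-pixel column: plane A and B words alternate."""
    return 4 * column + (2 if plane_b else 0)
```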

==== Interrupts ====

This is important for sizecoding purposes. The VDP generates horizontal interrupts, vertical interrupts and "external" interrupts. External interrupts are for lightguns and such peripherals.

Interrupts are enabled from two sides:
* From the interrupt mask set in the 68k
* From the VDP registers

It's impossible to get an interrupt on startup unless you enable it on both sides: even if interrupts were enabled through the VDP registers, the 68000 always sets its interrupt mask to 7 on reset, so all the VDP's interrupts (levels 2, 4 and 6) are ignored.

To implement raster effects without interrupts, you can poll the VDP's H/V counter port ($C00008). Its high byte holds the vertical counter, which gives an approximation of the line the VDP is currently rendering.

To get VBlank state you can poll the VDP's status register (reading from VDP control, or $C00004) and check the VBlank flag.
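A minimal Python sketch of decoding the two values being polled, assuming the usual layout: the H/V counter word has the vertical counter in its high byte and the horizontal counter in its low byte, and bit 3 of the status register is the VBlank flag (bit 2 is HBlank). The vertical counter is only 8 bits wide, so treat it as an approximation of the line, as noted above.

```python
VBLANK_FLAG = 1 << 3  # status register bit 3: vertical blanking in progress
HBLANK_FLAG = 1 << 2  # status register bit 2: horizontal blanking in progress


def split_hv(hv_word):
    """Split a word read from $C00008 into (vertical, horizontal) counters."""
    return (hv_word >> 8) & 0xFF, hv_word & 0xFF


def in_vblank(status_word):
    """True if a word read from the control port ($C00004) has VBlank set."""
    return bool(status_word & VBLANK_FLAG)
```

On the 68000 side this corresponds to reading the status word and testing bit 3 (e.g. with `btst`) in a loop.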

=== Additional Resources ===
* [https://demozoo.org/productions/?platform=22 SEGA MegaDrive demoscene productions]
* [https://plutiedev.com/ Plutiedev. Lots of info about any aspect of the system. This is more than enough to get you started]
* [https://plutiedev.com/mirror/kabuto-hardware-notes SEGA MegaDrive Hardware notes]
* [https://www.scribd.com/document/637611089/SEGA-Mega-Drive-Assembly-Workshop SEGA MegaDrive Assembly Workshop]
* [https://github.com/BigEvilCorporation/megadrive_samples/tree/master Various SEGA MegaDrive assembler examples]
* [https://github.com/OrionNavattan/TMSS-Disassembly/blob/main/tmss.asm TMSS ROM disassembly]

Sinclair QL

2025-12-06T13:22:32Z

Superogue:

== Sinclair QL ==
The Sinclair QL (for Quantum Leap) is a personal computer launched by Sinclair Research in 1984, as an upper-end counterpart to the ZX Spectrum.
It was aimed at serious home users and at professional and executive users, from small to medium-sized businesses to higher educational establishments, but it failed to achieve commercial success.
The Sinclair QL uses a Motorola 68008 CPU, with 32-bit internal data registers but an 8-bit external data bus.

=== Setting up ===
* Assembler: VASM
* Emulator(s): Q-emulator ( http://www.terdina.net/ql/winql.html )
* Tool(s): Mdvtool (create MDV images)
* Hardware: Sinclair QL machine or Spectrum Next running the QL core

To test the final result on real hardware, you can use a floppy-disk replacement or a Qubide interface to load files from an SD card.

=== Loader ===
To load content from a folder or MDV image, you need both the binary and a BASIC loader, similar to platforms like the ZX Spectrum and CPC. Create a file called BOOT in the output folder with the following code:

<syntaxhighlight lang="">
10 PROGRAM=RESPR(512)
15 LBYTES "MDV1_CODE",PROGRAM
20 CALL PROGRAM
</syntaxhighlight>

Now copy your intro binary as CODE into the same output folder, and Q-emulator should be able to pick everything up and load the code automatically. You can link the folder directly from Q-emulator, or generate an MDV image using mdvtool for distribution.

=== Memory map ===

<syntaxhighlight lang="">
$00000  Onboard 48K ROM
$0C000  16K ROM cartridge
$10000  Onboard I/O
$18000  (R) RTC byte 0 / (W) RTC reset
$18001  (R) RTC byte 1 / (W) RTC step
$18002  (R) RTC byte 2 / (W) Transmit control
$18003  (R) RTC byte 3 / (W) IPC link control
$18020  (R) Microdrive/RS232C status / (W) Microdrive control
$18021  (R) Interrupt/IPC status / (W) Interrupt control
$18022  (R) Microdrive track 1 / (W) Microdrive/RS232C data
$18023  (R) Microdrive track 2 / (W) Display control
$18063  Screen mode, bits S---C-O- (S = screen page, C = color depth, O = display on/off)
$20000  Screen 1 (screen RAM)
$28000  Screen 2 / system variables*
$2847C  System stack pointer*
$28E00  Base of common heap*
$2BC00  Free area*
$30000  Running programs, free area
$37200  BASIC area*
$38000  User stack pointer*
$38000  Program data*
$40000  Add-on RAM (up to 512K)
$C0000  Add-on peripherals
$E0000  Add-on ROM (up to 128K)
$FFFFF  End of address space
</syntaxhighlight>

=== Video display ===
There are two possible screen modes, selected by bit 3 of port $18063:

* 0 selects 4 colors at 512x256 (black, red, green and white).
* 1 selects 8 colors at 256x256 (black, red, green, blue, cyan, magenta, yellow and white).

The screen is directly memory mapped from $20000 to $27FFF (32KB). There is no palette; the colors are fixed.
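The 4-color layout can be sketched in Python. This is a host-side model with a hypothetical helper, and it assumes the standard 4-color (512x256) format: each 16-bit word holds 8 pixels, the even (high) byte carries the green bits and the odd (low) byte the red bits, with the leftmost pixel in bit 7 of each byte; white sets both. Each line is 128 bytes, so screen 1 ends at $27FFF.

```python
SCREEN1 = 0x20000
BYTES_PER_LINE = 128  # 512 pixels / 8 pixels per word * 2 bytes

# (green, red) bits for the four 4-color mode colors
MODE4_COLORS = {"black": (0, 0), "red": (0, 1), "green": (1, 0), "white": (1, 1)}


def mode4_pixel(x, y, color):
    """Return (address, word) to OR into screen 1 to set one pixel.

    Each word holds 8 pixels: high (even) byte = green bits,
    low (odd) byte = red bits, leftmost pixel in bit 7 of each byte.
    """
    green, red = MODE4_COLORS[color]
    address = SCREEN1 + y * BYTES_PER_LINE + (x // 8) * 2
    bit = 7 - (x % 8)
    return address, (green << (8 + bit)) | (red << bit)
```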

==== Vsync ====
Bit 3 of port $18021 goes high (1) when a vertical sync starts; writing a 1 to that same bit of the same port clears the event.
In practice, we can write 255 to port $18021 to clear all interrupt bits, then read $18021 until it is nonzero. Note that this waits for any interrupt bit, not only the vsync one.

<syntaxhighlight lang="">
move.b #%11111111, $18021 ; Clear all interrupt bits
waitvbl:
move.b $18021,d0 ; Read interrupt state (move.b already sets the Z flag)
beq waitvbl ; Loop until an interrupt bit is set
</syntaxhighlight>

==== Before using Screen2 ====
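Since the Screen 2 area overlaps the system variables, we need to add the following code at the start of the program to be able to use it. TRAP #0 enters supervisor mode, and the ORI masks all interrupts so the system no longer runs in the background.
<syntaxhighlight lang="">
trap #0 ; QDOS: enter supervisor mode
ori #$0700,sr ; Interrupt mask = 7 (note the $: #0700 would be decimal 700)
</syntaxhighlight>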

Or alternatively move the stack manually
<syntaxhighlight lang="">
move.l #$40000-4,sp
</syntaxhighlight>

==== Swapping screens ====
To swap between the two screens via hardware, we need to swap the buffer pointer and set the screen register accordingly.

<syntaxhighlight lang="">
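resetScreens:
move.l #$20000,screenpointer1 ; draw into screen 1 first
move.l #$28000,screenpointer2
move.b #8,scr ; mode 8, screen page 0
move.b #0,scr+1 ; second byte is unused
move.b #8,$18063 ; show screen 1
rts

swapscreens:
lea screenpointer1,a0 ; get screen ptr address
move.l (a0),d0 ; save it
move.l 4(a0),(a0)+ ; rotate screenbuffers
move.l d0,(a0) ; write back
lea scr,a0 ; get flipbit
move.b (a0),$18063 ; write flipbit
eor.b #128,(a0) ; flip bit 7 (screen page) for the next swap
rts

screenpointer1: dc.l $20000 ; current draw buffer
screenpointer2: dc.l $28000 ; must directly follow screenpointer1
scr: dc.b 8,0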
</syntaxhighlight>
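To check the double-buffering logic, here is a Python model of `resetScreens`/`swapscreens` (a host-side sketch, not QL code). After each swap the page written to $18063 is the one that was just drawn, while `screenpointer1` moves on to the other buffer.

```python
class ScreenPair:
    """Model of resetScreens / swapscreens."""

    def __init__(self):
        # resetScreens: draw into $20000, show screen page 0 in mode 8
        self.pointers = [0x20000, 0x28000]  # screenpointer1, screenpointer2
        self.scr = 8

    def swap(self):
        """Rotate the buffers and return the byte written to $18063."""
        self.pointers.reverse()
        written = self.scr
        self.scr ^= 0x80  # eor.b #128: toggle bit 7 (screen page)
        return written
```

The first swap shows page 0 ($20000) and redirects drawing to $28000; the second shows page 1 ($28000) and switches back.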

==== Copying and clearing the screen ====
As an alternative to swapping the screen pointers, here is some compact but slow code that copies the contents of Screen 2 to Screen 1 and then clears Screen 2.
<syntaxhighlight lang="">
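lea $28000,a0 ; screen2
lea $20000,a1 ; screen1
move.l a0,a2
move.w #256*32-1,d0 ; 32KB = 8192 longwords; dbf runs d0+1 times
swappage:
move.l (a2),(a1)+ ; copy one longword to screen 1
clr.l (a2)+ ; and clear it on screen 2
dbf d0,swappage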
</syntaxhighlight>

==== Plotting to screen ====
While there are a few ROM/trap routines that draw primitives, they are quite expensive to set up and clunky to use.
So it's best to draw into screen memory directly, using the following plot routine provided by gigabates.

<syntaxhighlight lang="">
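; plot pixel (8 color mode: 256x256, 128 bytes per line)
; a0 = screen, d0 = x, d1 = y, d2 = color index 0-7 (bit 2 = green, bit 1 = red, bit 0 = blue)
; Assumes the upper byte of d2.w is zero and the pixel is black (the color is ORed in).
; Trashes d0-d3.
plot:
lsl.w #6,d1 ; y * 64 words per line
moveq #3,d3 ; x pixel shift
and.w d0,d3 ; x & 3 = pixel within the word
add.w d3,d3 ; 2 bits per pixel
asr.w #2,d0 ; x / 4 = word within the line
add.w d0,d1 ; total words
add.w d1,d1 ; to byte offset
lsl.b #6,d2 ; red -> bit 7, blue -> bit 6, green -> carry
bcc .noGreen
add.w #$8000,d2 ; green -> bit 15
.noGreen:
lsr.w d3,d2 ; shift scrambled color to px pos
or.w d2,(a0,d1)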
rts
</syntaxhighlight>
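The bit shuffling in the plot routine can be checked with a Python transliteration. This sketch mirrors the assembly above under the same assumptions (8-color mode, 128 bytes per line, color bits green = 4, red = 2, blue = 1) and returns the byte offset and the word that the routine ORs into the screen.

```python
def plot_mode8(x, y, color):
    """Return (byte_offset, word) that the plot routine ORs into the screen.

    Mirrors the assembly: 128 bytes per line, 4 pixels per word,
    2 bits per pixel; color bit 2 = green, bit 1 = red, bit 0 = blue.
    """
    offset = (y * 64 + (x >> 2)) * 2   # lsl #6 / asr #2 / add / add d1,d1
    shift = (x & 3) * 2                # moveq #3 / and / add d3,d3
    word = (color & 0b011) << 6        # lsl.b #6: red -> bit 7, blue -> bit 6
    if color & 0b100:
        word |= 0x8000                 # carry (green) -> bit 15
    return offset, word >> shift       # lsr.w d3,d2
```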

=== Sound ===
The Sinclair QL has 2 channels of internal beeper sound, or AY support via an extension board.

Speaker sound commands have to be passed through the system (a QDOS trap), using the same kind of commands as for the keyboard.
You need to adjust the pitch settings to change the sound, and you can change the randomness bits to make the sound distorted. It seems it's not possible to change the volume!

<syntaxhighlight lang="">
move.l #$11,d0 ; set sound command
lea .sounddata,a3 ; load sound message payload pointer to a3
trap #1 ; call dosound
</syntaxhighlight>

=== Additional Resources ===
* [https://demozoo.org/productions/?platform=95 Sinclair QL demoscene productions]
* [https://ia600804.us.archive.org/9/items/SinclairQLHomepage/docs/manuals/qltm.pdf QL Technical Guide]
* [https://www.chibiakumas.com/68000/sinclairql.php 68008 Assembly programming for the Sinclair QL]
* [https://dilwyn.qlforum.co.uk/ Sinclair QL Pages]
* [http://www.terdina.net/ql/winql.html Q_emulator]

Prototyping DOS effects with ShaderToy

2025-10-05T19:21:08Z

Pestis: /* Vector arithmetic (examples) */

Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, its shaders are written in GLSL (the shading language used by WebGL), a relatively powerful language with native support for vectors, matrices, many built-in functions and arithmetic operators. Most of these features are not available in x86 assembly, so it is fairly easy to write a tiny shader in Shadertoy that ends up well over 256 bytes once ported to DOS. Furthermore, the x87's limit of 8 stack registers severely constrains the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one more as scratch space, so that's almost the entire stack already.

To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.

== Scalar operators ==

{| class="wikitable"
|-
! ShaderToy !! Bytes !! x87 equivalent
|-
| x+=y || 2 || faddp st1, st0
|-
| x+y || 4 || If both x and y are needed later: fld st0; fadd st0, st2
|}

The cost for <code>-</code>, <code>*</code>, and <code>/</code> scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary <code>fld</code> or <code>fxch</code> instructions.

Notice the existence of <code>fsubr</code> and <code>fdivr</code> instructions, so <code>x=(y/x)</code> can still be just 2 bytes, even if it looks more complicated in ShaderToy.

Also notice that operating on a single component of a vector (<code>b.x += a.x</code>) is actually a scalar operation and thus takes the same 2-4 bytes.

== Scalar functions ==

{| class="wikitable"
|-
! ShaderToy !! Bytes !! x87 equivalent
|-
| -x || 2 || fchs
|-
| abs(x) || 2 || fabs
|-
| sqrt(x) || 2 || fsqrt
|-
| sin(x) || 2 || fsin
|-
| cos(x) || 2 || fcos
|-
| sin(x) ... cos(x) || 2 || fsincos
|-
| tan(x) || 2 || fptan (it also pushes 1.0 onto the stack, which you may need to pop)
|-
| atan(y,x) || 2 || fpatan
|-
| log2(x) || 4 || fld1 ... fyl2x
|-
| sign(x) || 6 || Computed as x/abs(x) fld st0; fabs; fdivp st1, st0 Note that this does not handle the case x=0
|-
| mix(x,y,a) || 10 || Stack in: a x y. Stack out: x*(1-a)+y*a. fmul st2, st0; fld1; fsubrp st1, st0; fmulp st1, st0; faddp st1, st0
|-
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y), i.e. fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0. Replace fsubp with faddp for 2*max(x,y)
|-
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1, st0 Replace jc with jnc for max(x,y)
|-
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1
|-
| 2*clamp(x,-1,1) || 14 || Computed as abs(1+x) - abs(1-x) i.e. fld1; fadd st0, st1; fabs; fld1; fsub st0, st2; fabs; fsubp st1, st0 You can use other constants in place of fld1
|-
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code
|-
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code
|}

There are no <code>acos</code>, <code>asin</code>, <code>sinh</code>, <code>cosh</code>, <code>tanh</code>, <code>asinh</code>, <code>acosh</code>, and <code>atanh</code> instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as <code>tanh</code> is a classic "squash" function to get any number into -1 .. 1 range.

Notice the significant byte cost of min and max, primarily due to the absence of the fcomi and fcmovcc instructions on processors before the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively; for example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.
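The identities behind the cheaper rows of the table can be sanity-checked before porting. This Python sketch (hypothetical helper names) evaluates each formula the way the listed x87 sequence computes it; `math.fmod` stands in for `fprem`, and `math.trunc` for the truncation done by `fscale`.

```python
import math


def sign_via_div(x):
    """sign(x) as x/abs(x): fld st0; fabs; fdivp. Undefined for x == 0."""
    return x / abs(x)


def two_min(x, y):
    """2*min(x,y) = x + y - |x - y|."""
    return x + y - abs(x - y)


def two_max(x, y):
    """2*max(x,y) = x + y + |x - y| (fsubp replaced by faddp)."""
    return x + y + abs(x - y)


def two_clamp(x):
    """2*clamp(x,-1,1) = |1 + x| - |1 - x|."""
    return abs(1 + x) - abs(1 - x)


def mix(x, y, a):
    """x*(1-a) + y*a, in the same order as the x87 sequence."""
    return x * (1 - a) + y * a


def exp2_split(y):
    """2^y as 2^frac * 2^trunc: fprem gives frac (with the sign of y),
    f2xm1 + 1 gives 2^frac, fscale multiplies by 2^trunc(y)."""
    frac = math.fmod(y, 1.0)
    return (2.0 ** frac) * (2.0 ** math.trunc(y))
```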

== Rounding and remainders ==

{| class="wikitable"
|-
! ShaderToy !! Bytes !! x87 equivalent
|-
| round(x) || 2 || frndint with the default rounding mode, which is nearest
|-
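| mod(x,y) || 2 || fprem or fprem1, with x in st0 and y in st1. Notice they compute a remainder, not GLSL's floor-based modulo: fprem matches mod only when x and y are positive, while fprem1 rounds the quotient to nearest, so its result can be negative even then.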
|-
| ceil(x) || 2-7 || Up to 5 bytes to setup the rounding mode with fldcw (RC field = 10, round up), followed by frndint
|-
| floor(x) || 2-7 || Up to 5 bytes to setup the rounding mode with fldcw (RC field = 01, round down), followed by frndint
|-
| trunc(x) || 2-7 || Up to 5 bytes to setup the rounding mode with fldcw (RC field = 11, round towards zero), followed by frndint
|}

Notice that <code>x-round(x)</code> is a very compact way to do domain repetition for raymarchers.
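The difference between GLSL's `mod` and the two x87 remainders shows up for negative inputs and, with `fprem1`, even for positive ones. A Python sketch, using `math.fmod` as a stand-in for `fprem` and `math.remainder` for `fprem1`; Python's `round()` rounds halves to even, like `frndint` in its default mode.

```python
import math


def glsl_mod(x, y):
    """GLSL mod(): x - y*floor(x/y); the result takes the sign of y."""
    return x - y * math.floor(x / y)


def fprem_like(x, y):
    """Like fprem: truncated quotient, result takes the sign of x."""
    return math.fmod(x, y)


def fprem1_like(x, y):
    """Like fprem1: IEEE remainder, quotient rounded to nearest."""
    return math.remainder(x, y)


def repeat_domain(x):
    """x - round(x): period-1 repetition, result in [-0.5, 0.5]."""
    return x - round(x)
```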

== Vector arithmetic (examples) ==

{| class="wikitable"
|-
! ShaderToy !! Bytes !! x87 equivalent
|-
| a.xy = a.yx || 2 || fxch st0, st1
|-
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1
|-
| a+=b || 5-6 || Stack in: b.x b.y b.z a.x a.y a.z, assuming b is not needed later. 6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0. 5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]
|-
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x
|-
| a+=b || 10 || Stack in: b.x b.y b.z a.x a.y a.z, if b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0
|-
| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt
|-
| mix(a,b,k) || 21-22 || Stack in: k a.x a.y a.z b.x b.y b.z. fmul st4, st0; fmul st5, st0; fmul st6, st0; fld1; fsubrp st1, st0; fmul st1, st0; fmul st2, st0; fmulp st3, st0; faddp st3, st0; faddp st3, st0; faddp st3, st0. For the last three repeating instructions, you might be able to use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]
|}

From this you can already see that a simple <code>normalize(a)</code> is going to take a lot of bytes, as it has to be computed as <code>a/=length(a)</code>. Therefore, normalizing your raymarcher's rays is usually best avoided. <code>cross</code>, <code>reflect</code> and <code>refract</code> are probably also too costly for sizecoding.
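When reorganizing the stack, a tiny simulator helps to check a sequence before assembling it. The Python class below is a hypothetical, minimal model (no tags, no flags, st0 at index 0) covering only the instructions used in the table; here it replays the `length(a)`, `a.xyz = a.yzx` and 10-byte `a+=b` rows.

```python
import math


class X87:
    """Minimal model of the x87 register stack: st[0] is st0."""

    def __init__(self, *values):
        # Values are given top-first: X87(x, y, z) means st0=x, st1=y, st2=z.
        if len(values) > 8:
            raise OverflowError("the x87 stack holds at most 8 values")
        self.st = list(values)

    def fld(self, i):
        """fld st(i): push a copy of st(i)."""
        if len(self.st) == 8:
            raise OverflowError("x87 stack overflow")
        self.st.insert(0, self.st[i])

    def fxch(self, i):
        """fxch st0, st(i)."""
        self.st[0], self.st[i] = self.st[i], self.st[0]

    def fmul(self, dst, src):
        """fmul st(dst), st(src)."""
        self.st[dst] *= self.st[src]

    def fadd(self, dst, src):
        """fadd st(dst), st(src)."""
        self.st[dst] += self.st[src]

    def faddp(self, dst):
        """faddp st(dst), st0: add, then pop st0."""
        self.st[dst] += self.st[0]
        self.st.pop(0)

    def fsqrt(self):
        self.st[0] = math.sqrt(self.st[0])


def length_of(x, y, z):
    """Replay the length(a) row with a = (x, y, z) in st0..st2."""
    fpu = X87(x, y, z)
    fpu.fmul(0, 0)  # fmul st0, st0
    fpu.fxch(2)     # fxch st0, st2
    fpu.fmul(0, 0)  # fmul st0, st0
    fpu.faddp(2)    # faddp st2, st0
    fpu.fmul(0, 0)  # fmul st0, st0
    fpu.faddp(1)    # faddp st1, st0
    fpu.fsqrt()     # fsqrt
    return fpu.st


def add_keep_b(b, a):
    """Replay the 10-byte a+=b row; stack in: b.x b.y b.z a.x a.y a.z."""
    fpu = X87(*b, *a)
    fpu.fadd(3, 0)  # fadd st3, st0
    fpu.fld(1)      # fld st1
    fpu.faddp(5)    # faddp st5, st0
    fpu.fld(2)      # fld st2
    fpu.faddp(6)    # faddp st6, st0
    return fpu.st
```

For example, `length_of(1.0, 2.0, 2.0)` leaves `[3.0]` on the stack.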

== Floating point constants ==

x87 has the following constants built-in and loading each takes just 2 bytes:

{| class="wikitable"
|-
! Constant !! Approximation !! Instruction
|-
| 0.0 || 0.0 || fldz
|-
| 1.0 || 1.0 || fld1
|-
| pi || 3.14159... || fldpi
|-
| log2(e) || 1.44270... || fldl2e
|-
| loge(2) || 0.69315... || fldln2
|-
| log2(10) || 3.32193... || fldl2t
|-
| log10(2)|| 0.30103... || fldlg2
|}

Thus, if you just need "some random constant" in your shader, using one of these can save bytes. Notice, however, that <code>fldpi; fmulp st1, st0</code> is still 4 bytes, whereas <code>fmul dword [bp+offset]</code> can be as little as 3 bytes, if the offset fits in a byte and you can reuse code or another value as the constant.

Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes a single byte suffices. Since x86 is little-endian, the last byte of a float holds the sign bit and the top seven bits of the exponent, so a single byte already gets the order of magnitude roughly right. You can then place this byte somewhere in your code or data where the bytes before it make at least the first few bits of the mantissa correct, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.
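To experiment with truncated constants, this Python sketch uses `struct` to show the little-endian bytes of a float and the value you get when only its top bytes are kept. It assumes the lower bytes are zero; in a real intro they are whatever code or data happens to precede the constant, so the actual value will differ slightly.

```python
import struct


def f32_bytes(value):
    """Little-endian bytes of a single-precision float, as stored by x86."""
    return struct.pack("<f", value)


def from_f32_bytes(raw):
    """Decode 4 little-endian bytes as a single-precision float."""
    return struct.unpack("<f", raw)[0]


def truncated(value, keep):
    """Value obtained when only the top `keep` bytes of `value` are stored,
    assuming the lower bytes are zero."""
    raw = f32_bytes(value)
    return from_f32_bytes(bytes(4 - keep) + raw[4 - keep:])
```

For example, the top two bytes of 1.2599210498948732 are `0xa1, 0x3f`, which on their own give 1.2578125.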

== Case study: Balrog ==

With all of the above in mind, [https://github.com/vsariola/balrog/ Balrog], a 256b executable graphics entry, can serve as a case study. Balrog is a fractal raymarcher with the following innermost loop:

<pre>
for(int j=0;j<ITERS;j++){
t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition
t.x += t.x; // domain scaling
r *= RSCALE;
r += t.x*t.x;
t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.
t.x += t.z * o; // rotation, but using very poor math
t.z -= t.x * o;
}
</pre>

Even though there are vectors, the code mostly does scalar math, and then uses coordinate shuffling (t.xyz = t.yzx) to operate on the other coordinates. That code ports to:

<pre>
mov cl, ITERS
.maploop: ; for(int j=0;j<ITERS;j++) {
fld st0
frndint
fsubp st1, st0
fabs ; t.x = abs(t.x - round(t.x))
fadd st0 ; t.x += t.x;
fld dword [c_rscale+bp-BASE]
fmulp st4, st0 ; r *= RSCALE
fld st0
fmul st0
faddp st4, st0 ; r += t.x*t.x
fxch st2, st0
fxch st1, st0 ; t.xyz = t.yzx
fld st2
fmul dword [si]
faddp st1, st0 ; t.x += t.z * o;
fld st0
fmul dword [si]
fsubp st3, st0 ; t.z -= t.x * o
loop .maploop ; }
</pre>

The comments show exactly how each ShaderToy line maps to different x87 instructions.
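Before porting, it can help to have a scalar reference of the loop to compare the x87 version against. A Python sketch of the ShaderToy loop above: `iters`, `rscale` and `o` are passed in as parameters here, and the values used in the test are arbitrary, not Balrog's. Python's `round()` rounds halves to even, like `frndint` in its default mode.

```python
def balrog_map(t, r, iters, rscale, o):
    """Scalar transliteration of the inner loop, for checking an x87 port."""
    x, y, z = t
    for _ in range(iters):
        x = abs(x - round(x))   # folding + domain repetition
        x += x                  # domain scaling
        r *= rscale
        r += x * x
        x, y, z = y, z, x       # t.xyz = t.yzx
        x += z * o              # "rotation", using very poor math
        z -= x * o
    return (x, y, z), r
```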

The Balrog code also later exemplifies the floating point truncation technique:

<pre>
c_mindist equ $-3
db 0x38 ; 0.0001
c_glowamount equ $-2
c_colorscale equ $-2
dw 0x3d61 ; 0.055
c_stepsizediv equ $-1
db 0x03 ; 807
c_stepsizediv_z equ $-3
db 0x40 ; 2.1006666666666662
c_glowdecay equ $-2
dw 0x461c ; 1e4
c_rscale equ $-2
db 0xa1, 0x3f ; 1.2599210498948732
c_rdiv equ $-2
dw 0x434b ; 203.18733465192963
c_camz equ $-1
db 0xcc, 0x12, 0x42 ; 36.7
c_xdiv equ $-1
db 0x09, 0x00, 0x40 ; 2.0006
c_xmult equ $-2
dw 0x3f2a
c_camy equ $-2
dw 0x3f1c ; 0.61
</pre>

Two of the constants ended up being the same constant (c_glowamount and c_colorscale), many have only their exponent byte (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent of one constant serves as part of the mantissa of the next, the value is still at least roughly correct.