<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://www.sizecoding.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Kuemmel</id>
		<title>SizeCoding - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://www.sizecoding.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Kuemmel"/>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/wiki/Special:Contributions/Kuemmel"/>
		<updated>2026-05-02T16:19:26Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.27.0</generator>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=733</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=733"/>
				<updated>2020-09-29T20:12:31Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in mobile devices of any kind. Back in the 80's the ARM architecture was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there's a hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so called FPEmulator to take care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, due to how much even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can keep adding conditional instructions until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
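&lt;br /&gt;
As a minimal sketch of such a chain (the register usage is made up for illustration and not taken from any intro), one &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; can steer several following ARM instructions:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //set the flags once&lt;br /&gt;
movhi r2,r0            //if r0&amp;gt;r1 then r2 = r0&lt;br /&gt;
subhi r0,r0,r1         //still the same condition: r0 = r0 - r1&lt;br /&gt;
movls r2,r1            //opposite condition (r0&amp;lt;=r1): r2 = r1&lt;br /&gt;
adds  r3,r3,#1         //suffix s updates the flags, so the chain ends here&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;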
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. THUMB-2, however, introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to your device (e.g. a RPI4 shows different results than a RPI3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set the screen mode at all and to read the properties of the current one instead, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
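&lt;br /&gt;
A minimal sketch of that idea, not taken from any of the intros: it assumes the VDU variable numbers 11 (XWindLimit), 12 (YWindLimit) and 148 (ScreenStart) as documented for OS_ReadVduVariables, and the label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; is made up for illustration. Note that a screen line in memory is not necessarily width*4 Bytes long; VDU variable 6 (LineLength) reports the actual value if you need it.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                    //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block        //input block with the VDU variable numbers&lt;br /&gt;
movs  r1,r0               //output block overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables //read all requested values in one call&lt;br /&gt;
ldmia r0!,{r2,r3,r7}      //r2 = width-1, r3 = height-1, r7 = screen address&lt;br /&gt;
//...plot loops using r2 and r3 as limits instead of fixed numbers...&lt;br /&gt;
&lt;br /&gt;
.align 2                  //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                  //XWindLimit = width-1 in pixels&lt;br /&gt;
.word 12                  //YWindLimit = height-1 in pixels&lt;br /&gt;
.word 148                 //ScreenStart = screen address&lt;br /&gt;
.word -1                  //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;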
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball]. A later upgrade of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to take care of. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of that can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
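&lt;br /&gt;
A minimal sketch of such an exit sequence for an intro that kept the desktop mode, assuming OS_NewLine has the SWI number 0x03 as listed in the RISC OS documentation (it is not part of the SWI list further above):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03     //assumed SWI number of OS_NewLine&lt;br /&gt;
swi OS_NewLine            //print a newline, used here only to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit               //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;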
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature are instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
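&lt;br /&gt;
To make the difference to a normal add visible, here is a small sketch with made up constants (it needs the VFP/NEON setup described further below before it can run on RISC OS):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.i8  q1,#200       //all 16 byte lanes of q1 = 200&lt;br /&gt;
vmov.i8  q2,#100       //all 16 byte lanes of q2 = 100&lt;br /&gt;
vqadd.u8 q0,q1,q2      //200+100 exceeds 255, so every lane of q0 saturates to 255&lt;br /&gt;
vadd.i8  q3,q1,q2      //for comparison: the plain add wraps around to 44 (300-256)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;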
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the VFP and NEON register sets are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons for using NEON are speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball]...that probably wouldn't be possible without NEON at that size/speed.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat could be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Code Tricks ==&lt;br /&gt;
=== Cheap Absolute Value ===&lt;br /&gt;
Usually calculating ABS() would look like this in ARM (of course the cmp can be omitted if a preceding instruction already set the flags).&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
and in Thumb&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
it    lt&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
If your routine can tolerate a small deviation (for negative x this variant returns -x - 1 instead of -x), this can be done with just one instruction (ARM). Hint taken from the Coder's Revenge 06/96 disc mag.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;eor r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
In Thumb this would need 2 instructions so I skip that here.&lt;br /&gt;
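To see the deviation of the one-instruction ARM variant in numbers, here is a short trace with made up test values:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mvn r0,#4              //r0 = -5 (0xfffffffb)&lt;br /&gt;
eor r0,r0,r0,asr#31    //r0,asr#31 = 0xffffffff, so r0 = not(-5) = 4, one less than abs(-5)&lt;br /&gt;
mov r1,#5              //r1 = 5&lt;br /&gt;
eor r1,r1,r1,asr#31    //r1,asr#31 = 0, so r1 stays 5&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;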
=== Clamping === &lt;br /&gt;
Clamping towards zero or similar code cases can be optimized in the same way. Here are 3 examples also from Coder's Revenge 06/96:&lt;br /&gt;
&lt;br /&gt;
Example 1 - All values &amp;lt; 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0 ? 0&lt;br /&gt;
movlt r0,#0 // r0&amp;lt;0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;bic   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
Example 2 - All values &amp;gt;= 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0  ? 0&lt;br /&gt;
movge r0,#0 // r0&amp;gt;=0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
Example 3 - All values &amp;lt;= -1 should be -1 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmn   r0,#1 // r0 ? -1&lt;br /&gt;
mvnle r0,#0 // r0&amp;lt;=-1 =&amp;gt; r0=-1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;orr   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
All of those simplifications can be coded in Thumb, too, but would need also two instructions each and therefore end up with the same size, so I skip that here.&lt;br /&gt;
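A short trace of the three one-instruction ARM variants with made up test values (the results go to r1...r3 here only to keep the input in r0 intact):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mvn r0,#6              //r0 = -7, so r0,asr#31 = 0xffffffff&lt;br /&gt;
bic r1,r0,r0,asr#31    //example 1: r1 = 0 (negative value clamped to 0)&lt;br /&gt;
and r2,r0,r0,asr#31    //example 2: r2 = -7 (negative value kept)&lt;br /&gt;
orr r3,r0,r0,asr#31    //example 3: r3 = -1 (negative value clamped to -1)&lt;br /&gt;
mov r0,#7              //r0 = 7, so r0,asr#31 = 0&lt;br /&gt;
bic r1,r0,r0,asr#31    //example 1: r1 = 7 (positive value kept)&lt;br /&gt;
and r2,r0,r0,asr#31    //example 2: r2 = 0 (positive value clamped to 0)&lt;br /&gt;
orr r3,r0,r0,asr#31    //example 3: r3 = 7 (positive value kept)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;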
=== Limiting values to byte or halfword ===&lt;br /&gt;
It might be trivial but is easily overlooked. If you want to limit a register to a byte or halfword in Thumb mode you might be tempted to write&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and r0,r0,#0xff   // limit to byte&lt;br /&gt;
and r1,r1,#0xffff // limit to halfword (it's impossible to assemble this as 0xffff is out of range)&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
...but the first one assembles to 4 bytes and the latter one can't even be assembled (it's doable with the 4 byte instruction &amp;lt;code&amp;gt;bfc r1,16,16&amp;lt;/code&amp;gt; or two shift instructions). But we are lucky: the extending instructions come in handy to do the job with just a 2 byte instruction each:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;uxtb r0,r0 // unsigned byte extend to word = limit to byte&lt;br /&gt;
uxth r1,r1 // unsigned halfword extend to word = limit to halfword &amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from a size of maybe 512 Bytes, and for sure when you aim at coding an intro of &amp;gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix, now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
Codepressor also supports Thumb executables, both if the Thumb trigger is set within the code and if it is set within the filesystem as shown above. Future versions might also have Thumb decompression routines to make the files even shorter. At the time of writing the decompression routines are written in normal ARM code.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=732</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=732"/>
				<updated>2020-09-29T20:11:32Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there is hardware integer divide and there are various SIMD approaches in both ARM and NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so called FPEmulator to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. Still, it might be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it is not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following example shows this: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 (unsigned) then r0 = r2 + (r3&amp;gt;&amp;gt;4)  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with any number of conditional instructions until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. THUMB-2, however, introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
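The loop above can also be followed in a higher level model. The following Python sketch is only an illustration and not part of the ARM documentation; the function name &amp;lt;code&amp;gt;gcd_model&amp;lt;/code&amp;gt; is made up and positive inputs are assumed. It mirrors the instructions one by one and shows a detail that is easy to miss: when both registers are equal the &amp;lt;code&amp;gt;suble&amp;lt;/code&amp;gt; still executes, so the result ends up in r0 while r1 becomes 0.&lt;br /&gt;

```python
def gcd_model(r0, r1):
    """Illustrative model of the subtract based GCD loop (positive inputs assumed)."""
    while True:
        equal = r0 == r1       # cmp r0,r1 sets the flags once per pass
        if r0 > r1:            # subgt r0,r0,r1
            r0 = r0 - r1
        else:                  # suble r1,r1,r0 (also runs when r0 == r1)
            r1 = r1 - r0
        if equal:              # bne gcd falls through once the cmp saw equality
            return r0, r1


print(gcd_model(48, 18))       # (6, 0)
```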
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
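To preview the pattern without a RISC OS machine, the inner loops can be modelled in a few lines of Python. This is only a sketch under assumptions: the helper name &amp;lt;code&amp;gt;xor_frame&amp;lt;/code&amp;gt; is made up, the timer is passed in as a plain non-negative integer, and the palette lookup is ignored. As in the assembly, y counts down from 255 to 0, x counts down from 320 to 1 and only the low byte of the result is stored.&lt;br /&gt;

```python
def xor_frame(timer, width=320, height=256):
    """Return one frame of the (x + timer) xor y pattern in memory order."""
    frame = bytearray()
    for y in range(height - 1, -1, -1):      # movs r2,#255 ... subs/bge yloop
        for x in range(width, 0, -1):        # movs r1,#320 ... subs/bne xloop
            pixel = (x + timer) ^ y          # adds r3,r1,r0 then eors r3,r3,r2
            frame.append(pixel % 256)        # strb keeps only the low byte
    return bytes(frame)


frame = xor_frame(0)
print(len(frame), frame[0], frame[-1])       # 81920 191 1
```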
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to your device (e.g. an RPI4 shows different results than an RPI3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is to read the current screen mode and not set a screen mode at all, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements regarding the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS there is one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To date that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single (32 Bit) and double precision (64 bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature are instructions that saturate their results, which is quite useful when working with colours. E.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; modifies the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
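As a quick sanity check, the constants that the THUMB-2 variant builds step by step can be recomputed by hand. The Python lines below are only a sketch that models the registers as plain integers; they confirm that r0 ends up with the same value as in the ARM variant and show the SWI number that is passed in r10.&lt;br /&gt;

```python
# Values as the THUMB-2 snippet builds them (plain Python integers).
r1 = 32                        # movs r1,#32
r0 = r1 * 2**26 + 3            # lsls r0,r1,#26 then adds r0,r0,#3
r10 = 0x5 * 2**16 + 0x8ec1     # movt fills the upper halfword, movw the lower one

print(hex(r0))                 # 0x80000003, the same as 3+(1 shifted left by 31)
print(hex(r10))                # 0x58ec1, the SWI number handed to OS_CallASWI
```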
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball], that intro probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Code Tricks ==&lt;br /&gt;
=== Cheap Absolute Value ===&lt;br /&gt;
Usually calculating ABS() would look like this in ARM (of course the cmp could be omitted if a preceding instruction triggered the flags already).&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
and in Thumb&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
it    lt&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
If your routine is okay with a small deviation (for negative x this variant yields -x - 1 instead of -x), this can be done by just one instruction in ARM. Hint taken from the Coder's Revenge 06/96 disc mag.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;eor r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
In Thumb this would need 2 instructions so I skip that here.&lt;br /&gt;
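The deviation of the one instruction variant is easy to see in a small model. The following Python sketch is an illustration only; the function name &amp;lt;code&amp;gt;cheap_abs&amp;lt;/code&amp;gt; is made up and x is assumed to fit into a signed 32 Bit register. Python's right shift on negative integers is arithmetic, so it behaves like &amp;lt;code&amp;gt;asr&amp;lt;/code&amp;gt; here.&lt;br /&gt;

```python
def cheap_abs(x):
    """Model of eor r0,r0,r0,asr#31 for a signed 32 bit value x."""
    sign = x >> 31      # like asr#31: 0 if x is non-negative, otherwise -1
    return x ^ sign     # x ^ 0 == x, and x ^ -1 == ~x == -x - 1


for value in (5, 0, -1, -5):
    print(value, cheap_abs(value))   # 5 5, 0 0, -1 0, -5 4
```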
=== Clamping === &lt;br /&gt;
Clamping towards zero or similar code cases can be optimized in the same way. Here are 3 examples also from Coder's Revenge 06/96:&lt;br /&gt;
&lt;br /&gt;
Example 1 - All values &amp;lt; 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0 ? 0&lt;br /&gt;
movlt r0,#0 // r0&amp;lt;0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;bic   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
Example 2 - All values &amp;gt;= 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0  ? 0&lt;br /&gt;
movge r0,#0 // r0&amp;gt;=0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
Example 3 - All values &amp;lt;= -1 should be -1 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmn   r0,#1 // r0 ? -1&lt;br /&gt;
mvnle r0,#0 // r0&amp;lt;=-1 =&amp;gt; r0=-1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;orr   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
All of those simplifications can be coded in Thumb, too, but would need also two instructions each and therefore end up with the same size, so I skip that here.&lt;br /&gt;
=== Limiting values to byte or halfword ===&lt;br /&gt;
It might be trivial but is easily overlooked. If you want to limit a register to a byte or halfword in Thumb mode you might be tempted to do  &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and r0,r0,#0xff   // limit to byte&lt;br /&gt;
and r1,r1,#0xffff // limit to halfword (it's impossible to assemble this as 0xffff is out of range)&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
...but the first one assembles to 4 bytes and the latter can't be assembled at all (it's doable with the 4 byte instruction &amp;lt;code&amp;gt;bfc r1,#16,#16&amp;lt;/code&amp;gt; or two shift instructions). Luckily the extend instructions come in handy and do the job with just one 2 byte instruction each:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;uxtb r0,r0 // unsigned extend byte to word = limit to byte&lt;br /&gt;
uxth r1,r1 // unsigned extend halfword to word = limit to halfword &amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from a size of maybe 512 Bytes, and certainly when you aim for an intro of &gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix and now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
Codepressor also supports Thumb executables, both when the Thumb trigger is set within the code and when it is set within the filesystem as shown above. Future versions might also have Thumb decompression routines to make the files even shorter. At the time of writing the decompression routines are written in normal ARM code.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=731</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=731"/>
				<updated>2020-09-29T20:09:22Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: some tricks added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
Of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite huge: nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so called FPEmulator to take care of that, but it's rather slow as it's implemented in software, and it's not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with an instruction length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, because of how much even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 (unsigned) then r0 = r2 + (r3&amp;gt;&amp;gt;4)  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Conditional execution in ARM mode isn't limited to the next instruction. You can keep adding conditional instructions until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. THUMB-2, however, introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
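&lt;br /&gt;
As a rough sketch, the subtract-based loop can be modelled in Python to see what the conditional instructions do. The function name gcd_subtract is made up for this illustration and positive integer inputs are assumed; it is not taken from the ARM documentation.&lt;br /&gt;

```python
def gcd_subtract(a, b):
    """Model of the cmp/subgt/suble/bne loop for positive integers."""
    while True:
        equal = a == b      # cmp r0,r1 sets the flags once per iteration
        if a > b:
            a -= b          # subgt r0,r0,r1
        else:
            b -= a          # suble r1,r1,r0 (also runs when a == b)
        if equal:           # bne falls through once the compare was equal
            return a        # the result is left in r0
```

Because plain sub does not update the flags, both conditional subtractions and the final branch all depend on the single cmp at the top of the loop.&lt;br /&gt;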
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit screen bank limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWIs (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* up to date 16-Bit sound system, e.g. for generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so e.g. web browsing is available but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
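&lt;br /&gt;
To check the effect formula without an ARM machine at hand, the two nested loops can be sketched in Python. This is only a model under stated assumptions: the function name xor_frame and its parameters are invented for illustration, and the frame is returned as a list of rows instead of being written to screen memory.&lt;br /&gt;

```python
def xor_frame(timer, width=320, height=256):
    """Return one frame as a list of rows, in the order the loops store them."""
    frame = []
    for y in range(height - 1, -1, -1):            # r2 counts from 255 down to 0
        row = bytes(((x + timer) ^ y) % 256        # strb keeps only the low byte
                    for x in range(width, 0, -1))  # r1 counts from 320 down to 1
        frame.append(row)
    return frame
```

Calling xor_frame with increasing timer values gives the scrolling XOR pattern, one byte per pixel.&lt;br /&gt;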
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address via OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 gives different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes via a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
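&lt;br /&gt;
The way the inner loop assembles one 32-Bit pixel word from three byte-sized fields can be sketched in Python as follows. The function name pack_pixel is hypothetical and the sketch only mirrors the shifts and ORs of the listing above; which byte ends up as which colour channel depends on the pixel layout of the selected mode.&lt;br /&gt;

```python
def pack_pixel(x, y, timer):
    """Combine the three byte-sized fields the way the inner loop does."""
    timer_field = (timer % 256) * 0x100   # ands r0,r0,r2 then lsls r0,r0,#8
    y_field = y * 0x10000                 # lsls r4,r2,#16
    x_field = x >> 1                      # lsrs r3,r1,#1 keeps x below 256
    return y_field | timer_field | x_field   # the two orrs instructions
```

The timer and y fields are constant within a row, which is why the listing computes them outside the x loop.&lt;br /&gt;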
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set the screen mode at all but to read the properties of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug has not been fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of that can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single (32 Bit) and double precision (64 bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
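&lt;br /&gt;
The difference between a saturating and a wrapping add is easy to see in a small Python model. The helper names below simply borrow the NEON mnemonics for readability and are not a real API; a q register holds 16 such 8 Bit lanes, the lists here are shorter to keep the sketch small.&lt;br /&gt;

```python
def vqadd_u8(a, b):
    """Lane-wise unsigned 8-bit add that clamps at 255 instead of wrapping."""
    return [min(x + y, 255) for x, y in zip(a, b)]


def vadd_u8(a, b):
    """Plain lane-wise add for comparison: results wrap around modulo 256."""
    return [(x + y) % 256 for x, y in zip(a, b)]
```

With colour channels the wrapping variant turns a bright value into a dark one on overflow, which is why the saturating form is usually what you want when adding pixels.&lt;br /&gt;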
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the VFP and NEON register sets are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
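&lt;br /&gt;
That the THUMB-2 sequence builds the same register values as the ARM version can be checked with a few lines of Python. The helpers lsls and movw_movt are hypothetical stand-ins for the instructions of the same name, written only for this sketch.&lt;br /&gt;

```python
def lsls(value, shift):
    """Logical shift left within a 32-bit register."""
    return (value * 2 ** shift) % 2 ** 32


def movw_movt(low16, high16):
    """Register value after movw (low half) followed by movt (high half)."""
    return high16 * 0x10000 + low16


r1 = 32
r0 = lsls(r1, 26) + 3            # lsls r0,r1,#26 then adds r0,r0,#3
r10 = movw_movt(0x8ec1, 0x5)     # movw r10,#0x8ec1 then movt r10,#0x5
```

Here r0 ends up as 0x80000003, the same value the ARM version loads with a single mov, and r10 holds the SWI number 0x58ec1 that is passed to OS_CallASWI.&lt;br /&gt;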
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball], some effects probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and those SWIs are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
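&lt;br /&gt;
To listen to or tweak the formula before porting it to assembler, it can be prototyped in Python first. The following is a minimal sketch, assuming the usual C operator precedence for the formula quoted in the comments above; the function name bytebeat is made up and the fractional timer stepping of the interrupt routine is left out.&lt;br /&gt;

```python
from operator import and_


def bytebeat(t):
    """One 8-bit sample of the formula quoted in the assembler comments."""
    shift = and_(t >> 9, 14)                 # even shift amount from 0 to 14
    multiplier = and_(0xca98 >> shift, 15)   # 4-bit pitch multiplier
    return (t * multiplier | t >> 8) % 256   # keep the low byte as the sample


samples = bytes(bytebeat(t) for t in range(8000))   # one second at 8000 Hz
```

The resulting bytes are unsigned 8 Bit mono samples; the assembler version additionally shifts each sample up to 16 Bit and duplicates it for both stereo channels.&lt;br /&gt;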
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. Also BBC Basic has ways to create sounds by note or frequency (Link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by j. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Code Tricks ==&lt;br /&gt;
=== Cheap Absolute Value ===&lt;br /&gt;
Usually calculating ABS() would look like this in ARM (of course the cmp could be omitted if a preceding instruction triggered the flags already).&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
and in Thumb&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
it    lt&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
If your routine is okay with a small deviation (for negative inputs this variant returns -x - 1 instead of -x) this can be done with just one instruction (ARM). Hint taken from Coder's Revenge 06/96 disc mag.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;eor r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
In Thumb this would need 2 instructions so I skip that here.&lt;br /&gt;
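&lt;br /&gt;
The size of the deviation is easiest to see in a small Python model of the instruction. The function name approx_abs is invented for this sketch and x is assumed to be a signed 32 Bit value.&lt;br /&gt;

```python
def approx_abs(x):
    """Model of eor r0,r0,r0,asr#31 for a signed 32-bit x."""
    sign_mask = x >> 31      # arithmetic shift: -1 if x is negative, else 0
    return x ^ sign_mask     # x unchanged, or all bits flipped = -x - 1
```

So non-negative values pass through unchanged, while every negative value comes out exactly one too small, which often doesn't matter for effects code.&lt;br /&gt;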
=== Clamping === &lt;br /&gt;
Clamping towards zero or similar code cases can be optimized in the same way. Here are 3 examples also from Coder's Revenge 06/96:&lt;br /&gt;
&lt;br /&gt;
Example 1 - All values &amp;lt; 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0 ? 0&lt;br /&gt;
movlt r0,#0 // r0&amp;lt;0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;bic   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
Example 2 - All values &amp;gt;= 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0  ? 0&lt;br /&gt;
movge r0,#0 // r0&amp;gt;=0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
Example 3 - All values &amp;lt;= -1 should be -1 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmn   r0,#1 // r0 ? -1&lt;br /&gt;
mvnle r0,#0 // r0&amp;lt;=-1 =&amp;gt; r0=-1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;orr   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
All of those simplifications can be coded in Thumb, too, but would also need two instructions each and therefore end up with the same size, so I skip that here.&lt;br /&gt;
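&lt;br /&gt;
All three tricks rely on the same sign mask, which a short Python model makes explicit. This is only a sketch: the function names are made up, x is assumed to be a signed 32 Bit value, and and_/or_ from the standard operator module stand in for the ARM and/orr operations.&lt;br /&gt;

```python
from operator import and_, or_


def sign_mask(x):
    return x >> 31                      # asr#31: -1 for negative x, else 0


def clamp_negative_to_zero(x):          # bic r0,r0,r0,asr#31
    return and_(x, ~sign_mask(x))


def clamp_non_negative_to_zero(x):      # and r0,r0,r0,asr#31
    return and_(x, sign_mask(x))


def clamp_negative_to_minus_one(x):     # orr r0,r0,r0,asr#31
    return or_(x, sign_mask(x))
```

In each case the shifted operand is either all zeros or all ones, so the logical operation either keeps the register as it is or forces it to 0 or -1.&lt;br /&gt;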
=== Limiting values to byte or halfword ===&lt;br /&gt;
It might be trivial but could be overlooked. If you want to limit a register to a byte or halfword in Thumb mode you might be tempted to write  &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and r0,r0,#0xff   // limit to byte&lt;br /&gt;
and r1,r1,#0xffff // limit to halfword (it's impossible to assemble this as 0xffff is out of range)&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
...but the first one assembles to 4 bytes and the latter one can't be assembled at all (it's doable with the 4 byte instruction &amp;lt;code&amp;gt;bfc r1,#16,#16&amp;lt;/code&amp;gt; or two shift instructions). But we are lucky: the extending instructions come in handy and do the job with just one 2 byte instruction each:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;uxtb r0,r0 // limit to byte&lt;br /&gt;
uxth r1,r1 // limit to halfword&amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from a size of maybe 512 Bytes, and for sure when you aim at coding an intro of &amp;gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, a compressor for absolute and untyped files, originally developed by Pervect/Topix and now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
Codepressor also supports Thumb executables, both when the Thumb trigger is set within the code and when it is set within the filesystem as shown above. Future versions might also have Thumb decompression routines to make the files even shorter. At the time of writing the decompression routines are written in normal ARM code.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=730</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=730"/>
				<updated>2020-09-17T08:17:46Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Welcome to SizeCoding.org! ==&lt;br /&gt;
&lt;br /&gt;
'''SizeCoding.org is a wiki dedicated to the art of creating very tiny programs for most popular types of CPUs. As sizecoding is also popular on other hardware we recently opened the website for other platforms as well, check the links below.'''  &lt;br /&gt;
&lt;br /&gt;
By &amp;quot;very tiny programs&amp;quot;, we mean programs that are '''256 bytes or less in size''', typically created by members of the [https://en.wikipedia.org/wiki/Demoscene demoscene] as a show of programming skill.  The size of these tiny programs is measured by their total size in opcode bytes, and they are usually presented as executable .COM files to be run in pure DOS, a DOS VM running inside another operating system, or an emulator that can run DOS such as DOSBox.  &lt;br /&gt;
&lt;br /&gt;
'''Despite their tiny size, these programs are able to produce amazing graphical displays, playable games, and sometimes music.'''  There are even some surprisingly effective programs in under '''16 bytes''', such as [https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ a maze generator in 10 bytes], a [[Paint16b|mouse-driven paint program in 16 bytes]], and an [[M8trix_8b|&amp;quot;Enter the Matrix&amp;quot; screensaver]] in only 8 bytes. &lt;br /&gt;
&lt;br /&gt;
'''The intent of this wiki is to teach x86 assembler programmers the various techniques used to create tiny demoscene intros.'''  While these techniques can be used for other applications (boot sectors, BIOS and firmware code, etc.), the information presented here is firmly oriented towards the demoscene.  Practicality and common sense are sometimes thrown out the window just to shave a single byte -- consider yourself warned.&lt;br /&gt;
&lt;br /&gt;
This wiki is divided into the following major sections for x86:&lt;br /&gt;
&lt;br /&gt;
*'''[[Getting Started]]''' - What do you need to know before starting your first tinyprog?&lt;br /&gt;
&lt;br /&gt;
*'''[[Techniques|Tips, Tricks, and Techniques]]''' - The meat of the wiki.  Lists of small opcodes, default environment settings, size optimization basics, and a ton of random tips.&lt;br /&gt;
&lt;br /&gt;
*'''[[Case Studies]]''' - Analysis of existing award-winning tiny programs, with comments on what choices were made and why.&lt;br /&gt;
&lt;br /&gt;
*'''[[Resources|Additional Resources]]''' - Discussion threads, competitions, repositories, and other external references that can aid you in your quest.&lt;br /&gt;
&lt;br /&gt;
For other platforms check out the sections here:&lt;br /&gt;
&lt;br /&gt;
*'''[[Linux]]''' - Sizecoding on Linux platforms.&lt;br /&gt;
*'''[[RISC OS on ARM based CPUs]]''' - Create your tiny intro on ARM hardware on RISC OS.&lt;br /&gt;
*'''[[ARM based CPUs]]''' - Other ARM based platforms (Gameboy Advance, etc.)&lt;br /&gt;
*'''[[Motorola 68k based CPUS]]''' - Start Sizecoding on your Atari ST or Amiga.&lt;br /&gt;
*'''[[6502 based CPUs]]''' - Commodore 64, Atari XE/XL, Atari 2600, Atari Lynx, etc.&lt;br /&gt;
*'''[[Z80 based CPUs]]''' - For all your ZX Spectrum, Amstrad CPC, etc. sizecoding needs&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=729</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=729"/>
				<updated>2020-09-17T08:17:05Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in mobile devices of all kinds. Back in the 1980s ARM was created to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All of those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Feel free to prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there is hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so-called FPEmulator to take care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that sets the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
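&lt;br /&gt;
As a minimal sketch of such a chain (the registers and values are chosen only for illustration), one &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; can steer several following instructions, because none of them carries the &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; suffix:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //sets the flags once&lt;br /&gt;
movhi r2,#1            //r0&amp;gt;r1 (unsigned): r2 = 1&lt;br /&gt;
addhi r3,r3,r2,lsl#2   //still conditional on the same cmp: r3 += r2&amp;lt;&amp;lt;2&lt;br /&gt;
movls r2,#0            //r0&amp;lt;=r1 (unsigned): r2 = 0&lt;br /&gt;
subls r3,r3,#4         //flags are still those of the cmp&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;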
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address via OS_ReadVduVariables, as it is not a fixed address. On one specific device you could read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set the screen mode at all and instead read the parameters of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
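&lt;br /&gt;
A minimal sketch of that idea, reusing the &amp;lt;code&amp;gt;.set&amp;lt;/code&amp;gt; definitions from above and assuming the VDU variable numbers 11 (XWindLimit, width-1) and 12 (YWindLimit, height-1) alongside the 148 (screen start) already used; the label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; and the register choice are only illustrative, so check the numbers against the OS documentation:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
adr.n r0,vdu_block       //input block with the variables to read&lt;br /&gt;
movs r1,r0               //output block overwrites the input block&lt;br /&gt;
swi OS_ReadVduVariables  //no OS_ScreenMode call, the current mode is kept&lt;br /&gt;
ldmia r1!,{r5,r6,r7}     //r5 = screen address, r6 = width-1, r7 = height-1&lt;br /&gt;
//...plotting code using r5-r7 instead of a hardcoded 320x256 goes here...&lt;br /&gt;
&lt;br /&gt;
.align 2&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 148                //screen start address&lt;br /&gt;
.word 11                 //XWindLimit (assumed number)&lt;br /&gt;
.word 12                 //YWindLimit (assumed number)&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;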
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the RISC OS command line (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is a batch file, as shown in most of the intros mentioned above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow it...&lt;br /&gt;
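&lt;br /&gt;
A minimal sketch of such an exit for the unchanged-mode case, assuming the SWI number 0x03 for OS_NewLine (it is not part of the list above, so verify it against the OS documentation):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03&lt;br /&gt;
swi OS_NewLine           //re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //back to the OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;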
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
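&lt;br /&gt;
A short sketch of how that could be used (r0 and r1 are assumed to point to two pixel buffers, and a VFPContext as described further below is assumed to be set up already):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vld1.32  {d0,d1},[r0]   //q0 = 4 pixels from buffer A&lt;br /&gt;
vld1.32  {d2,d3},[r1]!  //q1 = 4 pixels from buffer B, advance r1&lt;br /&gt;
vqadd.u8 q0,q0,q1       //add all 16 bytes, each one saturating at 255&lt;br /&gt;
vst1.32  {d0,d1},[r0]!  //write the 4 pixels back to buffer A, advance r0&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;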
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the VFP and NEON register sets are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so-called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons for using NEON are speed (for floats and integers) and working with floats in general, not so much size, because as shown above the setup consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball], that intro probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Code Tricks ==&lt;br /&gt;
=== Cheap Absolute Value ===&lt;br /&gt;
Usually calculating ABS() would look like this in ARM (of course the cmp can be omitted if a preceding instruction has already set the flags).&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
and in Thumb&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
it    lt&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
If your routine is okay with a small deviation (for negative x this variant yields -x - 1 instead of -x) this can be done with just one instruction in ARM. Hint taken from the Coder's Revenge 06/96 disc mag.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;eor r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
In Thumb this would need 2 instructions so I skip that here.&lt;br /&gt;
=== Clamping === &lt;br /&gt;
Clamping towards zero or similar code cases can be optimized in the same way. Here are 3 examples also from Coder's Revenge 06/96:&lt;br /&gt;
&lt;br /&gt;
Example 1 - All values &amp;lt; 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0 ? 0&lt;br /&gt;
movlt r0,#0 // r0&amp;lt;0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;bic   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
Example 2 - All values &amp;gt;= 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0  ? 0&lt;br /&gt;
movge r0,#0 // r0&amp;gt;=0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
Example 3 - All values &amp;lt;= -1 should be -1 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmn   r0,#1 // r0 ? -1&lt;br /&gt;
mvnle r0,#0 // r0&amp;lt;=-1 =&amp;gt; r0=-1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;orr   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
All of those simplifications can be coded in Thumb, too, but would need also two instructions each and therefore end up with the same size, so I skip that here.&lt;br /&gt;
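&lt;br /&gt;
For reference, a sketch of what those two-instruction Thumb forms might look like, assuming r1 is free as a scratch register for the sign mask:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;asrs r1,r0,#31  //r1 = 0 if r0&amp;gt;=0, -1 if r0&amp;lt;0&lt;br /&gt;
bics r0,r0,r1   //Example 1: negative values become 0&lt;br /&gt;
//or: ands r0,r0,r1   Example 2: values &amp;gt;= 0 become 0&lt;br /&gt;
//or: orrs r0,r0,r1   Example 3: negative values become -1&lt;br /&gt;
//or: eors r0,r0,r1   the approximate ABS from the previous section&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;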
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from maybe 512 Bytes upwards, and certainly when you aim for an intro of &amp;gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix and now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally picks the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
Codepressor also supports Thumb executables, both when the Thumb trigger is set within the code and when it is set within the filesystem as shown above. Future versions might also have Thumb decompression routines to make the files even shorter. At the time of writing the decompression routines are written in normal ARM code.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=693</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=693"/>
				<updated>2020-07-14T09:15:21Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: Code Tricks added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in mobile devices of any kind. Back in the 80's ARM was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is cheap and especially recommended. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had to carry any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in RISC OS for taking care of that, but it's rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now though it might be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, due to how much even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; like &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
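&lt;br /&gt;
As a minimal sketch of such a chain (registers and constants are made up for illustration and not taken from any intro), one &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; can steer four conditional instructions, because none of them carries the &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; suffix:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#10      //sets the flags once&lt;br /&gt;
movlt r1,#0       //if r0&amp;lt;10 then r1 = 0&lt;br /&gt;
addlt r2,r2,#1    //if r0&amp;lt;10 then r2 = r2 + 1 (no 's' suffix, flags stay as they are)&lt;br /&gt;
movge r1,#1       //if r0&amp;gt;=10 then r1 = 1&lt;br /&gt;
subge r2,r2,#1    //if r0&amp;gt;=10 then r2 = r2 - 1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
That is 5 instructions or 20 Bytes in ARM mode without a single branch.&lt;br /&gt;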
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center explains this using the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit screen bank limits like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWIs (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* up to date 16-Bit sound system, e.g. for generating bytebeat based stuff&lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* broad software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you have set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to your device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all but to read the parameters of the current one, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
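&lt;br /&gt;
A rough sketch of how that could look, reusing the &amp;lt;code&amp;gt;OS_ReadVduVariables&amp;lt;/code&amp;gt; call from the examples above. The label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; and the choice of registers are made up for this example, and the VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit) are an assumption that should be checked against the OS documentation before relying on them:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
adr.n r0,vdu_block        //input block with the VDU variable numbers&lt;br /&gt;
movs  r1,r0               //output block = input block, as in the examples above&lt;br /&gt;
swi   OS_ReadVduVariables //no OS_ScreenMode call, the current mode is kept&lt;br /&gt;
ldmia r1!,{r4,r5,r7}      //r4 = x resolution-1, r5 = y resolution-1, r7 = screen address&lt;br /&gt;
//...the intro code would continue here...&lt;br /&gt;
&lt;br /&gt;
.align 2                  //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                  //XWindLimit (assumed number): x resolution in pixels - 1&lt;br /&gt;
.word 12                  //YWindLimit (assumed number): y resolution in pixels - 1&lt;br /&gt;
.word 148                 //screen address, as in the examples above&lt;br /&gt;
.word -1                  //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The loop counters would then be taken from r4 and r5 instead of the hardcoded 320 and 255.&lt;br /&gt;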
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
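&lt;br /&gt;
A minimal sketch of such an exit without a mode change. The SWI number of &amp;lt;code&amp;gt;OS_NewLine&amp;lt;/code&amp;gt; is not part of the list above, so the definition below is an addition that should be verified against the SWI documentation:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03     //assumed SWI number, not defined in the list above&lt;br /&gt;
&lt;br /&gt;
swi OS_NewLine            //re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit               //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
In THUMB mode both SWI numbers fit into 8 Bits, so this costs 4 Bytes in total.&lt;br /&gt;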
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2 and place the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
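&lt;br /&gt;
A small sketch of how that could be used to blend two buffers, assuming r0 and r1 point to the two sources and r2 to the destination (this register assignment is made up for the example):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vld1.8   {d2,d3},[r0]   //q1 = 16 bytes = 4 pixels from the first buffer&lt;br /&gt;
vld1.8   {d4,d5},[r1]   //q2 = 4 pixels from the second buffer&lt;br /&gt;
vqadd.u8 q0,q1,q2       //per byte: q0 = q1 + q2, saturated to 255&lt;br /&gt;
vst1.8   {d0,d1},[r2]   //store the 4 resulting pixels&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
A colour channel of 200 plus one of 100 ends up as 255 here instead of wrapping around to 44.&lt;br /&gt;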
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
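&lt;br /&gt;
To illustrate the mapping, here is a short sketch (the values are arbitrary) that fills a register through its VFP names and then uses it through its NEON name:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.f32 s0,#1.0      //VFP: writes Bits 0..31 of d0 (and of q0)&lt;br /&gt;
vmov.f32 s1,#2.0      //VFP: writes Bits 32..63 of d0 (and of q0)&lt;br /&gt;
vadd.f32 d1,d0,d0     //NEON: d1 = {2.0, 4.0}, also visible as s2 = 2.0 and s3 = 4.0&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Afterwards q0 holds the four floats 1.0, 2.0, 2.0 and 4.0, as d0 and d1 are its lower and upper half.&lt;br /&gt;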
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons for using NEON are speed (for floats and integers) and working with floats in general, not so much size, since the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball]...that probably wouldn't be possible without NEON at that size/speed.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet.net:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs for that in total (install handler, sample rate, remove handler) and those SWIs are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Code Tricks ==&lt;br /&gt;
=== Cheap Absolute Value ===&lt;br /&gt;
Usually calculating ABS() would look like this in ARM (of course the cmp could be omitted if a preceding instruction triggered the flags already).&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
and in Thumb&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0&lt;br /&gt;
it    lt&lt;br /&gt;
rsblt r0,r0,#0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
If your routine is okay with a small deviation (for negative x this variant yields -x - 1 instead of -x, so the result is one too small) this can be done by just one instruction (ARM). Hint taken from Coder's Revenge 06/96 disc mag.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;eor r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
In Thumb this would need 2 instructions.&lt;br /&gt;
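&lt;br /&gt;
A possible Thumb version, as a sketch: the shift cannot be folded into the &amp;lt;code&amp;gt;eor&amp;lt;/code&amp;gt; within a 2 Byte instruction, so the sign mask goes into a scratch register first (r1 is an arbitrary choice and gets overwritten, and the flags are changed as well):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;asrs r1,r0,#31   //r1 = 0 if r0&amp;gt;=0, r1 = -1 if r0&amp;lt;0&lt;br /&gt;
eors r0,r0,r1    //r0 stays x if x&amp;gt;=0, becomes -x-1 if x&amp;lt;0&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Assuming the assembler picks the 2 Byte encodings, this is 4 Bytes compared to 6 Bytes for the cmp/it/rsblt version. Appending &amp;lt;code&amp;gt;subs r0,r0,r1&amp;lt;/code&amp;gt; removes the deviation and gives the exact ABS(x), again in 6 Bytes.&lt;br /&gt;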
=== Clamping === &lt;br /&gt;
Clamping towards zero or similar code cases can be optimized in the same way. Here are 3 examples also from Coder's Revenge 06/96:&lt;br /&gt;
&lt;br /&gt;
Example 1 - All values &amp;lt; 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0 ? 0&lt;br /&gt;
movlt r0,#0 // r0&amp;lt;0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;bic   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt;  &lt;br /&gt;
Example 2 - All values &amp;gt;= 0 should be 0 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0 // r0  ? 0&lt;br /&gt;
movge r0,#0 // r0&amp;gt;=0 =&amp;gt; r0=0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;and   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
Example 3 - All values &amp;lt;= -1 should be -1 &lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmn   r0,#1 // r0 ? -1&lt;br /&gt;
mvnle r0,#0 // r0&amp;lt;=-1 =&amp;gt; r0=-1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
can be done by&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;orr   r0,r0,r0,asr#31&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
All of those simplifications can be coded in Thumb, too, but would also need two instructions each and therefore end up with the same size.&lt;br /&gt;
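&lt;br /&gt;
As a sketch of those Thumb variants (r1 is an arbitrary scratch register that gets overwritten, and the flags are changed), the sign mask is built first and then combined with the value:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;//Example 1 - All values &amp;lt; 0 should be 0&lt;br /&gt;
asrs r1,r0,#31   //r1 = 0 if r0&amp;gt;=0, r1 = -1 if r0&amp;lt;0&lt;br /&gt;
bics r0,r0,r1    //clear all Bits of r0 if the mask is -1&lt;br /&gt;
&lt;br /&gt;
//Example 2 - All values &amp;gt;= 0 should be 0&lt;br /&gt;
asrs r1,r0,#31&lt;br /&gt;
ands r0,r0,r1    //keep r0 only if the mask is -1&lt;br /&gt;
&lt;br /&gt;
//Example 3 - All values &amp;lt;= -1 should be -1&lt;br /&gt;
asrs r1,r0,#31&lt;br /&gt;
orrs r0,r0,r1    //set all Bits of r0 if the mask is -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Each pair should assemble to 4 Bytes, the same as the single ARM instruction.&lt;br /&gt;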
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from a size of maybe 512 Bytes, and for sure when you aim at coding an intro of 1 KByte or more.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that. That is the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix, now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
Usage is quite straightforward. Just let the Filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
Codepressor also supports Thumb executables, both if the Thumb trigger is set within the code and if it is set within the filesystem as shown above. Future versions might also have Thumb decompression routines to make the files even shorter. At the time of writing the decompression routines are written in normal ARM code.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=686</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=686"/>
				<updated>2020-07-12T10:19:41Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in mobile devices of any kind. Back in the 80's ARM was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is cheap and especially recommended. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had to carry any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in RISC OS that takes care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long. Only THUMB offers a limited instruction set with a length of 2 Bytes per instruction (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
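&lt;br /&gt;
As a minimal sketch of such a chain (not taken from any of the intros mentioned here, the register usage is purely illustrative), the following computes the absolute value of r0 and counts negative inputs in r1. Both conditional instructions depend on the one &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt;, as neither of them carries the &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; suffix:&lt;br /&gt;
&lt;br /&gt;
ARM (12 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0       //compare r0 with 0, sets the flags&lt;br /&gt;
rsblt r0,r0,#0    //if r0&amp;lt;0 then r0 = 0 - r0 (flags stay untouched)&lt;br /&gt;
addlt r1,r1,#1    //still conditional on the same cmp: count the negative input&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;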
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you have set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to kind of upgrade those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //timer byte to bits 8-15 (middle colour channel)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //y to bits 16-23 (upper colour channel)&lt;br /&gt;
   orrs r4,r4,r0         //combine both channels&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0-7 (lower colour channel), as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is to skip setting the screen mode and just read the properties of the current one, as most users run their desktop in 1920x1080 at 32 Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
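&lt;br /&gt;
A minimal sketch of the setup for that approach could look like the following. It is not taken from any of the intros mentioned here. It assumes the VDU variable numbers 11 (XWindLimit), 12 (YWindLimit) and 148 (ScreenStart, as used above), and it assumes that OS_ReadVduVariables copes with a shared input/output block for more than one variable in the same way as in the examples above. The label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; and the register choice are illustrative only:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the VDU variable numbers&lt;br /&gt;
movs r1,r0               //output block overwrites the input block&lt;br /&gt;
swi OS_ReadVduVariables  //read all requested variables of the current mode&lt;br /&gt;
ldmia r1!,{r5,r6,r7}     //r5 = width-1, r6 = height-1, r7 = screen address&lt;br /&gt;
//...the main loop would go here and use r5/r6 as loop limits instead of fixed values&lt;br /&gt;
swi OS_Exit              //exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit (assumed number): width in pixels - 1&lt;br /&gt;
.word 12                 //YWindLimit (assumed number): height in pixels - 1&lt;br /&gt;
.word 148                //ScreenStart: screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;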
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit (Bit 0) of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in RISC OS (&amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
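&lt;br /&gt;
As a small sketch, such an exit sequence could look like this in THUMB code. The SWI number 0x03 for OS_NewLine is an addition to the list of definitions given above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_NewLine, 0x03&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
swi OS_NewLine           //print a new line to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;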
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extensions of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2 and place the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
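&lt;br /&gt;
A minimal sketch of how this could be used to brighten pixels in place. It assumes that r0 points to 4 true colour pixels and r1 to 16 bytes holding the brightness to add per colour channel, and that a VFPContext has already been requested as described further down:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vld1.8   {d2,d3},[r0]   //q1 = 4 true colour pixels (16 bytes)&lt;br /&gt;
vld1.8   {d4,d5},[r1]   //q2 = 16 bytes of brightness to add&lt;br /&gt;
vqadd.u8 q0,q1,q2       //per byte: q0 = q1+q2, saturated to 255&lt;br /&gt;
vst1.8   {d0,d1},[r0]   //write the 4 brightened pixels back&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;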
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
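&lt;br /&gt;
A short sketch of that mapping (the values are illustrative only, and a VFPContext as described below is assumed to be set up already):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.f32 s0,#1.0        //VFP: writes the low 32 Bits of d0 (and of q0)&lt;br /&gt;
vmov.f32 s1,#2.0        //VFP: writes the high 32 Bits of d0&lt;br /&gt;
vadd.f32 d2,d0,d0       //NEON: adds both floats at once =&amp;gt; d2 = (2.0, 4.0), readable as s4 and s5&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;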
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this.&lt;br /&gt;
&lt;br /&gt;
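&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set VFPSupport_CreateContext, 0x58ec1 //SWI number, same value as loaded to r10 in the THUMB-2 version&lt;br /&gt;
mov r0,#3+(1&amp;lt;&amp;lt;31)   //flags&lt;br /&gt;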
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball], that probably wouldn't be possible without NEON at that size/speed.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on Pouet
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from a size of maybe 512 Bytes, and for sure when you aim at coding an intro &amp;gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix, now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
Codepressor also supports Thumb executables, both if the thumb trigger is set within the code and if it is set within the filesystem as shown above. Future versions might also have thumb decompression routines to make the files even shorter. At the time of writing the decompression routines are written in normal ARM code.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=685</id>
		<title>RISC OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=RISC_OS_on_ARM_based_CPUs&amp;diff=685"/>
				<updated>2020-07-11T08:55:26Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in mobile devices of any kind. Back in the 80s ARM was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm actually not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in RISC OS that takes care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long. Only THUMB offers a limited instruction set with a length of 2 Bytes per instruction (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to your device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set the screen mode at all and just read the parameters of the current one, as most users run in 1920x1080 at 32 Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
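&lt;br /&gt;
A minimal sketch of that approach, under a few assumptions that are not taken from the intros mentioned here: the desktop runs in a 32 Bit mode, VDU variable 7 is ScreenSize (the size of the screen memory in bytes), and the output block may overlap the input block as in the framework above. The label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; and the placeholder fill pattern are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
adr.n r0,vdu_block        //input block with the VDU variable numbers&lt;br /&gt;
movs r1,r0                //output block = input block (assumption, as in the framework above)&lt;br /&gt;
swi OS_ReadVduVariables   //vdu_block now holds screen size and screen address&lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
adr.n r0,vdu_block&lt;br /&gt;
ldmia r0!,{r6,r7}         //r6 = screen size in bytes, r7 = screen address&lt;br /&gt;
fill:&lt;br /&gt;
   stmia r7!,{r6}         //placeholder pixel, real effect code would go here&lt;br /&gt;
   subs r6,r6,#4          //one 32 Bit pixel = 4 bytes done&lt;br /&gt;
bne fill&lt;br /&gt;
swi OS_ReadEscapeState    //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit               //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                  //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 7                   //assumed: VDU variable 7 = ScreenSize&lt;br /&gt;
.word 148                 //VDU variable 148 = screen address, as above&lt;br /&gt;
.word -1                  //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The loop only relies on the total size of the screen memory, so it doesn't need to know the width or height. If your effect needs x and y coordinates you would have to read further VDU variables in the same block.&lt;br /&gt;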
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and the requirements for using VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address. Executables in RISC OS have a load address and a start address stored in the filesystem as attributes, and &amp;amp;8000 is the general start address for executables in RISC OS. The following command sets both addresses from within RISC OS. The best way is to put it into a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To date that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
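&lt;br /&gt;
As a sketch, the exit sequence for an intro that kept the current screen mode could look like this. The SWI number 0x03 for OS_NewLine is not part of the list above and is stated here as an assumption, so double check it against the OS documentation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03      //assumed SWI number&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
swi OS_NewLine            //print a new line to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit               //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Both SWI numbers fit into 8 Bit, so in THUMB mode each call should cost only 2 Bytes.&lt;br /&gt;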
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; adds the 4 true colour RGB pixels in q1 to the ones in q2 (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) and places the results in q0. If an overflow occurs, the value is saturated to 255.&lt;br /&gt;
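&lt;br /&gt;
To illustrate the saturating add, here is a small sketch that brightens 4 true colour pixels at once. It assumes that r7 already points to screen memory of a 32 Bit mode and that a VFPContext was requested before, as explained further down. The brightness step of 16 and the &amp;lt;code&amp;gt;.arch&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;.fpu&amp;lt;/code&amp;gt; settings are just example choices:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.arch armv7-a&lt;br /&gt;
.fpu neon&lt;br /&gt;
.thumb&lt;br /&gt;
vld1.32  {d2,d3},[r7]    //q1 = 4 pixels (16 Bytes) from screen memory&lt;br /&gt;
vmov.i8  q2,#16          //q2 = 16 in each of its 16 byte lanes&lt;br /&gt;
vqadd.u8 q0,q1,q2        //q0 = q1+q2 per byte, clamped to 255 instead of wrapping around&lt;br /&gt;
vst1.32  {d0,d1},[r7]!   //write the 4 brightened pixels back and advance r7 by 16 Bytes&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a plain &amp;lt;code&amp;gt;vadd.i8&amp;lt;/code&amp;gt; a bright channel would wrap around to a dark value, which is usually not what you want for colours.&lt;br /&gt;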
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the VFP and NEON registers are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; results in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
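&lt;br /&gt;
The register mapping can be sketched with a few instructions. This is only an illustration with arbitrarily chosen values and registers, and it again assumes that a VFPContext was set up before:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.arch armv7-a&lt;br /&gt;
.fpu neon&lt;br /&gt;
vmov.f32 s0,#1.0         //VFP: writes the low 32 Bits of d0 (and q0)&lt;br /&gt;
vmov.f32 s1,#2.0         //VFP: writes the high 32 Bits of d0&lt;br /&gt;
vadd.f32 d1,d0,d0        //NEON: adds both floats of d0 at once, d1 = {2.0, 4.0}&lt;br /&gt;
vmov     r0,s2           //s2 is the low half of d1, so r0 = bit pattern of 2.0&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So you can fill single lanes with VFP instructions and process them in parallel with NEON afterwards, without any extra move between the two register views.&lt;br /&gt;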
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, since as shown above the setup consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball], that probably wouldn't be possible without NEON at that size/speed.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on Pouet&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense from a size of maybe 512 Bytes, and for sure when you aim at coding an intro of &amp;gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix and now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
At the time of writing it doesn't work with Thumb code when the standard trick mentioned above is used to specify the execution address of the intro in the filing system. So for now you have to skip that trick and invest 4 bytes at the start of your intro to make it compressible with Codepressor:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;adr pc,thumb+1&lt;br /&gt;
.thumb&lt;br /&gt;
thumb:&amp;lt;/syntaxhighlight&amp;gt;     &lt;br /&gt;
This will trigger Thumb mode manually within your code. &amp;lt;code&amp;gt;adr pc,thumb+1&amp;lt;/code&amp;gt; is a pseudo-instruction that tries to produce a single add or sub instruction to load the address; in this case it is assembled to &amp;lt;code&amp;gt;sub pc,pc,#3&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=684</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=684"/>
				<updated>2020-07-11T08:54:24Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Welcome to SizeCoding.org! ==&lt;br /&gt;
&lt;br /&gt;
'''SizeCoding.org is a wiki dedicated to the art of creating very tiny programs for most popular types of CPUs. As sizecoding is also popular on other hardware we recently opened the website for other platforms as well, check the links below.'''  &lt;br /&gt;
&lt;br /&gt;
By &amp;quot;very tiny programs&amp;quot;, we mean programs that are '''256 bytes or less in size''', typically created by members of the [https://en.wikipedia.org/wiki/Demoscene demoscene] as a show of programming skill.  The size of these tiny programs is measured by their total size in opcode bytes, and they are usually presented as executable .COM files to be run in pure DOS, a DOS VM running inside another operating system, or an emulator that can run DOS such as DOSBox.  &lt;br /&gt;
&lt;br /&gt;
'''Despite their tiny size, these programs are able to produce amazing graphical displays, playable games, and sometimes music.'''  There are even some surprisingly effective programs in under '''16 bytes''', such as [https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ a maze generator in 10 bytes], a [[Paint16b|mouse-driven paint program in 16 bytes]], and an [[M8trix_8b|&amp;quot;Enter the Matrix&amp;quot; screensaver]] in only 8 bytes. &lt;br /&gt;
&lt;br /&gt;
'''The intent of this wiki is to teach x86 assembler programmers the various techniques used to create tiny demoscene intros.'''  While these techniques can be used for other applications (boot sectors, BIOS and firmware code, etc.), the information presented here is firmly oriented towards the demoscene.  Practicality and common sense are sometimes thrown out the window just to shave a single byte -- consider yourself warned.&lt;br /&gt;
&lt;br /&gt;
This wiki is divided into the following major sections for x86:&lt;br /&gt;
&lt;br /&gt;
*'''[[Getting Started]]''' - What do you need to know before starting your first tinyprog?&lt;br /&gt;
&lt;br /&gt;
*'''[[Techniques|Tips, Tricks, and Techniques]]''' - The meat of the wiki.  Lists of small opcodes, default environment settings, size optimization basics, and a ton of random tips.&lt;br /&gt;
&lt;br /&gt;
*'''[[Case Studies]]''' - Analysis of existing award-winning tiny programs, with comments on what choices were made and why.&lt;br /&gt;
&lt;br /&gt;
*'''[[Resources|Additional Resources]]''' - Discussion threads, competitions, repositories, and other external references that can aid you in your quest.&lt;br /&gt;
&lt;br /&gt;
For other platforms check out the sections here:&lt;br /&gt;
&lt;br /&gt;
*'''[[RISC OS on ARM based CPUs]]''' - Create your tiny intro on ARM hardware on RISC OS.&lt;br /&gt;
*'''[[Motorola 68k based CPUS]]''' - Start Sizecoding on your Atari ST or Amiga.&lt;br /&gt;
*'''[[6502 based CPUs]]''' - Commodore 64, Atari XE/XL, Atari Lynx, etc.&lt;br /&gt;
*'''[[Z80 based CPUs]]''' - For all your ZX Spectrum, CPC, Gameboy, etc. sizecoding needs&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=683</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=683"/>
				<updated>2020-07-11T08:53:00Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and CPUs based on the ARM architecture are the two major CPU architectures of modern times, the latter one especially for any kind of mobile device. Back in the 80's ARM was founded to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn RISC PC. All those home computers were run by RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in RISC OS to take care of that, but it's rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now though it might be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
The size of the instructions is always 4 Bytes, only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can keep adding conditional instructions that depend on the same flags until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center explains this using the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
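&lt;br /&gt;
For readers who want to check the pattern without RISC OS hardware, the two nested loops can be modelled in Python. This is a sketch under assumptions: the names WIDTH, HEIGHT and render_frame are hypothetical, and the timer value is simply passed in instead of being read from the OS.&lt;br /&gt;
&lt;br /&gt;
```python
WIDTH, HEIGHT = 320, 256   # screen mode 13

def render_frame(timer):
    """Return one frame as bytes, in the order the assembly loop writes it."""
    frame = bytearray()
    for y in range(HEIGHT - 1, -1, -1):   # r2 counts from 255 down to 0
        for x in range(WIDTH, 0, -1):     # r1 counts from 320 down to 1
            # p = (x + timer) xor y; strb stores only the low byte
            frame.append(((x + timer) ^ y) % 256)
    return bytes(frame)
```
&lt;br /&gt;
Each call yields 320*256 bytes, one palette index per pixel, and a changing timer value shifts the XOR pattern horizontally from frame to frame.&lt;br /&gt;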
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you have set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and just read the parameters of the current one, as most users run in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address (executables in RISC OS have a load address and a start address, both stored in the filesystem as attributes) with the following command in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS, so &amp;amp;8001 selects THUMB). The best way to do so is to use a batch file, as shown in most of the intros mentioned above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single (32 Bit) and double precision (64 bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
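&lt;br /&gt;
The saturation behaviour can be modelled per byte in a few lines of Python. This is only a sketch of what happens to each unsigned 8 Bit lane; the function name vqadd_u8 is borrowed from the mnemonic for illustration and is not a real library call.&lt;br /&gt;
&lt;br /&gt;
```python
def vqadd_u8(a, b):
    """Add two equally long byte sequences lane by lane, clamping at 255."""
    return bytes(min(x + y, 255) for x, y in zip(a, b))

# one pixel worth of channel values, brightened by a second pixel
brightened = vqadd_u8(bytes([200, 100, 0, 255]), bytes([100, 100, 0, 1]))
```
&lt;br /&gt;
A plain wrapping add would turn 200+100 into 44, a visible colour glitch, while the saturating version clamps it to 255.&lt;br /&gt;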
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained above, VFP and NEON share one register set, and the s, d and q registers are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so-called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons for using NEON are speed (for floats and integers) and working with floats in general, not so much size, since the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But an intro like Kuemmel's [https://www.pouet.net/prod.php?which=85284 RISC OS 3dball] probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet.net&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and those SWIs are within a short range of numbers, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
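&lt;br /&gt;
To try out the formula from the code comments away from the target machine, it can be modelled in Python. This is a hedged sketch, not the author's tooling: the function name bytebeat_sample is hypothetical, and the bit masks are written with modulo and integer division, which give the same result as the AND operations in the assembly.&lt;br /&gt;
&lt;br /&gt;
```python
def bytebeat_sample(t):
    """8 bit sample for time step t, following the formula in the listing."""
    shift = (t >> 9) % 16 // 2 * 2         # same as (t >> 9) AND 14
    multiplier = (0xCA98 >> shift) % 16    # same as (0xca98 >> shift) AND 15
    return (t * multiplier | t >> 8) % 256 # keep the low 8 bits

# the first 8000 samples, i.e. one second at the 8000 Hz rate set above
one_second = bytes(bytebeat_sample(t) for t in range(8000))
```
&lt;br /&gt;
The constant 0xca98 acts as a tiny lookup table: every 1024 samples a different 4 Bit slice of it is selected as the multiplier, which is what changes the pitch over time.&lt;br /&gt;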
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (the link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense at around 512 bytes and certainly pays off when you aim for an intro of 1 KByte or more.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix and now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally picks the best one.&lt;br /&gt;
&lt;br /&gt;
The usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
At the time of writing it doesn't work with Thumb code when the standard trick mentioned above is used to specify the execution address of the intro in the filing system. So for now you have to skip that and invest 4 bytes at the start of your intro to make it compressible with Codepressor:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;adr pc,thumb+1&lt;br /&gt;
.thumb&lt;br /&gt;
thumb:&amp;lt;/syntaxhighlight&amp;gt;     &lt;br /&gt;
This will trigger Thumb mode manually within your code. &amp;lt;code&amp;gt;adr pc,thumb+1&amp;lt;/code&amp;gt; is a pseudo-instruction that tries to produce a single add or sub instruction to load the address; in this case it is assembled to &amp;lt;code&amp;gt;sub pc,pc,#3&amp;lt;/code&amp;gt;.&lt;br /&gt;
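&lt;br /&gt;
The offset of 3 follows from how ARM state reads the program counter. The small Python calculation below is only an illustration; the helper adr_offset is made up, and the load address &amp;amp;8000 is the one mentioned earlier for RISC OS executables.&lt;br /&gt;
&lt;br /&gt;
```python
ARM_PC_READ_AHEAD = 8   # in ARM state, reading pc yields the instruction address + 8

def adr_offset(instr_addr, target_addr):
    """Offset the assembler has to add to pc to reach target_addr."""
    return target_addr - (instr_addr + ARM_PC_READ_AHEAD)

load_addr = 0x8000             # the adr instruction is the first word of the file
thumb_label = load_addr + 4    # the Thumb code starts right after that word
offset = adr_offset(load_addr, thumb_label + 1)   # +1 sets bit 0 to select Thumb
```
&lt;br /&gt;
The result is -3, which is why the assembler emits a subtraction of 3 from pc.&lt;br /&gt;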
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=682</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=682"/>
				<updated>2020-07-11T08:49:42Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Welcome to SizeCoding.org! ==&lt;br /&gt;
&lt;br /&gt;
'''SizeCoding.org is a wiki dedicated to the art of creating very tiny programs for most popular types of CPUs. As sizecoding is also popular on other hardware we recently opened the website for other platforms as well; check the links below.'''  &lt;br /&gt;
&lt;br /&gt;
By &amp;quot;very tiny programs&amp;quot;, we mean programs that are '''256 bytes or less in size''', typically created by members of the [https://en.wikipedia.org/wiki/Demoscene demoscene] as a show of programming skill.  The size of these tiny programs is measured by their total size in opcode bytes, and they are usually presented as executable .COM files to be run in pure DOS, a DOS VM running inside another operating system, or an emulator that can run DOS such as DOSBox.  &lt;br /&gt;
&lt;br /&gt;
'''Despite their tiny size, these programs are able to produce amazing graphical displays, playable games, and sometimes music.'''  There are even some surprisingly effective programs in under '''16 bytes''', such as [https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ a maze generator in 10 bytes], a [[Paint16b|mouse-driven paint program in 16 bytes]], and an [[M8trix_8b|&amp;quot;Enter the Matrix&amp;quot; screensaver]] in only 8 bytes. &lt;br /&gt;
&lt;br /&gt;
'''The intent of this wiki is to teach x86 assembler programmers the various techniques used to create tiny demoscene intros.'''  While these techniques can be used for other applications (boot sectors, BIOS and firmware code, etc.), the information presented here is firmly oriented towards the demoscene.  Practicality and common sense are sometimes thrown out the window just to shave a single byte -- consider yourself warned.&lt;br /&gt;
&lt;br /&gt;
This wiki is divided into the following major sections for x86:&lt;br /&gt;
&lt;br /&gt;
*'''[[Getting Started]]''' - What do you need to know before starting your first tinyprog?&lt;br /&gt;
&lt;br /&gt;
*'''[[Techniques|Tips, Tricks, and Techniques]]''' - The meat of the wiki.  Lists of small opcodes, default environment settings, size optimization basics, and a ton of random tips.&lt;br /&gt;
&lt;br /&gt;
*'''[[Case Studies]]''' - Analysis of existing award-winning tiny programs, with comments on what choices were made and why.&lt;br /&gt;
&lt;br /&gt;
*'''[[Resources|Additional Resources]]''' - Discussion threads, competitions, repositories, and other external references that can aid you in your quest.&lt;br /&gt;
&lt;br /&gt;
For other platforms check out the sections here:&lt;br /&gt;
&lt;br /&gt;
*'''[[Risc OS on ARM based CPUs]]''' - Create your tiny intro on ARM hardware on Risc OS.&lt;br /&gt;
*'''[[Motorola 68k based CPUS]]''' - Start Sizecoding on your Atari ST or Amiga.&lt;br /&gt;
*'''[[6502 based CPUs]]''' - Commodore 64, Atari XE/XL, Atari Lynx, etc.&lt;br /&gt;
*'''[[Z80 based CPUs]]''' - For all your ZX Spectrum, CPC, Gameboy, etc. sizecoding needs&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=681</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=681"/>
				<updated>2020-07-11T08:48:42Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Welcome to SizeCoding.org! ==&lt;br /&gt;
&lt;br /&gt;
'''SizeCoding.org is a wiki dedicated to the art of creating very tiny programs for most popular types of CPUs. As sizecoding is also popular on other hardware we recently opened the website for other platforms as well; check the links below.'''  &lt;br /&gt;
&lt;br /&gt;
By &amp;quot;very tiny programs&amp;quot;, we mean programs that are '''256 bytes or less in size''', typically created by members of the [https://en.wikipedia.org/wiki/Demoscene demoscene] as a show of programming skill.  The size of these tiny programs is measured by their total size in opcode bytes, and they are usually presented as executable .COM files to be run in pure DOS, a DOS VM running inside another operating system, or an emulator that can run DOS such as DOSBox.  &lt;br /&gt;
&lt;br /&gt;
'''Despite their tiny size, these programs are able to produce amazing graphical displays, playable games, and sometimes music.'''  There are even some surprisingly effective programs in under '''16 bytes''', such as [https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ a maze generator in 10 bytes], a [[Paint16b|mouse-driven paint program in 16 bytes]], and an [[M8trix_8b|&amp;quot;Enter the Matrix&amp;quot; screensaver]] in only 8 bytes. &lt;br /&gt;
&lt;br /&gt;
'''The intent of this wiki is to teach x86 assembler programmers the various techniques used to create tiny demoscene intros.'''  While these techniques can be used for other applications (boot sectors, BIOS and firmware code, etc.), the information presented here is firmly oriented towards the demoscene.  Practicality and common sense are sometimes thrown out the window just to shave a single byte -- consider yourself warned.&lt;br /&gt;
&lt;br /&gt;
This wiki is divided into the following major sections for x86:&lt;br /&gt;
&lt;br /&gt;
*'''[[Getting Started]]''' - What do you need to know before starting your first tinyprog?&lt;br /&gt;
&lt;br /&gt;
*'''[[Techniques|Tips, Tricks, and Techniques]]''' - The meat of the wiki.  Lists of small opcodes, default environment settings, size optimization basics, and a ton of random tips.&lt;br /&gt;
&lt;br /&gt;
*'''[[Case Studies]]''' - Analysis of existing award-winning tiny programs, with comments on what choices were made and why.&lt;br /&gt;
&lt;br /&gt;
*'''[[Resources|Additional Resources]]''' - Discussion threads, competitions, repositories, and other external references that can aid you in your quest.&lt;br /&gt;
&lt;br /&gt;
For other platforms check out the sections here:&lt;br /&gt;
&lt;br /&gt;
*'''[[RISC OS on ARM based CPUs]]''' - Create your tiny intro on ARM hardware on Risc OS.&lt;br /&gt;
*'''[[Motorola 68k based CPUS]]''' - Start Sizecoding on your Atari ST or Amiga.&lt;br /&gt;
*'''[[6502 based CPUs]]''' - Commodore 64, Atari XE/XL, Atari Lynx, etc.&lt;br /&gt;
*'''[[Z80 based CPUs]]''' - For all your ZX Spectrum, CPC, Gameboy, etc. sizecoding needs&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Floating-point_Opcodes&amp;diff=678</id>
		<title>Floating-point Opcodes</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Floating-point_Opcodes&amp;diff=678"/>
				<updated>2020-07-09T11:51:50Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: FPU trick added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The [https://en.wikipedia.org/wiki/Floating-point_unit FPU] offers a lot of complex and powerful floating-point operations, like &amp;lt;code&amp;gt;SIN&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;COS&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;ATAN&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SQRT&amp;lt;/code&amp;gt;, etc. ([http://www.website.masmforum.com/tutorials/fptute/appen1.htm SIMPLY FPU] by Raymond Filiatreault has a compact overview of all FPU commands for the curious.)  Use of the FPU can drastically increase what you can do in a tiny production while simultaneously keeping code size down.  All x86 CPUs can use the FPU, although older CPUs (80486sx and earlier) need the FPU physically installed in the FPU socket.&lt;br /&gt;
&lt;br /&gt;
This guide assumes intermediate to advanced-level proficiency with sizecoding.&lt;br /&gt;
&lt;br /&gt;
== FPU Basics ==&lt;br /&gt;
&lt;br /&gt;
Using and communicating with the FPU is rather unusual and takes a bit of getting used to, so we'll start with a simple example.  This is what [[Output#Basic_Animation_and_user_interaction|our code example]] looks like originally:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             	; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov 	al,0x13&lt;br /&gt;
X: 		int 0x10	; set video mode AND draw pixel&lt;br /&gt;
mov 	ax,cx		; get column in AH&lt;br /&gt;
add		ax,di		; offset by framecounter	          &amp;lt;-- REPLACE THIS WITH FPU CODE&lt;br /&gt;
xor 	al,ah		; the famous XOR pattern&lt;br /&gt;
and 	al,32+8		; a more interesting variation of it&lt;br /&gt;
mov 	ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
loop 	X			; loop 65536 times&lt;br /&gt;
inc 	di			; increment framecounter&lt;br /&gt;
in 		al,0x60		; check keyboard ...&lt;br /&gt;
dec 	al			; ... for ESC&lt;br /&gt;
jnz 	X			; rinse and repeat&lt;br /&gt;
ret					; quit program&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And this is how the code looks if we replace the marked instruction with FPU code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;cwd             	; &amp;quot;clear&amp;quot; DX for perfect alignment&lt;br /&gt;
mov 	al,0x13&lt;br /&gt;
X: 		int 0x10	; set video mode AND draw pixel&lt;br /&gt;
mov 	ax,cx		; get &amp;quot;column&amp;quot; in AX&lt;br /&gt;
&lt;br /&gt;
fninit				; init FPU first&lt;br /&gt;
mov		[si],ax		; write first addend to a memory location&lt;br /&gt;
fild	word [si]	; F(pu) I(nteger) L(oad)D a WORD from memory location to the FPU stack&lt;br /&gt;
mov		[si],di		; write second addend to a memory location&lt;br /&gt;
fiadd	word [si]	; Directly add the word in the memory location to the top FPU stack&lt;br /&gt;
fist	word [si]	; F(pu) I(nteger) ST(ore) the result into a memory location&lt;br /&gt;
mov		ax,[si]		; Get the word from the memory location into AX&lt;br /&gt;
&lt;br /&gt;
xor 	al,ah		; the famous XOR pattern&lt;br /&gt;
and 	al,32+8		; a more interesting variation of it&lt;br /&gt;
mov 	ah,0x0C		; set subfunction &amp;quot;set pixel&amp;quot; for int 0x10&lt;br /&gt;
loop 	X			; loop 65536 times&lt;br /&gt;
inc 	di			; increment framecounter&lt;br /&gt;
in 		al,0x60		; check keyboard ...&lt;br /&gt;
dec 	al			; ... for ESC&lt;br /&gt;
jnz 	X			; rinse and repeat&lt;br /&gt;
ret					; quit program&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Obviously that made the program bigger instead of smaller, but the point of this exercise is to illustrate how to work with the FPU.)  Looking at our changes, you get a sense of what usual interaction with the FPU is:&lt;br /&gt;
# &amp;lt;code&amp;gt;F(N)INIT&amp;lt;/code&amp;gt;: Initialize the FPU&lt;br /&gt;
# transfer values from CPU registers to memory location(s)&lt;br /&gt;
# transfer values from memory location(s) onto FPU stack&lt;br /&gt;
# do the actual calculations using the FPU (more on this soon)&lt;br /&gt;
# transfer result from the FPU stack into memory location(s)&lt;br /&gt;
# transfer result from memory location(s) back into registers&lt;br /&gt;
&lt;br /&gt;
That is a lot of extra code for a single integer addition, but that's a simple example; ''once more complex floating point operations are involved, it starts to pay off.''&lt;br /&gt;
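&lt;br /&gt;
The FPU detour is not an exact drop-in replacement for &amp;lt;code&amp;gt;add ax,di&amp;lt;/code&amp;gt;, which the following Python model of the data path illustrates. It is only a sketch with made-up function names, assuming the documented x87 behaviour: &amp;lt;code&amp;gt;fild&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fiadd&amp;lt;/code&amp;gt; read words as signed values, and &amp;lt;code&amp;gt;fist&amp;lt;/code&amp;gt; stores &amp;lt;code&amp;gt;0x8000&amp;lt;/code&amp;gt; when the result does not fit into a signed word and exceptions are masked (which is the state after &amp;lt;code&amp;gt;fninit&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
```python
def to_signed16(word):
    """Interpret a 16-bit pattern as two's complement, like FILD/FIADD word do."""
    return (word + 0x8000) % 0x10000 - 0x8000


def cpu_add(ax, di):
    """add ax,di: plain 16-bit addition that wraps around."""
    return (ax + di) % 0x10000


def fpu_add(ax, di):
    """fild / fiadd / fist word: signed addition, 0x8000 if the sum does not fit."""
    total = to_signed16(ax) + to_signed16(di)
    fits = total == max(-0x8000, min(0x7FFF, total))
    return total % 0x10000 if fits else 0x8000


print(hex(cpu_add(100, 23)), hex(fpu_add(100, 23)))              # same result
print(hex(cpu_add(0x7000, 0x2000)), hex(fpu_add(0x7000, 0x2000)))  # results differ
```
&lt;br /&gt;
Both versions agree as long as the signed sum stays within [-32768,32767]; outside of that range the FPU version stores &amp;lt;code&amp;gt;0x8000&amp;lt;/code&amp;gt;, so the pattern can differ in places.&lt;br /&gt;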
&lt;br /&gt;
== Optimizing with the FPU ==&lt;br /&gt;
&lt;br /&gt;
=== Distance function example ===&lt;br /&gt;
&lt;br /&gt;
For more advanced FPU operation, let's start from scratch with an unoptimized program which plots the distance of each pixel to the screen center as color, in 49 bytes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distance to center example.png|thumb|Distance to center]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 	0a000h			&lt;br /&gt;
pop 	es				; get start of video memory in ES&lt;br /&gt;
mov 	al,0x13			; switch to video mode 13h&lt;br /&gt;
int 	0x10			; 320 * 200 in 256 colors&lt;br /&gt;
fninit					; -	&lt;br /&gt;
						; it's useful to comment what's on the&lt;br /&gt;
						; stack after each FPU operation&lt;br /&gt;
						; to not get lost ;) start is : empty (-)&lt;br /&gt;
X:&lt;br /&gt;
xor 	dx,dx			; reset the high word before division&lt;br /&gt;
mov 	bx,320			; 320 columns&lt;br /&gt;
mov 	ax,di			; get screen pointer in AX&lt;br /&gt;
div 	bx				; construct X,Y from screen pointer into AX,DX&lt;br /&gt;
sub 	ax,100			; subtract the origin&lt;br /&gt;
sub 	dx,160			; = (160,100) ... center of 320x200 screen	&lt;br /&gt;
mov 	[si],ax			; move X into a memory location&lt;br /&gt;
fild 	word [si]		; X&lt;br /&gt;
fmul 	st0				; X²&lt;br /&gt;
mov 	[si],dx			; move Y into a memory location&lt;br /&gt;
fild 	word [si]		; Y X²&lt;br /&gt;
fmul 	st0				; Y² X²&lt;br /&gt;
faddp 	st1,st0			; Y²+X² (add and pop, so no stale X² piles up on the FPU stack)&lt;br /&gt;
fsqrt					; R&lt;br /&gt;
fistp 	word [si]		; -&lt;br /&gt;
mov 	ax,[si]			; get the result from memory&lt;br /&gt;
stosb					; write to screen (DI) and increment DI&lt;br /&gt;
jmp short X				; next pixel&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A few words on this : &lt;br /&gt;
* The FPU registers (st0, st1, ...) are organized as a stack. When you load something onto the FPU, everything else implicitly moves one location further away from the top. Some FPU instructions work only on the top of the stack, others allow explicit parametrization with arbitrary FPU registers.&lt;br /&gt;
* Depending on what you do, &amp;lt;code&amp;gt;F(N)INIT&amp;lt;/code&amp;gt; can sometimes be omitted. Real hardware will refuse to work more often than emulators, but it's always worth a try.&lt;br /&gt;
* Accessing memory in a size-efficient way can be a real pain. The safest way is to reference absolute memory locations (e.g. &amp;lt;code&amp;gt;[1234]&amp;lt;/code&amp;gt;) but that costs two more bytes per instruction than referencing memory with &amp;lt;code&amp;gt;[BX]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[SI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[BX+SI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[BP+DI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[BP+SI]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[DI]&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;[BX+DI]&amp;lt;/code&amp;gt;. When working with the FPU and this ''classic'' approach of FPU communication, you have to design your code flow so that one or more of these locations are available.&lt;br /&gt;
* Accessing the memory is always with regard to the segment register &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; unless you perform segment overrides. When accessing memory with &amp;lt;code&amp;gt;[BP+??]&amp;lt;/code&amp;gt; be aware that the memory is accessed with regard to the segment register &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; (see [https://courses.engr.illinois.edu/ece390/books/artofasm/CH04/CH04-2.html#HEADING2-35 Register Indirect Addressing Modes])&lt;br /&gt;
* A few naming conventions help you identify FPU instructions. &amp;quot;i&amp;quot; stands for integer (WORD or DWORD) and &amp;quot;p&amp;quot; means &amp;quot;pop the stack afterwards&amp;quot;, so &amp;lt;code&amp;gt;FST&amp;lt;/code&amp;gt; means just &amp;quot;store&amp;quot; while &amp;lt;code&amp;gt;FISTP&amp;lt;/code&amp;gt; means &amp;quot;store as integer, then pop the stack&amp;quot;.&lt;br /&gt;
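&lt;br /&gt;
The size difference between the addressing forms can be tabulated. The following Python sketch lists the eight 16-bit memory forms that need no register displacement, together with their default segment and the number of extra address bytes. The table name and the helper &amp;lt;code&amp;gt;encoded_length&amp;lt;/code&amp;gt; are made up for this illustration.&lt;br /&gt;
&lt;br /&gt;
```python
# 16-bit ModRM memory operands (mod = 00):
# operand -> (r/m field, default segment, displacement bytes)
MODRM16 = {
    "[bx+si]":  (0, "ds", 0),
    "[bx+di]":  (1, "ds", 0),
    "[bp+si]":  (2, "ss", 0),
    "[bp+di]":  (3, "ss", 0),
    "[si]":     (4, "ds", 0),
    "[di]":     (5, "ds", 0),
    "[disp16]": (6, "ds", 2),   # absolute address such as [1234]
    "[bx]":     (7, "ds", 0),
}


def encoded_length(operand, opcode_bytes=1):
    """Bytes needed for opcode + ModRM byte + displacement."""
    displacement = MODRM16[operand][2]
    return opcode_bytes + 1 + displacement


print(encoded_length("[si]"))       # fild word [si]   -> 2 bytes
print(encoded_length("[disp16]"))   # fild word [1234] -> 4 bytes
```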
&lt;br /&gt;
=== Stack addressing, &amp;quot;Rrrolas Trick&amp;quot;, alignment optimization, Aspect Ratio ===&lt;br /&gt;
&lt;br /&gt;
Now let's unleash the state of the art sizecoding arsenal onto this, to bring it down to 37 bytes (40 bytes with aspect correction)&lt;br /&gt;
&lt;br /&gt;
[[File:Distant to center optimized.png|thumb|Distance to center : optimized]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 	0a000h - 70		; modified to center to 160,100&lt;br /&gt;
aas						; aspect ratio constant part&lt;br /&gt;
pop 	es				; get start of video memory in ES&lt;br /&gt;
mov 	al,0x13			; switch to video mode 13h&lt;br /&gt;
int 	0x10			; 320 * 200 in 256 colors&lt;br /&gt;
X:&lt;br /&gt;
mov 	ax,0xCCCD		; perform the famous...&lt;br /&gt;
mul		di				; ... Rrrola trick =)&lt;br /&gt;
sub 	dh,[si]			; align vertically&lt;br /&gt;
pusha 					; push all registers on stack&lt;br /&gt;
fild 	word [bx-8]		; X&lt;br /&gt;
fmul 	st0				; X²&lt;br /&gt;
fild 	word [bx-9]		; Y X²&lt;br /&gt;
fmul	dword [bx+si]	; aspect ratio correction&lt;br /&gt;
fmul 	st0				; Y² X²&lt;br /&gt;
faddp 	st1,st0			; Y²+X² (add and pop, so no stale X² piles up on the FPU stack)&lt;br /&gt;
fsqrt					; R&lt;br /&gt;
fistp 	dword [bx-5]	; -&lt;br /&gt;
popa					; pop all registers from stack&lt;br /&gt;
stosb					; write to screen (DI) and increment DI&lt;br /&gt;
jmp short X				; next pixel&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The resulting image is almost identical to the former. Let's go through this step by step:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;push 	0a000h - 70&amp;lt;/code&amp;gt;&lt;br /&gt;
Instead of aligning horizontally with &amp;lt;code&amp;gt;sub dx,160&amp;lt;/code&amp;gt; we can code this implicitly by moving our segment register ten units - that is 10 * 16 = 160 pixels - to the left (see [https://en.wikipedia.org/wiki/X86_memory_segmentation#Real_mode Real Mode Addressing]). By additionally subtracting multiples of 20 units - that is 320 pixels, one line each - we can shift the visible screen towards the top to fine-tune the vertical alignment. As long as this shift is no more than 4 lines ''( 65536 / 320 - 200 = 4.8 )'' there is no further visual impact.&lt;br /&gt;
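The shift can be checked with a few lines of Python (a sketch for illustration only; all names are made up). It splits the 70 segment units into whole lines and leftover pixels, and shows that the last visible pixel still lies within the 64 KB segment.&lt;br /&gt;
&lt;br /&gt;
```python
SEGMENT_SHIFT = 70          # from "push 0a000h - 70"
BYTES_PER_UNIT = 16         # one segment unit is 16 bytes in real mode
WIDTH = 320                 # pixels (bytes) per line in mode 13h

shift_bytes = SEGMENT_SHIFT * BYTES_PER_UNIT
lines, pixels = divmod(shift_bytes, WIDTH)
print(shift_bytes, lines, pixels)   # 1120 bytes = 3 lines + 160 pixels


def di_of_pixel(x, y):
    """Offset within the shifted segment that hits the visible pixel (x, y)."""
    return shift_bytes + y * WIDTH + x


print(di_of_pixel(0, 0), di_of_pixel(319, 199))
```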
* &amp;lt;code&amp;gt;aas&amp;lt;/code&amp;gt;&lt;br /&gt;
This is the high byte of a constant, placed in a way that &amp;lt;code&amp;gt;[SI]&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;[BX+SI]&amp;lt;/code&amp;gt; resolves to ~1.24 when read as a 32-bit float. The high byte of the segment value pushed for &amp;lt;code&amp;gt;ES&amp;lt;/code&amp;gt; is part of this constant as well, so it matters too. Check for yourself with the [http://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE 754 Converter].&lt;br /&gt;
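As an alternative to the online converter, the constant can be decoded by hand. The Python sketch below assumes the program is loaded at offset 0x100 with &amp;lt;code&amp;gt;SI&amp;lt;/code&amp;gt; pointing there, so that the four bytes read are the encoded &amp;lt;code&amp;gt;push 0x9FBA&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;aas&amp;lt;/code&amp;gt;. The decoding only covers normalized floats, which is all that is needed here.&lt;br /&gt;
&lt;br /&gt;
```python
code_bytes = bytes([0x68, 0xBA, 0x9F, 0x3F])   # push 0x9FBA ; aas

bits = int.from_bytes(code_bytes, "little")    # dword as read by "fmul dword [bx+si]"
sign = bits // 2**31
exponent = bits // 2**23 % 256
mantissa = bits % 2**23

# IEEE 754 single precision, normalized numbers only
value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
print(hex(bits), value)
```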
* &amp;lt;code&amp;gt;mov ax,0xCCCD&amp;lt;/code&amp;gt; &amp;amp; &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt; (The &amp;quot;Rrrola trick&amp;quot;)&lt;br /&gt;
Instead of constructing X and Y from the screen pointer &amp;lt;code&amp;gt;DI&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;DIV&amp;lt;/code&amp;gt; you can get a decent estimate by multiplying the screen pointer with &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt; and reading the row from the 8-bit register &amp;lt;code&amp;gt;DH&amp;lt;/code&amp;gt; (+DL as 16-bit value) and the column from &amp;lt;code&amp;gt;DL&amp;lt;/code&amp;gt; (+AH as 16-bit value). The idea is to interpret &amp;lt;code&amp;gt;DI&amp;lt;/code&amp;gt; as a kind of 16-bit fixed-point fraction in the range &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt;, from start to end of the segment. Multiplying this number with 65536 / 320 = 204.8 gives the row in front of the decimal point and, again as a kind of fraction, the column behind it. The constant &amp;lt;code&amp;gt;0xCCCD&amp;lt;/code&amp;gt; is the nearest rounding of 204.8 * 256 ( = 52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.&lt;br /&gt;
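The trick is easy to verify exhaustively. The following Python sketch (the function name is made up) models the multiplication and checks that &amp;lt;code&amp;gt;DH&amp;lt;/code&amp;gt; equals the exact row for every possible screen pointer, while &amp;lt;code&amp;gt;DL&amp;lt;/code&amp;gt; holds the column scaled by 256 / 320.&lt;br /&gt;
&lt;br /&gt;
```python
def rrrola(di):
    """Return (DH, DL) after 'mov ax,0xCCCD' / 'mul di'."""
    product = di * 0xCCCD          # 32-bit result in DX:AX
    dx = product // 0x10000        # high word
    return dx // 0x100, dx % 0x100


# DH reproduces the row for every 16-bit screen pointer
assert all(rrrola(di)[0] == di // 320 for di in range(0x10000))

row, column = rrrola(100 * 320 + 160)   # screen pointer of pixel (160, 100)
print(row, column)                      # prints "100 128", and 128 = 160 * 256 / 320
```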
* &amp;lt;code&amp;gt;sub dh,[si]&amp;lt;/code&amp;gt;&lt;br /&gt;
The instruction at &amp;lt;code&amp;gt;[SI]&amp;lt;/code&amp;gt; is &amp;lt;code&amp;gt;push &amp;lt;word&amp;gt;&amp;lt;/code&amp;gt; and has the opcode &amp;lt;code&amp;gt;0x68&amp;lt;/code&amp;gt; which is 104 in decimal. Combined with the fine tuned vertical alignment above ( ~4 lines) this results in (virtually) subtracting 100 for perfect vertical alignment. This is one byte shorter than &amp;lt;code&amp;gt;sub dh,100&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;pusha / popa&amp;lt;/code&amp;gt;&lt;br /&gt;
Instead of going the ''classical'' way of communicating with the FPU, we push all the registers, read/write values with memory addressing to/from the FPU, then pop all registers again. This works when &amp;lt;code&amp;gt;DS&amp;lt;/code&amp;gt; = &amp;lt;code&amp;gt;SS&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SP&amp;lt;/code&amp;gt; is &amp;quot;close enough&amp;quot; to &amp;lt;code&amp;gt;BX&amp;lt;/code&amp;gt; (initially zero and kept that way) to allow &amp;lt;code&amp;gt;[BX+&amp;lt;signed byte&amp;gt;]&amp;lt;/code&amp;gt; addressing. It comes with the special benefit of implicit 8bit shifts. One serious drawback is loss of precision, since the registers &amp;lt;code&amp;gt;DL&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;AH&amp;lt;/code&amp;gt; &amp;quot;lose connection&amp;quot; when using &amp;lt;code&amp;gt;PUSHA&amp;lt;/code&amp;gt; (see the order of registers: [http://x86.renejeschke.de/html/file_module_x86_id_270.html PUSHA/PUSHAD documentation]).&lt;br /&gt;
* &amp;lt;code&amp;gt;fild word [bx+&amp;lt;signed byte&amp;gt;]&amp;lt;/code&amp;gt; &amp;amp; &amp;lt;code&amp;gt;fistp dword [bx+&amp;lt;signed byte&amp;gt;]&amp;lt;/code&amp;gt;&lt;br /&gt;
This is the so called &amp;quot;stack addressing&amp;quot;. We assume that &amp;lt;code&amp;gt;BX=0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SP=0xFFFE&amp;lt;/code&amp;gt; at start, so we know where the registers are in memory after &amp;lt;code&amp;gt;pusha&amp;lt;/code&amp;gt; (AX at [BX-4], CX at [BX-6] etc.). It's important to realize that we work with signed 16 bit values now, in the full range of [-32768,32767]. That is also why we need &amp;lt;code&amp;gt;DWORD&amp;lt;/code&amp;gt; when storing the result : &amp;lt;code&amp;gt;sqrt(x²+y²)&amp;lt;/code&amp;gt; exceeds the signed 16bit range for quite some value pairs. Note that there are already implicit 8bit shifts (bx-9,bx-5)&lt;br /&gt;
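The memory layout after &amp;lt;code&amp;gt;pusha&amp;lt;/code&amp;gt; can be modelled in a few lines of Python. This is only a sketch under the assumptions stated above (&amp;lt;code&amp;gt;BX=0&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SP=0xFFFE&amp;lt;/code&amp;gt;); the register values and function names are made up for illustration, with &amp;lt;code&amp;gt;DX=0x6480&amp;lt;/code&amp;gt; being row 100 and scaled column 128.&lt;br /&gt;
&lt;br /&gt;
```python
PUSHA_ORDER = ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di")


def pusha(regs, sp=0xFFFE):
    """Model PUSHA: return {address: byte} for the 16 bytes it writes."""
    values = dict(regs, sp=sp)                    # PUSHA stores the SP from before the push
    memory = {}
    for name in PUSHA_ORDER:
        sp -= 2
        memory[sp] = values[name] % 0x100         # low byte
        memory[sp + 1] = values[name] // 0x100    # high byte
    return memory


def word_at(memory, offset):
    """Read the little-endian word at [bx+offset], assuming bx = 0."""
    address = offset % 0x10000
    return memory[address] + 0x100 * memory[(address + 1) % 0x10000]


regs = dict(ax=0x1111, cx=0x2222, dx=0x6480, bx=0, bp=0x0900, si=0x0100, di=0x7DA0)
frame = pusha(regs)
print(hex(word_at(frame, -4)))   # AX
print(hex(word_at(frame, -8)))   # DX = DH:DL
print(hex(word_at(frame, -9)))   # DL in the high byte, BH (zero) in the low byte
```
&lt;br /&gt;
The last read shows the implicit 8-bit shift: &amp;lt;code&amp;gt;[bx-9]&amp;lt;/code&amp;gt; yields DL * 256, which &amp;lt;code&amp;gt;fild&amp;lt;/code&amp;gt; then interprets as a signed word.&lt;br /&gt;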
*&amp;lt;code&amp;gt;fmul dword [bx+si]&amp;lt;/code&amp;gt;&lt;br /&gt;
With the &amp;quot;Rrrola&amp;quot; trick above, the row number is 204 at maximum, and the column always stays below 256. This results in a wrong aspect ratio, but it can be fixed almost completely with this two-byte instruction (+ one byte for the &amp;lt;code&amp;gt;AAS&amp;lt;/code&amp;gt; instruction): 256 * 1.24 = 317.44, which is quite close to 320. If the aspect ratio does not matter for the desired effect, these three bytes can be shaved off. &lt;br /&gt;
&lt;br /&gt;
=== The &amp;quot;Tunnel&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
Now let's add some features:&lt;br /&gt;
* extract angle as opposed to the distance and combine both&lt;br /&gt;
* reverse divide the distance to create the &amp;quot;tunnel&amp;quot; effect&lt;br /&gt;
* animate with smooth steps along the distance&lt;br /&gt;
* improve on the colors with subselecting from the standard palette&lt;br /&gt;
* quit the program on ESC&lt;br /&gt;
&lt;br /&gt;
This results in the following program with a size of 63 bytes :&lt;br /&gt;
&lt;br /&gt;
[[File:Tunnel effect neontube.png|thumb|tunnel effect neontube]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;push 	0xa000 - 10 - 3 * 20	; video base - 3.5 lines&lt;br /&gt;
or 		al, 0x13				; mode 13h = 320 x 200 in 256 colors&lt;br /&gt;
pop 	es						; get aligned video memory base&lt;br /&gt;
int 	0x10					; switch videomode&lt;br /&gt;
X: &lt;br /&gt;
sub		dh, [si]				; vertical alignment&lt;br /&gt;
pusha							; push all registers on stack&lt;br /&gt;
fild 	word	[bx-9]			; fpustack :  x&lt;br /&gt;
fild 	word	[bx-8]			; fpustack :  y  x&lt;br /&gt;
fpatan							; fpustack :  arc&lt;br /&gt;
fst 	st1						; fpustack :  arc  arc&lt;br /&gt;
fcos							; fpustack :  cos(arc)  arc&lt;br /&gt;
fimul	dword	[si]			; fpustack :  l*cos(arc)  arc&lt;br /&gt;
fidiv	word	[bx-8]			; fpustack :  l*cos(arc)/x  arc&lt;br /&gt;
fiadd	word	[bp+si]			; fpustack :  l*cos(arc)/x+offset  arc&lt;br /&gt;
fistp	dword	[bx-7]			; fpustack :  arc&lt;br /&gt;
fimul	word	[byte si+val]	; fpustack :  scaled_arc&lt;br /&gt;
fistp	word	[bx-5]			; fpustack :  -&lt;br /&gt;
popa							; pop all registers from stack&lt;br /&gt;
xor 	al, cl					; XOR scaled_arc with distance&lt;br /&gt;
and 	al, 16 + 8 + 2			; sub selecting palette part&lt;br /&gt;
stosb							; writing to screen&lt;br /&gt;
mov 	ax, 0xCCCD				; Performing the famous&lt;br /&gt;
mul 	di						; Rrrola trick&lt;br /&gt;
jo 		X						; next frame check&lt;br /&gt;
add 	word [bp+si], byte 23	; change offset smoothly&lt;br /&gt;
in 		al, 0x60				; check for ...&lt;br /&gt;
dec 	ax						; ...ESC key&lt;br /&gt;
jnz 	X						; otherwise continue&lt;br /&gt;
ret								; quit program&lt;br /&gt;
val:	dw 6519 				; n = 160 * 256 / pi / 2 ; 0x1977&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Many other tiny tunnel effects have been coded, so it is highly recommended to check out the documented source code of [http://web.archive.org/web/20050216133244/http://www.farb-rausch.de/ryg/tunnel.asm &amp;quot;Constant Evolution&amp;quot; by ryg/Farbrausch] and the [http://www.pouet.net/prod.php?which=29412 &amp;quot;Heart shaped tunnel&amp;quot; from Lord Kelvin], both with a size of 64 bytes. While &amp;quot;Constant Evolution&amp;quot; takes a slightly different route than the example here (''classic'' FPU communication, ''classic'' X Y construction, sqrt(x²+y²) instead of using &amp;lt;code&amp;gt;fcos&amp;lt;/code&amp;gt;+&amp;lt;code&amp;gt;fimul&amp;lt;/code&amp;gt;), &amp;quot;Heart shaped tunnel&amp;quot; uses no FPU at all. &lt;br /&gt;
&lt;br /&gt;
The takeaways from this example are:&lt;br /&gt;
&lt;br /&gt;
* Loading a constant from the code with some degrees of freedom&lt;br /&gt;
&amp;lt;code&amp;gt;fimul dword [si]&amp;lt;/code&amp;gt; multiplies by a 32-bit integer that serves as the dividend for the tunnel effect. The highest byte of this constant lies in our code: it is the opcode of &amp;lt;code&amp;gt;or al, 0x13&amp;lt;/code&amp;gt;. This instruction puts &amp;lt;code&amp;gt;0x13&amp;lt;/code&amp;gt; into &amp;lt;code&amp;gt;al&amp;lt;/code&amp;gt;, and since there are a lot of ways to achieve this, the appearance of the tunnel can be changed directly by swapping this instruction for one of the following: &amp;lt;code&amp;gt;mov al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;add al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;xor al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;adc al,0x13&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sbb al,0xED&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sub al,0xED&amp;lt;/code&amp;gt;. In this special case, the instruction can also be swapped with &amp;lt;code&amp;gt;pop es&amp;lt;/code&amp;gt; to gain a further degree of freedom.&lt;br /&gt;
&lt;br /&gt;
* normal loading of a constant which can't be reused as opcode&lt;br /&gt;
Although it's the ultimate goal to not even use a single extra byte for constants, sometimes the required sequence simply does not appear in the code. In this case, a constant is needed to convert the angle from the range [-pi,pi] to the color space in a way that no gaps appear while stepping from 359° to 0°. In the last line at &amp;lt;code&amp;gt;val:&amp;lt;/code&amp;gt; the value &amp;lt;code&amp;gt;160&amp;lt;/code&amp;gt; in the comment is 32 * 5 where 5 is number of &amp;quot;spiral arms&amp;quot; the tunnel effect has. The 8bit shift (*256) is to increase precision. It turns out that 16bit precision is enough to get a decent &amp;quot;gap closer&amp;quot; for values obtained by &amp;lt;code&amp;gt;fpatan&amp;lt;/code&amp;gt;.&lt;br /&gt;
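&lt;br /&gt;
The value at &amp;lt;code&amp;gt;val:&amp;lt;/code&amp;gt; can be reproduced with the Python sketch below. The function and parameter names are made up; reading the factor 32 as &amp;quot;color steps per arm&amp;quot; is an assumption based on the comment in the source.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def arc_scale(arms, steps_per_arm=32, fraction_bits=8):
    """Integer factor mapping an angle in radians to arms * steps_per_arm steps per full turn."""
    return round(arms * steps_per_arm * 2**fraction_bits / (2 * math.pi))


val = arc_scale(5)
print(val, hex(val))                     # 6519 0x1977

# A full turn of 2*pi, scaled and shifted back by 8 bits, lands very close to 160
full_turn = 2 * math.pi * val / 2**8
print(full_turn)
```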
&lt;br /&gt;
* operating directly on an indirect memory location without offset&lt;br /&gt;
The location &amp;lt;code&amp;gt;[bp+si]&amp;lt;/code&amp;gt; is used as animation variable, while both registers involved are kept fixed (the resulting address is &amp;lt;code&amp;gt;0x0A??&amp;lt;/code&amp;gt; and therefore way above our code). Since we work with 16-bit values and the top 8 bits are the measurement in pixels, the instruction &amp;lt;code&amp;gt;add word [bp+si], byte 23&amp;lt;/code&amp;gt; allows for sub pixel precision in animation, while occupying 3 bytes of space. Depending on the target hardware, this value 23 can be increased/decreased to achieve faster/smoother animation.&lt;br /&gt;
&lt;br /&gt;
* Optimizing the check for the next frame&lt;br /&gt;
Normally, a check like &amp;lt;code&amp;gt;test di,di&amp;lt;/code&amp;gt; directly followed by a branch is necessary. The approach used here allows branching directly after &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;jo&amp;lt;/code&amp;gt;, since the overflow flag is set for all but two pixels of a frame. This saves two bytes, but also requires adjusting the animation constant, because it is now added twice per frame. A further benefit is that in one of these two cases &amp;lt;code&amp;gt;AX&amp;lt;/code&amp;gt; is zero, which saves a further byte on the following ESC check (&amp;lt;code&amp;gt;dec ax&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;dec al&amp;lt;/code&amp;gt;).&lt;br /&gt;
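&lt;br /&gt;
Which screen pointers leave the overflow flag clear can be listed with a short Python sketch. It relies on the fact that &amp;lt;code&amp;gt;mul&amp;lt;/code&amp;gt; sets the overflow and carry flags exactly when the high word of the product in &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; is non-zero.&lt;br /&gt;
&lt;br /&gt;
```python
# DI values for which the high word of DI * 0xCCCD is zero, so "jo X" falls through
clear = [di for di in range(0x10000) if di * 0xCCCD // 0x10000 == 0]
print(clear)                      # [0, 1]

# AX (low word of the product) when DI = 0
print(0 * 0xCCCD % 0x10000)       # 0
```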
&lt;br /&gt;
=== Size optimizing the &amp;quot;Tunnel&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
Now if we abandon all the comfort, alignment, smoothness and convenience, and optimize this straight for size, we end up with a 52 byte version. This does not include the possible exclusion of color tuning (2 bytes), after all the effect is supposed to look at least somewhat appealing ;)&lt;br /&gt;
&lt;br /&gt;
[[File:Neontube 52b optimized.png|thumb|Neontube 52b optimized]]&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;mov		al,0x13				; mode 13h = 320 x 200 in 256 colors&lt;br /&gt;
int		0x10				; switch videomode&lt;br /&gt;
X: &lt;br /&gt;
or		al, [bp+si]			; *illusion* - executed ONCE&lt;br /&gt;
xor		al, 0x68			; *illusion* - executed ONCE&lt;br /&gt;
mov		dx, 0x79F			; *illusion* - executed ONCE&lt;br /&gt;
pusha						; push all registers on stack&lt;br /&gt;
fild 	word	[bx-9]		; x&lt;br /&gt;
fild 	word	[bx-8]		; y x&lt;br /&gt;
fpatan						; arc&lt;br /&gt;
fst 	st1					; arc arc&lt;br /&gt;
fcos						; cos(arc) arc&lt;br /&gt;
fimul	dword	[si]		; l*cos(arc) arc&lt;br /&gt;
fidiv	word	[bx-8]		; l*cos(arc)/x arc&lt;br /&gt;
fistp	dword	[bx-4]		; arc&lt;br /&gt;
fimul	word	[bx]		; scaled_arc&lt;br /&gt;
fistp	word	[bx-5]		; -&lt;br /&gt;
popa						; pop all registers from stack&lt;br /&gt;
sub		ah, [bp+si]			; animation along distance&lt;br /&gt;
xor		al, ah				; XOR scaled_arc with distance&lt;br /&gt;
and		al, 16 + 8 + 4		; sub palette selection&lt;br /&gt;
stosb						; write to screen, advance DI&lt;br /&gt;
mov		ax, 0xCCCD			; the famous&lt;br /&gt;
mul		di					; Rrrola trick&lt;br /&gt;
jmp 	short X-1			; *ODD* jump into &amp;quot;int 0x10&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since this code contains an &amp;quot;odd jump&amp;quot; into the middle of the instruction &amp;lt;code&amp;gt;int 0x10&amp;lt;/code&amp;gt; it helps to disassemble the code from address &amp;lt;code&amp;gt;0x103&amp;lt;/code&amp;gt; on, until both code paths realign at &amp;lt;code&amp;gt;pusha&amp;lt;/code&amp;gt; :&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;adc 	[bp+si],cl		; decrement framecounter&lt;br /&gt;
add 	dh,[si]			; vertical alignment&lt;br /&gt;
push 	word 0x9FBA		; video base - 3.5 lines&lt;br /&gt;
pop 	es				; get aligned video memory&lt;br /&gt;
						; continues with &amp;quot;pusha&amp;quot;&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As before, the &amp;lt;code&amp;gt;mul di&amp;lt;/code&amp;gt; instruction sets the overflow flag - and the carry flag - for all but two pixels of a frame. &amp;lt;code&amp;gt;CL&amp;lt;/code&amp;gt; remains &amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; unchanged from start, so &amp;lt;code&amp;gt;adc [bp+si],cl&amp;lt;/code&amp;gt; adds 0xFF + 1 (which leaves the byte unchanged) whenever the carry is set, and effectively decrements the framecounter in the two cases per frame where it is clear. &lt;br /&gt;
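&lt;br /&gt;
The effect of the byte-sized &amp;lt;code&amp;gt;adc&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;CL = 0xFF&amp;lt;/code&amp;gt; can be checked with a tiny Python model (the function name and the counter value are made up for illustration).&lt;br /&gt;
&lt;br /&gt;
```python
def adc_byte(value, addend, carry):
    """8-bit ADC: value + addend + carry, wrapped to one byte."""
    return (value + addend + carry) % 0x100


counter = 0x40
print(adc_byte(counter, 0xFF, 1))   # carry set:   64, counter unchanged
print(adc_byte(counter, 0xFF, 0))   # carry clear: 63, counter decremented
```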
&lt;br /&gt;
The factor for scaling the arc is now taken from the location &amp;lt;code&amp;gt;[bx]&amp;lt;/code&amp;gt;, which holds &amp;lt;code&amp;gt;0x20CD&amp;lt;/code&amp;gt; (the &amp;lt;code&amp;gt;int 0x20&amp;lt;/code&amp;gt; at the start of the PSP). Calculating the number of &amp;quot;spirals&amp;quot; backwards from this number (*pi*2/256/32) gives about 6.44 which is close enough to 6.5, so that the visual gap at the top is barely noticeable - apart from the miscoloring, which can be fixed by removing the 16 from &amp;lt;code&amp;gt;and al, 16 + 8 + 4&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
It's noteworthy that the location &amp;lt;code&amp;gt;[si]&amp;lt;/code&amp;gt; does not contain a good offset anymore, since &amp;lt;code&amp;gt;push &amp;lt;word&amp;gt;&amp;lt;/code&amp;gt; moved away from the top, to help create *illusion* code that not only realigns soon in the second code path but also does not modify registers and memory environment in a hindering way (only &amp;lt;code&amp;gt;AL&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DX&amp;lt;/code&amp;gt; are modified)&lt;br /&gt;
&lt;br /&gt;
The takeaway here is to study the modbytes of instructions that are very likely to appear in tiny intros, as well as using their constants as code. &amp;lt;code&amp;gt;int 0x10&amp;lt;/code&amp;gt; will almost always be in your code, so there is always &amp;lt;code&amp;gt;adc byte[??], 8bit_reg&amp;lt;/code&amp;gt; too in between.&lt;br /&gt;
&lt;br /&gt;
Finally, you might have noticed that these tunnels don't &amp;quot;spin&amp;quot;. For the cost of two bytes this can be easily added. &amp;lt;code&amp;gt;add		al, [bp+si]&amp;lt;/code&amp;gt; before applying XOR in the last example would offset the &amp;lt;code&amp;gt;arc&amp;lt;/code&amp;gt; with the framecounter and produce a spinning effect. For a coherent visual experience the sub palette selection must be changed to &amp;lt;code&amp;gt;and al, 8 + 4&amp;lt;/code&amp;gt;, too.&lt;br /&gt;
&lt;br /&gt;
== FPU Tricks ==&lt;br /&gt;
&lt;br /&gt;
=== Comparing 2 float numbers on the FPU ===&lt;br /&gt;
&lt;br /&gt;
Sometimes when doing fpu algebra, you need to compare 2 float numbers using the FPU (for example when calculating an intersection between a point and an object). &lt;br /&gt;
&lt;br /&gt;
To do so will take a few steps:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fcomp      ; compare ST0 with ST1, then pop ST0&lt;br /&gt;
fstsw ax   ; copy the fpu flags to ax (ah)&lt;br /&gt;
sahf       ; copy the contents of the AH register to the CPU flags&lt;br /&gt;
jb inside  ; you can now check the cpu flags and jump accordingly&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
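&lt;br /&gt;
Why &amp;lt;code&amp;gt;jb&amp;lt;/code&amp;gt; works after this sequence can be sketched in Python. The model below covers ordinary (non-NaN) operands only and uses made-up function names; it follows the documented mapping of the FPU condition codes C0, C2 and C3 to the carry, parity and zero flags.&lt;br /&gt;
&lt;br /&gt;
```python
def fcom_condition_codes(st0, other):
    """Return (C3, C2, C0) as set by FCOM/FCOMP for ordinary operands."""
    if st0 == other:
        return 1, 0, 0
    if st0 == max(st0, other):      # ST0 is the greater value
        return 0, 0, 0
    return 0, 0, 1                  # ST0 is the smaller value


def flags_after_sahf(c3, c2, c0):
    """FSTSW AX puts C0/C2/C3 into AH bits 0/2/6, which SAHF copies to CF/PF/ZF."""
    return {"cf": c0, "pf": c2, "zf": c3}


flags = flags_after_sahf(*fcom_condition_codes(1.5, 2.0))
print(flags["cf"] == 1)   # True: jb is taken because ST0 is below the other operand
```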
&lt;br /&gt;
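If the Pentium Pro instruction set is available or emulated (real DOS setups or DOSBox-X), you can use the &amp;lt;code&amp;gt;FCOMI&amp;lt;/code&amp;gt; family instead (here &amp;lt;code&amp;gt;FCOMIP&amp;lt;/code&amp;gt;, the popping variant), which sets the CPU flags directly and saves a few bytes:&lt;br /&gt;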
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fcomip&lt;br /&gt;
jb inside &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Truncate float numbers / get fractional part ===&lt;br /&gt;
Maybe you want the truncated value of a floating point number for some purpose, or its fractional part. In that case an instruction comes in handy that was added quite late to the FPU. It came with the SSE3 instruction set and is called &amp;lt;code&amp;gt;fisttp&amp;lt;/code&amp;gt; (Store Integer with Truncation). &lt;br /&gt;
&lt;br /&gt;
To get the fractional part you can do it like this:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fld st0               ;duplicate your number =&amp;gt; st0=f, st1=f &lt;br /&gt;
fisttp dword[si]      ;INT(f) - store truncated value somewhere in memory&lt;br /&gt;
fild dword[si]        ;load truncated value&lt;br /&gt;
fsubp st1,st0         ;calculate f-INT(f) &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Generally it's shorter and faster than fiddling around with the rounding mode and &amp;lt;code&amp;gt;frndint&amp;lt;/code&amp;gt;.&lt;br /&gt;
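&lt;br /&gt;
The difference between a normal integer store and the truncating one can be modelled in Python. This is a sketch with made-up function names that assumes the default x87 rounding mode (round to nearest, ties to even), which happens to match Python's &amp;lt;code&amp;gt;round()&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def fistp(value):
    """Integer store in the default x87 rounding mode: nearest, ties to even."""
    return round(value)


def fisttp(value):
    """Integer store with truncation toward zero, whatever the rounding mode is."""
    return math.trunc(value)


for f in (2.7, 2.5, -2.7):
    # columns: value, rounded store, truncated store, fractional part f - INT(f)
    print(f, fistp(f), fisttp(f), f - fisttp(f))
```
&lt;br /&gt;
Note that for negative numbers the fractional part obtained this way is negative as well.&lt;br /&gt;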
&lt;br /&gt;
=== Simple Floor Casting ===&lt;br /&gt;
Another classic effect that works well with the FPU is a simple Perspective Floor Caster. &lt;br /&gt;
While this effect can also be achieved with just CPU code, using the FPU gives a bit finer control over things like camera height and other things.&lt;br /&gt;
&lt;br /&gt;
Here is the basic code for the effect as used in e.g. [https://www.pouet.net/prod.php?which=85681 Rush, by: Marquee Design]. &lt;br /&gt;
&lt;br /&gt;
[[File:Fpu_floor.png|thumb]]&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
fninit&lt;br /&gt;
fild word [bx-8]	; load y-value &lt;br /&gt;
fidiv word [viewheight] ; scalingvalue/height to plane&lt;br /&gt;
fabs 			; abs(y)&lt;br /&gt;
fild word [bx-9]        ; x abs(y)&lt;br /&gt;
fdiv st1                ; x/abs(y) abs(y)&lt;br /&gt;
fistp word [bx-4]       ; store U (plane x) to ax&lt;br /&gt;
fidivr word [floorval]  ; 32767/abs(y) &lt;br /&gt;
fist word [bx-6]        ; store V (plane y) to cx		&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following constants are used in the above calculation:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;nasm&amp;quot;&amp;gt;&lt;br /&gt;
floorval dw 32767&lt;br /&gt;
viewheight dw 10 &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Needless to say, the code above is simplified and unoptimised for educational purposes. &lt;br /&gt;
It can be optimised further using various tricks mentioned above and integrated easily into already existing FPU calculations for example to save space.&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=672</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=672"/>
				<updated>2020-07-07T08:59:10Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: Section on compression added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for all kinds of mobile devices. Back in the 80's ARM was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The cheap Raspberry Pi range is especially recommended. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had to carry any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS that takes care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. Still, it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
The size of the instructions is always 4 Bytes, only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, due to the things a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue endlessly with conditional instructions until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center explains this using the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
.set VFPSupport_CreateContext, 0x58ec1 //only needed for the VFP/NEON section&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette), post-increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to kind of upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //timer byte to bits 8-15 (green in the default BGR pixel order)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //y to bits 16-23 (blue)&lt;br /&gt;
   orrs r4,r4,r0         //combine both components&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 to bits 0-7 (red), halved as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize pixel value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is to skip setting the screen mode altogether and just read the parameters of the current one, as most users run in 1920x1080 at 32 Bit anyway. That also makes the intro independent of the resolution.&lt;br /&gt;
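&lt;br /&gt;
A minimal sketch of how that could look, reusing the request block technique from the examples above. It assumes the VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit), and it assumes that input and output block may overlap for several variables the same way they do for a single one; the label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; is a made-up name. Rows may also be padded in some modes, so check the OS_ReadVduVariables documentation (e.g. for the row length variable) before relying on this.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //address of input block (list of variable numbers)&lt;br /&gt;
movs  r1,r0              //output block overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r0!,{r5,r6,r7}     //r5 = width-1, r6 = height-1, r7 = screen address&lt;br /&gt;
//...the main loop using r5-r7 instead of a fixed 320x256 would go here&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit = screen width in pixels - 1&lt;br /&gt;
.word 12                 //YWindLimit = screen height in pixels - 1&lt;br /&gt;
.word 148                //screen address, as in the examples above&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;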
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade of that intro you can also see the combined use of THUMB-2 and NEON, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To make the resulting executable start in THUMB mode you can conveniently set the lowest Bit (bit 0) of its start address. Executables in RISC OS have a load address and a start address stored in the filesystem as attributes, and &amp;amp;8000 is the general start address for executables in RISC OS. The following command sets them accordingly. The best way to run it is from a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or from a batch file as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow it...&lt;br /&gt;
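&lt;br /&gt;
A minimal sketch of such an exit sequence in THUMB, assuming the SWI number 0x03 for OS_NewLine (it is not part of the list of definitions further up). It costs 2 Bytes more than the plain exit:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03&lt;br /&gt;
&lt;br /&gt;
swi OS_NewLine           //re-trigger the desktop redraw (only needed if the mode wasn't changed)&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;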
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
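&lt;br /&gt;
A small sketch of that scalar multiply in context. It assumes that a VFP context has already been created as described further down, and that r0 points to 4 single floats in memory, which is an arbitrary choice for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.f32 s8,#0.5         //s8 is mapped on d4[0], so this sets the scalar to 0.5&lt;br /&gt;
vld1.32  {d2,d3},[r0]    //load 4 single floats into q1 (= d2 and d3)&lt;br /&gt;
vmul.f32 q0,q1,d4[0]     //q0 = each of the 4 floats in q1 multiplied by 0.5&lt;br /&gt;
vst1.32  {d0,d1},[r0]    //store the 4 results from q0 (= d0 and d1) back to memory&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;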
&lt;br /&gt;
Another feature is that there are instructions that saturate their results, which is quite useful when working with colours. E.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; adds the 4 true colour RGB pixels in q1 (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) to the ones in q2 and places the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
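&lt;br /&gt;
As an illustrative sketch (again assuming an existing VFP context, with r0 pointing to 4 true colour pixels, an arbitrary register choice), this brightens 4 pixels at once without any colour component wrapping around:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.i8  q2,#16          //put 16 into each of the 16 bytes of q2&lt;br /&gt;
vld1.8   {d2,d3},[r0]    //load 4 pixels (16 bytes) into q1&lt;br /&gt;
vqadd.u8 q0,q1,q2        //add 16 to every byte, saturating at 255&lt;br /&gt;
vst1.8   {d0,d1},[r0]    //write the 4 brightened pixels back&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;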
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; also modifies the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
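&lt;br /&gt;
A short sketch of that overlap (assuming an existing VFP context; the values are arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.f32 s0,#1.0         //VFP write to s0 = low 32 Bits of d0 (and of q0)&lt;br /&gt;
vmov.f32 s1,#2.0         //VFP write to s1 = high 32 Bits of d0&lt;br /&gt;
vadd.f32 d1,d0,d0        //NEON sees both floats in d0: d1 = {2.0, 4.0}, which is also s2 and s3&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;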
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so-called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
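&lt;br /&gt;
Loading r10 and calling OS_CallASWI is needed for every SWI with a number above 0xff. As a sketch, the pattern could be wrapped in an assembler macro. The macro name &amp;lt;code&amp;gt;callswi&amp;lt;/code&amp;gt; is made up for this illustration, and it only saves typing: each use still assembles to the same 10 Bytes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;//hypothetical helper macro, not part of any toolchain&lt;br /&gt;
.macro callswi number&lt;br /&gt;
   movw r10,#((\number) &amp;amp; 0xffff)   //low 16 Bits of the SWI number&lt;br /&gt;
   movt r10,#((\number) &amp;gt;&amp;gt; 16)      //high 16 Bits of the SWI number&lt;br /&gt;
   swi  OS_CallASWI                 //call the SWI whose number is in r10&lt;br /&gt;
.endm&lt;br /&gt;
&lt;br /&gt;
movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
callswi 0x58ec1     //VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;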
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball] probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
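//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
//no storage is emitted here: title and timer share the word right behind the code, zeroed by the str at startup&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;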
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Compression ==&lt;br /&gt;
&lt;br /&gt;
Due to the overhead of a decompression routine, compressing your intro starts to make sense at a size of maybe 512 Bytes, and for sure when you aim at coding an intro of &amp;gt;=1 KByte.&lt;br /&gt;
&lt;br /&gt;
Luckily we already have a tool for that: the absolute and untyped file compressor called &amp;lt;b&amp;gt;Codepressor&amp;lt;/b&amp;gt;, originally developed by Pervect/Topix, now updated and maintained by Phlamethrower. Check out this [http://www.phlamethrower.co.uk/riscos/cpress.php Link] to get the latest version. It contains different compression algorithms, tries all of them and finally chooses the best one.&lt;br /&gt;
&lt;br /&gt;
Usage is quite straightforward. Just let the filer see the application and type on the command line&lt;br /&gt;
&amp;lt;code&amp;gt;codepressr &amp;lt;filename_in&amp;gt; &amp;lt;filename_out&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
to compress your intro. As an example the intro [https://www.pouet.net/prod.php?which=85044 'blury'] was compressed from 966 to 832 Bytes.&lt;br /&gt;
&lt;br /&gt;
At the time of writing it doesn't work with Thumb code when the standard trick mentioned above is used to specify the execution address of the intro in the filing system. So for now you have to skip that trick and invest 4 bytes at the start of your intro to make it compressible with Codepressor:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;adr pc,thumb+1&lt;br /&gt;
.thumb&lt;br /&gt;
thumb:&amp;lt;/syntaxhighlight&amp;gt;     &lt;br /&gt;
This triggers Thumb mode manually within your code. &amp;lt;code&amp;gt;adr pc,thumb+1&amp;lt;/code&amp;gt; is a pseudo-instruction that tries to produce a single add or sub instruction to load the address. In ARM mode pc reads as the address of the current instruction + 8, while the label sits at + 4 and bit 0 has to be set for Thumb, so in this case it is assembled to &amp;lt;code&amp;gt;sub pc,pc,#3&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=662</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=662"/>
				<updated>2020-07-06T12:52:45Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
The individual instructions are of course very different from x86: some will look familiar, others not at all. Over the years the ARM instruction set has grown considerably. Nowadays there is a hardware integer divide and there are various SIMD approaches, either within the ARM instruction set or via NEON. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. RISC OS provides the so-called FPEmulator to fill that gap, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalculation when you use the BASIC Assembler of RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long. THUMB offers a limited instruction set with 2 Byte instructions, and THUMB-2 mixes 2 and 4 Byte encodings.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks it definitely is. For others it is not, because a single instruction can do a lot (e.g. conditional execution and shifts for free). The following example shows this:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. THUMB-2, however, introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette), post-increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes (up to e.g. 800x600 in 256 colours) by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that address is not fixed. On one specific device you could read that address once and then hardcode it into your code, but then of course the intro would be restricted to that device (e.g. an RPi4 returns a different address than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
[[File:edgedancer.png]]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
[[File:elsecaller.png]]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and instead read the parameters of the current one, as most users run their desktop in 1920x1080 at 32 bit per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
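&lt;br /&gt;
A minimal sketch of how reading the current mode could look, reusing the OS_ReadVduVariables pattern from above. The label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; is made up for this illustration, and the VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit) are assumptions that should be checked against the OS documentation. It also assumes that the SWI preserves r0, that the current mode uses 32 bit per pixel and that the screen rows are stored without padding:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers&lt;br /&gt;
movs  r1,r0              //output block overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r0!,{r4,r5,r7}     //r4 = width-1, r5 = height-1, r7 = screen address&lt;br /&gt;
adds  r4,r4,#1           //width in pixels&lt;br /&gt;
adds  r5,r5,#1           //height in pixels&lt;br /&gt;
muls  r4,r5,r4           //r4 = total number of pixels to fill from r7 onwards&lt;br /&gt;
//...the pixel loop and exit code would follow here...&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit (assumed number) = width-1&lt;br /&gt;
.word 12                 //YWindLimit (assumed number) = height-1&lt;br /&gt;
.word 148                //screen address, as in the examples above&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Compared to setting a mode this trades the mode setup for two extra words of data and the width/height arithmetic, so whether it ends up smaller depends on the rest of the intro.&lt;br /&gt;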
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit (bit 0) of the start address. Executables in RISC OS have a load address and a start address stored in the filesystem as attributes, and &amp;amp;8000 is the general start address for executables. This can be done with the following command; the best way is to put it into a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To date that bug has not been fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
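&lt;br /&gt;
As a small sketch, an exit sequence for an intro that did not change the screen mode could look like this. OS_NewLine is not part of the SWI list above, so its number (0x03) is added here as an assumption to be checked against the OS documentation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03     //assumed SWI number, not in the list above&lt;br /&gt;
&lt;br /&gt;
swi OS_NewLine           //re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Both SWI numbers are below 0x100, so in THUMB mode this costs only 4 Bytes.&lt;br /&gt;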
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 bit) and 32 double precision (64 bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. E.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; adds the 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 bit integer) in q1 to the ones in q2 and places the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
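&lt;br /&gt;
A minimal sketch of such a saturating blend could look like this. It assumes that r0 and r1 are (hypothetical) pointers to two buffers of 4 true colour pixels each, that the VFP/NEON unit has already been enabled for the program, and that the assembler accepts the &amp;lt;code&amp;gt;.fpu neon&amp;lt;/code&amp;gt; directive for your target:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.fpu neon                //allow NEON instructions&lt;br /&gt;
.thumb&lt;br /&gt;
vld1.8   {d2,d3},[r0]    //q1 = 16 bytes = 4 pixels from the first buffer&lt;br /&gt;
vld1.8   {d4,d5},[r1]    //q2 = 4 pixels from the second buffer&lt;br /&gt;
vqadd.u8 q0,q1,q2        //add byte by byte, each colour channel clamps at 255&lt;br /&gt;
vst1.8   {d0,d1},[r0]    //write the 4 resulting pixels back to the first buffer&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
With a plain &amp;lt;code&amp;gt;vadd.i8&amp;lt;/code&amp;gt; a bright channel would wrap around to a dark value instead of staying at 255.&lt;br /&gt;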
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the VFP and NEON register sets are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; also modifies the low 32 bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our RISC OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. The normal ARM integer instruction set also contains a small number of parallel arithmetic and saturating instructions. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball], that intro probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
[[File:3dball.png]]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be placed in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:3dball.png&amp;diff=661</id>
		<title>File:3dball.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:3dball.png&amp;diff=661"/>
				<updated>2020-07-06T12:51:36Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: png version&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;png version&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:3dball.gif&amp;diff=660</id>
		<title>File:3dball.gif</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:3dball.gif&amp;diff=660"/>
				<updated>2020-07-06T12:44:18Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: 3dball pic&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;3dball pic&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:Elsecaller.png&amp;diff=659</id>
		<title>File:Elsecaller.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:Elsecaller.png&amp;diff=659"/>
				<updated>2020-07-06T12:43:56Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: elsecaller pic&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;elsecaller pic&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:Edgedancer.png&amp;diff=658</id>
		<title>File:Edgedancer.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:Edgedancer.png&amp;diff=658"/>
				<updated>2020-07-06T12:40:11Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: edgedancer pic&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;edgedancer pic&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=657</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=657"/>
				<updated>2020-07-06T12:29:18Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: Link Update&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was developed to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had to inherit any obstacles from an 8 or 16 bit age. It was purely RISC and 32 bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the Basic Assembler from Risc OS.  &lt;br /&gt;
&lt;br /&gt;
The size of the instructions is always 4 Bytes, only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, like e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added like e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
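&lt;br /&gt;
A small illustrative sketch of such a chain in ARM mode (the register use is made up for this example): one &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; controls three following instructions, because none of them carries the &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; suffix.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0            //set the flags once&lt;br /&gt;
rsblt r0,r0,#0         //if r0&amp;lt;0 then r0 = -r0&lt;br /&gt;
movlt r1,#1            //still decided by the same cmp: r1 = 1 if r0 was negative&lt;br /&gt;
movge r1,#0            //opposite condition, flags are still untouched: r1 = 0 otherwise&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;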
&lt;br /&gt;
When it comes to THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center explains this using the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes (up to e.g. 800x600 in 256 colours) by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that address is not fixed. On one specific device you could read that address once and then hardcode it into your code, but then of course the intro would be restricted to that device (e.g. an RPi4 returns a different address than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and instead read the parameters of the current one, as most users run their desktop in 1920x1080 at 32 bit per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights and the requirements for using VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit (Bit 0) of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command, a BBC BASIC &amp;lt;code&amp;gt;SYS&amp;lt;/code&amp;gt; call (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file for that, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
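&lt;br /&gt;
As a small sketch, such an exit sequence could look as follows. It assumes the SWI number 0x03 for OS_NewLine, which is not part of the list of SWI's defined earlier.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03   //assumed SWI number&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
swi OS_NewLine           //re-trigger desktop redraw (only needed if the mode wasn't changed)&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;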
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extensions of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; adds 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2 and places the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
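&lt;br /&gt;
A hedged sketch of how that could be used to brighten some pixels on screen: it assumes that r7 holds the address of a true colour frame buffer and that the VFP/NEON unit has already been initialised by requesting a VFPContext. The brightness step of 16 is an arbitrary choice.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;vmov.i8  q2,#16      //put 16 into each of the 16 byte lanes of q2&lt;br /&gt;
vld1.32  {q1},[r7]   //load 4 true colour pixels (16 bytes)&lt;br /&gt;
vqadd.u8 q0,q1,q2    //add 16 to every byte, saturating at 255&lt;br /&gt;
vst1.32  {q0},[r7]!  //store the 4 pixels and advance r7 by 16 bytes&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;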
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; also modifies the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our Risc OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons for using NEON are speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]...that probably wouldn't be possible without NEON at that size/speed.&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The number of the SWI to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0006/e/ Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0001/m ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://developer.arm.com/documentation/qrc0007/e Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=656</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=656"/>
				<updated>2020-07-06T12:23:44Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's the ARM CPU was developed to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it's rather slow as it is implemented in software, and it's not available for THUMB/THUMB-2. Still, it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
The size of the instructions is always 4 Bytes, only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue endlessly with conditional instructions until the code executes an instruction that updates the flags, like e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, like e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
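&lt;br /&gt;
A short sketch of such a sequence in ARM mode (registers and values are chosen arbitrarily for illustration): a single &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; controls the three following instructions, as none of them changes the flags.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,#0        //set the flags once&lt;br /&gt;
rsblt r0,r0,#0     //if r0&amp;lt;0 then r0 = -r0&lt;br /&gt;
addlt r1,r1,#1     //if r0 was &amp;lt;0 then also count it in r1&lt;br /&gt;
movge r2,r0        //if r0 was &amp;gt;=0 then copy it to r2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;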
&lt;br /&gt;
When it comes to THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center explains this using the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
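The &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction takes up to three additional letters, &amp;lt;code&amp;gt;t&amp;lt;/code&amp;gt; (then) or &amp;lt;code&amp;gt;e&amp;lt;/code&amp;gt; (else), one for each further instruction of the block. A small sketch with three conditional instructions (registers chosen arbitrarily for illustration):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1&lt;br /&gt;
itte  hi           //then, then, else&lt;br /&gt;
movhi r2,r0        //if r0&amp;gt;r1 (unsigned) then r2 = r0&lt;br /&gt;
subhi r0,r0,r1     //if r0&amp;gt;r1 (unsigned) then r0 = r0 - r1&lt;br /&gt;
movls r2,r1        //else r2 = r1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;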
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. a RPI4 shows different results than a RPI3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to kind of upgrade those old school screen modes by a string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' for RGB from timer (pixel word is 0xBBGGRR)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and read the parameters of the current one instead, as most users run the desktop in 1920x1080 with 32 Bit colour depth anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights and the requirements for using VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit (Bit 0) of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command, a BBC BASIC &amp;lt;code&amp;gt;SYS&amp;lt;/code&amp;gt; call (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file for that, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extensions of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; adds 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) from q1 to the ones in q2 and places the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
&lt;br /&gt;
The sheer number of available NEON instructions and their variations (saturating, narrowing, widening,...) is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; also modifies the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our Risc OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons for using NEON are speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]...that probably wouldn't be possible without NEON at that size/speed.&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers above 0xff, which is fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be placed in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset to r10 for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
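&lt;br /&gt;
To make the fixed-point timer and the formula in the interrupt routine easier to follow, here is a rough Python model of one buffer fill. It is only a sketch under assumptions: the function names, the buffer length and the step value in the usage lines are made up, and the masks are written with modulo so that (t>>9) AND 14 becomes ((t>>10) mod 8)*2.&lt;br /&gt;
&lt;br /&gt;
```python
# Python model of the sound routine; names and usage values are illustrative only.


def bytebeat(t):
    """t*(0xca98>>(t>>9 AND 14) AND 15)|t>>8 for one time step t."""
    shift = ((t >> 10) % 8) * 2            # same value as (t>>9) AND 14
    multiplier = (0xca98 >> shift) % 16    # AND 15 keeps the low nibble
    return (t * multiplier) | (t >> 8)


def sample_word(t):
    """32 bit word stored to the buffer for time step t."""
    word = (bytebeat(t) * 2**8) % 2**32    # lsls r7,r7,#8
    return word | ((word * 2**16) % 2**32) # orr r7,r7,r7,lsl#16


def fill_buffer(timer, step_8_24, length):
    """Return (words, new_timer); timer is 16.16 fixed point like r0."""
    step = step_8_24 >> 8                  # lsrs r6,r6,#8: 8.24 to 16.16
    words = []
    for _ in range(length):
        t = timer >> 16                    # lsrs r5,r0,#16: integer part
        words.append(sample_word(t))
        timer = (timer + step) % 2**32     # adds r0,r0,r6
    return words, timer


words, timer = fill_buffer(0, 2**24, 4)    # a step of 1.0 in 8.24 format
print([hex(w) for w in words])             # ['0x0', '0x8000800', '0x10001000', '0x18001800']
```
&lt;br /&gt;
The returned timer would be stored back to soundtimer, so the next interrupt continues where this one stopped.&lt;br /&gt;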
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by j. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=655</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=655"/>
				<updated>2020-07-06T12:22:30Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for all kinds of mobile devices. Back in the 80s ARM was created to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm actually not aware whether Android or some kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped onto each other, i.e. they overlap&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in both ARM and NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in RISC OS that takes care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center explains this using the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
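&lt;br /&gt;
For readers who don't read ARM conditions fluently yet, the loop above can be modelled in a few lines of Python. This is only a sketch: the function name is made up, and like the assembler version it assumes both inputs are positive.&lt;br /&gt;
&lt;br /&gt;
```python
def gcd_by_subtraction(r0, r1):
    """Mirror of the cmp/subgt/suble/bne loop; both inputs must be positive."""
    while True:
        if r0 > r1:              # subgt r0,r0,r1
            r0 = r0 - r1
        else:                    # suble r1,r1,r0
            equal = r0 == r1     # flags still come from the cmp
            r1 = r1 - r0
            if equal:            # bne is not taken, the loop ends
                return r0        # result is left in r0


print(gcd_by_subtraction(48, 18))    # 6
```
&lt;br /&gt;
Note that both subtractions test the flags of the same &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt;, which is why no second compare is needed.&lt;br /&gt;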
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
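&lt;br /&gt;
The pixel values written by the two nested loops can be previewed without an ARM machine. The following Python sketch is only a model with a made-up function name; it walks x and y in the same downward order as the assembler code and keeps only the low byte, like &amp;lt;code&amp;gt;strb&amp;lt;/code&amp;gt; does.&lt;br /&gt;
&lt;br /&gt;
```python
def xor_pattern(timer, width=320, height=256):
    """Rows of palette indices in the order the intro stores them."""
    rows = []
    for y in range(height - 1, -1, -1):          # y counts 255 down to 0
        row = [((x + timer) ^ y) % 256           # strb keeps the low byte
               for x in range(width, 0, -1)]     # x counts 320 down to 1
        rows.append(row)
    return rows


frame = xor_pattern(timer=0)
print(len(frame), len(frame[0]), frame[0][0])    # 256 320 191
```
&lt;br /&gt;
A changing timer value shifts the pattern sideways, which gives the animation for free.&lt;br /&gt;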
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to kind of upgrade those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8-15 of the pixel word&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16-23 of the pixel word&lt;br /&gt;
   orrs r4,r4,r0         //combine y and timer components&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0-7 as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
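&lt;br /&gt;
How the three components end up in one 32 bit pixel word can be sketched in Python as follows. The function name is made up, shifts to the left are written as multiplications, and which colour channel each byte drives depends on the pixel layout of the selected mode, so the sketch only talks about bit positions.&lt;br /&gt;
&lt;br /&gt;
```python
def pack_pixel(x, y, timer):
    """Pixel word as built by the loop: y in bits 16-23, timer in 8-15, x>>1 in 0-7."""
    timer_byte = timer % 256                 # ands r0,r0,r2 with r2 = 255
    return (y * 2**16) | (timer_byte * 2**8) | (x >> 1)


print(hex(pack_pixel(320, 255, 0x1234)))     # 0xff34a0
```
&lt;br /&gt;
Halving x keeps its range of 1 to 320 inside a single byte, so the components never overlap.&lt;br /&gt;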
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the current one, as most users would run in 1920x1080 at 32 Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the RISC OS command line (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single precision (32 Bit) and 32 double precision (64 Bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. So e.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; would add the 4 true colour RGB pixels (datatype &amp;lt;code&amp;gt;u8&amp;lt;/code&amp;gt; = unsigned 8 Bit integer) in q1 to the ones in q2 and place the results in q0. If an overflow occurs the value is saturated to 255.&lt;br /&gt;
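&lt;br /&gt;
What saturation means per byte lane can be modelled with a short Python sketch. The function name and the pixel values are made up for illustration; each 16 byte value stands for one q register holding 4 pixels of 4 bytes each.&lt;br /&gt;
&lt;br /&gt;
```python
def vqadd_u8(a, b):
    """Lane-wise unsigned 8 bit add that clamps at 255 instead of wrapping."""
    return bytes(min(x + y, 255) for x, y in zip(a, b))


q1 = bytes([200, 100, 50, 0] * 4)      # 4 pixels, 16 byte lanes in total
q2 = bytes([100, 100, 100, 0] * 4)
q0 = vqadd_u8(q1, q2)
print(list(q0[:4]))                    # [255, 200, 150, 0]
```
&lt;br /&gt;
A plain wrapping add would turn 200+100 into 44, so a bright pixel would suddenly become dark; the saturating version simply stays at full brightness.&lt;br /&gt;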
&lt;br /&gt;
The sheer number of available NEON instructions is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped onto each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit in our RISC OS code, we need to invest some bytes in requesting a so-called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; to initialize it. In ARM code that looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reasons to use NEON are speed (for floats and integers) and floating-point support in general, not size, since the setup shown above already costs some bytes. If your code doesn't need floats and is fast enough, there may be little need for NEON. The normal ARM integer instruction set also offers a small number of parallel arithmetic and saturating instructions. But as shown in Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball], some effects would probably not be possible at that size and speed without NEON.&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on Pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers above 0xff, which is fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be placed in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset to r10 for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by j. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=654</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=654"/>
				<updated>2020-07-06T12:20:08Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: VFP/NEON section extended&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter one especially for any kind of mobile device. Back in the 80's the ARM architecture was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware if Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (&amp;lt;code&amp;gt;r0...r15&amp;lt;/code&amp;gt;, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;), 32 64-Bit multi purpose or double precision registers for SIMD (&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;), and 16 128-Bit multi purpose registers for SIMD (&amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt;). All those registers are mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. Still, it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes. THUMB offers a limited instruction set with a length of 2 Bytes, and THUMB-2 mixes 2 and 4 Byte instructions.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue endlessly with conditional instructions until the code executes an instruction that updates the flags, like e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, like e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
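&lt;br /&gt;
As a small sketch of this (the register assignment is just an assumption for illustration), a single &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; can feed several conditional instructions in a row, here to get both the maximum and the absolute difference of two unsigned values:&lt;br /&gt;
&lt;br /&gt;
ARM (20 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1      //compare r0 with r1 (the only instruction here that sets the flags)&lt;br /&gt;
movhi r2,r0      //if r0&amp;gt;r1 then r2 = r0&lt;br /&gt;
movls r2,r1      //else r2 = r1            =&amp;gt; r2 = max(r0,r1)&lt;br /&gt;
subhi r3,r0,r1   //if r0&amp;gt;r1 then r3 = r0-r1 (flags are still those from cmp)&lt;br /&gt;
rsbls r3,r0,r1   //else r3 = r1-r0         =&amp;gt; r3 = |r0-r1|&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;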
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center explains this using the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
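An &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; block can also cover the full four instructions. The following is only a sketch with an arbitrary register assignment, computing the maximum and the absolute difference of two unsigned values. As every instruction here has a 16-Bit encoding, it should assemble to 12 Bytes:&lt;br /&gt;
&lt;br /&gt;
THUMB-2 (12 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1      //compare r0 with r1&lt;br /&gt;
itete hi         //then/else/then/else pattern for the next 4 instructions&lt;br /&gt;
movhi r2,r0      //if r0&amp;gt;r1 then r2 = r0&lt;br /&gt;
movls r2,r1      //else r2 = r1            =&amp;gt; r2 = max(r0,r1)&lt;br /&gt;
subhi r3,r0,r1   //if r0&amp;gt;r1 then r3 = r0-r1&lt;br /&gt;
subls r3,r1,r0   //else r3 = r1-r0         =&amp;gt; r3 = |r0-r1|&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;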
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* broad software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
.set VFPSupport_CreateContext, 0x58ec1  //only needed for the VFP/NEON examples&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],1     //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to your device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the properties of the current one, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
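A minimal sketch of that approach could look like the following. It assumes that the VDU variables 11 (XWindLimit), 12 (YWindLimit) and 148 (ScreenStart) are read in one call and that the OS handles the request block entry by entry, so that the output may overwrite the input block like in the examples above. The label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; and the register choice are made up for illustration, and the sketch does not check whether a line in memory is really width*4 bytes long.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers&lt;br /&gt;
movs  r1,r0              //output block overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r1!,{r4,r5,r7}     //r4 = width-1, r5 = height-1, r7 = screen address&lt;br /&gt;
adds  r4,r4,#1           //r4 = width in pixels&lt;br /&gt;
adds  r5,r5,#1           //r5 = height in pixels&lt;br /&gt;
//...use r4/r5 as loop limits and r7 as pixel pointer, like in the loops above&lt;br /&gt;
swi   OS_Exit&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit = width in pixels - 1&lt;br /&gt;
.word 12                 //YWindLimit = height in pixels - 1&lt;br /&gt;
.word 148                //ScreenStart = screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;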
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address. Executables in Risc OS have a load address and a start address stored in the filesystem as attributes, and &amp;amp;8000 is the general start address for executables in Risc OS. Both can be set by the following command. The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To date that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file, as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
VFP and NEON are basically the FPU and the SIMD (single instruction, multiple data) extension of the ARM instruction set. VFP works with 32 single (32 Bit) and double precision (64 bit) floating point registers (&amp;lt;code&amp;gt;s0...s31&amp;lt;/code&amp;gt;,&amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
NEON uses the same register set regarding &amp;lt;code&amp;gt;d0...d31&amp;lt;/code&amp;gt; and adds the 128 Bit sized &amp;lt;code&amp;gt;q0...q15&amp;lt;/code&amp;gt; registers. It can use and operate with multiple integer data types (8,16,32,64 Bit signed and unsigned) and single precision floating point (32 Bit) numbers. It's also possible to use an indexed register like d0[0], e.g. for multiplying multiple data in a register by a single scalar from another register (like &amp;lt;code&amp;gt;vmul.f32 q0,q1,d4[0]&amp;lt;/code&amp;gt; =&amp;gt; multiply each of the 4 single floats in q1 by single float d4[0] and place results in q0).&lt;br /&gt;
&lt;br /&gt;
Another feature is instructions that saturate their results, which is quite useful when working with colours. E.g. &amp;lt;code&amp;gt;vqadd.u8 q0,q1,q2&amp;lt;/code&amp;gt; adds 4 true colour RGB pixels from q1 to the ones in q2 and places the results in q0. If an overflow occurs in a byte, the value is saturated to 255.&lt;br /&gt;
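&lt;br /&gt;
As a sketch of how that could be used on pixel data, assume that r0 points to four 32-Bit pixels in the frame buffer, that r1 points to four pixels to be added on top, and that a VFPContext has already been requested (see the rest of this section). The pointers and their meaning are assumptions for illustration only:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.fpu neon&lt;br /&gt;
vld1.32  {d2,d3},[r0]   //q1 (= d2,d3) = 4 pixels from the frame buffer&lt;br /&gt;
vld1.32  {d4,d5},[r1]   //q2 (= d4,d5) = 4 pixels to add&lt;br /&gt;
vqadd.u8 q0,q1,q2       //add all 16 bytes, each one saturating at 255&lt;br /&gt;
vst1.32  {d0,d1},[r0]   //write the 4 resulting pixels in q0 (= d0,d1) back&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;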
&lt;br /&gt;
The sheer number of available NEON instructions is quite huge, so make sure to check the links below to read up on that. As explained before, the register sets of VFP and NEON are mapped on each other. So modifying &amp;lt;code&amp;gt;s0&amp;lt;/code&amp;gt; would result in modifying the low 32 Bits of &amp;lt;code&amp;gt;d0&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q0&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Before we can use the VFP/NEON unit within our Risc OS code we need to invest some bytes in requesting a so called &amp;lt;i&amp;gt;VFPContext&amp;lt;/i&amp;gt; for initialization. In ARM code that would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;mov r0,#3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
mov r1,#32          //request full set of 32 VFP/NEON registers&lt;br /&gt;
mov r2,#0&lt;br /&gt;
swi VFPSupport_CreateContext&amp;lt;/syntaxhighlight&amp;gt;Further documentation on the SWI &amp;lt;i&amp;gt;VFPSupport_CreateContext&amp;lt;/i&amp;gt; can be found [https://www.riscosopen.org/wiki/documentation/show/VFPSupport_CreateContext here]&lt;br /&gt;
&lt;br /&gt;
The same in THUMB-2:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;movs r1,#32&lt;br /&gt;
lsls r0,r1,#26      //reuse r1&lt;br /&gt;
adds r0,r0,#3       //r0=3+(1&amp;lt;&amp;lt;31)&lt;br /&gt;
movs r2,#0&lt;br /&gt;
movw r10,#0x8ec1&lt;br /&gt;
movt r10,#0x5&lt;br /&gt;
swi OS_CallASWI     //needed due to swi number &amp;gt;0xff&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The major interest in using NEON is speed (for floats and integers) and working with floats in general, not so much size, as the setup shown above consumes some bytes. If your code doesn't need floats and the speed is good enough, there might not be much need for NEON. You will also find a small number of parallel arithmetic and saturating instructions for normal ARM integer code in the instruction set. But an intro like Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball] probably wouldn't be possible at that size/speed without NEON.&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail at [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=611</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=611"/>
				<updated>2020-06-20T19:56:06Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter one especially for any kind of mobile device. Back in the 80's the ARM architecture was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware if Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are fully mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. Still, it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes. THUMB offers a limited instruction set with a length of 2 Bytes, and THUMB-2 mixes 2 and 4 Byte instructions.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue endlessly with conditional instructions until the code executes an instruction that updates the flags, like e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, like e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode, unfortunately, only branches are conditional. THUMB-2, however, introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, with no 16-Bit bank switching limit as on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so-called SWI's (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* an up to date 16-Bit sound system, e.g. for generating bytebeat based stuff&lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so e.g. web browsing is possible but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block (VDU variable numbers to read)&lt;br /&gt;
movs r1,r0               //output block = input block, the result overwrites the request&lt;br /&gt;
swi OS_ReadVduVariables  //read the screen start address into the block&lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
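&lt;br /&gt;
By my own count (an assumption worth checking against the listing of your assembler), 18 of the 20 instructions in the framework listing above use a 2-Byte encoding, two need 4 Bytes, and the data block adds another 8 Bytes, which gives 18*2 + 2*4 + 8 = 52. As a small sketch, the optional &amp;lt;code&amp;gt;.n&amp;lt;/code&amp;gt; (narrow) and &amp;lt;code&amp;gt;.w&amp;lt;/code&amp;gt; (wide) suffixes of the unified syntax make the chosen encoding explicit, so the assembler complains if an instruction that is supposed to be narrow does not fit:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
movs.n r2,#255          //2 Bytes, an 8-bit immediate fits the narrow encoding&lt;br /&gt;
movs.w r1,#320          //4 Bytes, 320 exceeds 8 bits and needs the wide encoding&lt;br /&gt;
strb.w r3,[r7],#1       //4 Bytes, post-indexed addressing only exists as wide encoding&lt;br /&gt;
subs.n r1,r1,#1         //2 Bytes&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;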
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8..15 (green channel)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16..23 (blue channel in the default &amp;amp;BBGGRR pixel layout)&lt;br /&gt;
   orrs r4,r4,r0         //combine both channels&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0..7 (red channel), halved as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize pixel value&lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all and instead to read the parameters of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
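&lt;br /&gt;
A minimal sketch of that idea, assuming the VDU variable numbers 11 (XWindLimit), 12 (YWindLimit) and 148 (ScreenStart) and reusing the in-place block trick from the listings above. It is not taken from the intro mentioned below, and that r0 is preserved by the SWI is an assumption you should verify against the OS documentation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
adr.n r0,vdu_block       //input block with the VDU variable numbers to read&lt;br /&gt;
movs  r1,r0              //output block = input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r0!,{r1,r2,r7}     //r1 = width-1, r2 = height-1, r7 = screen start address (assumes r0 is preserved)&lt;br /&gt;
//...the main loop would go here, using r1 and r2 instead of fixed limits...&lt;br /&gt;
swi   OS_Exit            //back to the OS&lt;br /&gt;
&lt;br /&gt;
.align 2&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit = number of x pixels - 1&lt;br /&gt;
.word 12                 //YWindLimit = number of y pixels - 1&lt;br /&gt;
.word 148                //ScreenStart = address of the screen memory&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;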
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later upgrade of that intro also shows the combined use of THUMB-2 and NEON, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of its start address (executables in RISC OS have a load address and a start address, stored in the filesystem as attributes) with the following BBC BASIC command (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module is called &amp;quot;SpecialFX&amp;quot; and needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow it...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on Pouet:
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The number of the SWI to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC BASIC also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=610</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=610"/>
				<updated>2020-06-19T07:15:25Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's the ARM architecture was created to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These registers overlap: s0...s31 occupy d0...d15, and d0...d31 occupy q0...q15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, others not at all. Over the years the ARM instruction set has become quite large: nowadays there is a hardware integer divide and there are various SIMD approaches, either in the ARM instruction set itself or via NEON. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so-called FPEmulator to take care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long. THUMB offers a limited instruction set with a length of 2 Bytes, and THUMB-2 mixes 2-Byte and 4-Byte encodings.&lt;br /&gt;
&lt;br /&gt;
The fixed 4-Byte size may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
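&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx      ;compare eax with ebx&lt;br /&gt;
jna skip         ;skip unless eax&amp;gt;ebx (unsigned)&lt;br /&gt;
   mov eax,edx   ;copy edx, so that edx remains unchanged&lt;br /&gt;
   shr eax,4     ;eax = edx&amp;gt;&amp;gt;4&lt;br /&gt;
   add eax,ecx   ;eax = ecx + edx&amp;gt;&amp;gt;4&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;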
Conditional execution in ARM mode isn't limited to the next instruction. You can continue with as many conditional instructions as you like until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; such as &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode, unfortunately, only branches are conditional. THUMB-2, however, introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, with no 16-Bit bank switching limit as on DOS&lt;br /&gt;
* 256 colour modes have a full size RGB palette (8 Bit per value) compared to DOS (6 Bit)&lt;br /&gt;
* convenient access to operating system/kernel routines (so-called SWI's (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* an up to date 16-Bit sound system, e.g. for generating bytebeat based stuff&lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so e.g. web browsing is possible but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block (VDU variable numbers to read)&lt;br /&gt;
movs r1,r0               //output block = input block, the result overwrites the request&lt;br /&gt;
swi OS_ReadVduVariables  //read the screen start address into the block&lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8..15 (green channel)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16..23 (blue channel in the default &amp;amp;BBGGRR pixel layout)&lt;br /&gt;
   orrs r4,r4,r0         //combine both channels&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0..7 (red channel), halved as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize pixel value&lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all and instead to read the parameters of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later upgrade of that intro also shows the combined use of THUMB-2 and NEON, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of its start address (executables in RISC OS have a load address and a start address, stored in the filesystem as attributes) with the following BBC BASIC command (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module is called &amp;quot;SpecialFX&amp;quot; and needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on Pouet: [https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers are close together, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=609</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=609"/>
				<updated>2020-06-18T19:49:12Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: /* Code Examples - Simple sizecoding framework and output to screen */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for mobile devices of any kind. Back in the 80s the ARM architecture was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I don't know whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These registers overlap: each q register maps onto two d registers, and s0...s31 map onto d0...d15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in Risc OS that takes care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now though it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; like &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
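&lt;br /&gt;
As a minimal sketch of such a chain (the register use is only an assumption for illustration), a single &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; can drive several conditional instructions, here computing the absolute difference and the minimum of two unsigned values:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //sets the flags once&lt;br /&gt;
subhi r2,r0,r1         //if r0&amp;gt;r1 then r2 = r0 - r1&lt;br /&gt;
movhi r3,r1            //if r0&amp;gt;r1 then r3 = r1 (the minimum)&lt;br /&gt;
subls r2,r1,r0         //if r0&amp;lt;=r1 then r2 = r1 - r0&lt;br /&gt;
movls r3,r0            //if r0&amp;lt;=r1 then r3 = r0 (the minimum)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
None of the four instructions after the compare changes the flags, so all of them still see the result of the &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt;. In ARM mode this takes 20 Bytes.&lt;br /&gt;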
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center explains this using the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
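&lt;br /&gt;
For comparison, the &amp;lt;code&amp;gt;addhi&amp;lt;/code&amp;gt; example from further above could be sketched in THUMB-2 as follows. The byte counts in the comments assume that the assembler picks the 16-Bit encodings where they exist:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1            //2 Bytes: compare r0 with r1&lt;br /&gt;
it    hi               //2 Bytes: makes the next instruction conditional&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //4 Bytes: an add with a shifted operand only exists as 32-Bit encoding&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
So in this case THUMB-2 ends up at 8 Bytes as well, as the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction eats up what the shorter compare saves.&lt;br /&gt;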
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],1     //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to your device (e.g. a RPI4 shows different results than a RPI3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to leave the screen mode untouched and read the properties of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
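&lt;br /&gt;
A minimal sketch of reading the current mode could look like the following. It assumes that OS_ReadVduVariables also accepts the mode variable numbers 11 (XWindLimit) and 12 (YWindLimit), and that the input block can be reused as the output block as in the examples above; the register choice is arbitrary, so check the OS documentation before relying on it:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
adr.n r0,vdu_block       //input block with the variable numbers to read&lt;br /&gt;
movs  r1,r0              //output block, assumed to be allowed to overwrite the input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r0!,{r5-r7}        //r5 = screen address, r6 = width-1, r7 = height-1&lt;br /&gt;
&lt;br /&gt;
.align 2&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 148                //screen start address&lt;br /&gt;
.word 11                 //XWindLimit = pixels per row minus 1 (assumed variable number)&lt;br /&gt;
.word 12                 //YWindLimit = pixel rows minus 1 (assumed variable number)&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The loops of the intro would then run over the values in r6 and r7 instead of hardcoded sizes.&lt;br /&gt;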
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later upgrade to that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit (bit 0) of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) by the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. To date that bug has not been fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
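&lt;br /&gt;
As a sketch, the exit sequence of an intro that didn't change the screen mode could look like this (the SWI number of OS_NewLine is not part of the list above and is taken from the OS documentation):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
.set OS_NewLine, 0x03&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
swi OS_NewLine           //re-trigger desktop redraw, only needed if the mode was not changed&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Both SWI numbers are below 0x100, so in THUMB mode each call costs only 2 Bytes.&lt;br /&gt;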
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on Pouet: [https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers are close together, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=608</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=608"/>
				<updated>2020-06-18T19:47:19Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and CPUs based on ARM architecture are the two major CPU architectures of modern times, the later one especially for any kind of mobile devices. Back in the 80's ARM was founded to power the successor of the BBC Micro. Most popular and known may be is the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers were run by Risc OS, a unique operating system for ARM cpu's.&lt;br /&gt;
&lt;br /&gt;
Nowadays due to the work of a few enthusiast Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. So the fastest cpu to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 Mhz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had to carry any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are fully mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS that takes care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it is not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue endlessly with conditional instructions until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When it comes to THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, without the 16-Bit screen bank limits known from DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
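&lt;br /&gt;
To check what the loops above are supposed to draw without booting Risc OS, a small reference model can help. The following is only a sketch in Python and not part of the intro: the function name xor_frame and its parameters are made up for illustration, and it assumes the frame buffer is filled byte by byte in the same order as the THUMB code (y from 255 down to 0, x from 320 down to 1).&lt;br /&gt;

```python
# Hypothetical reference model of the xor pattern, not Risc OS code.
def xor_frame(timer, width=320, height=256):
    frame = bytearray()
    for y in range(height - 1, -1, -1):      # movs r2,#255 ... bge yloop
        for x in range(width, 0, -1):        # movs r1,#320 ... bne xloop
            # strb only stores the lowest byte of (x+timer) xor y
            frame.append(((x + timer) ^ y) % 256)
    return frame

frame = xor_frame(timer=0)
print(len(frame), frame[0])
```

The length equals 320*256 Bytes, one byte per pixel of mode 13, which is also how much memory the store in the inner loop walks through per frame.&lt;br /&gt;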
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address via OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to kind of upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' for RGB from timer (assuming the default 0xBBGGRR pixel layout)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
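&lt;br /&gt;
The way the loops above build one pixel word can also be modelled outside of Risc OS. The following is a sketch in Python with made-up names (pixel_word is not taken from the intro). It uses multiplication and integer division in place of the shifts so each step reads as plain arithmetic, and it makes no assumption about which byte the display shows as which colour.&lt;br /&gt;

```python
# Hypothetical model of the pixel word assembled in the true colour loop.
def pixel_word(x, y, timer):
    low = x // 2                      # lsrs r3,r1,#1  : x halved so 320 fits into one byte
    middle = (timer % 256) * 0x100    # ands + lsls #8 : lowest timer byte moved to bits 8..15
    high = y * 0x10000                # lsls r4,r2,#16 : y moved to bits 16..23
    return high | middle | low        # orrs           : combine the three bytes

print(hex(pixel_word(320, 255, 0x1234)))
```

Each pixel is stored as a full 32-Bit word, so one frame of this mode takes 320*256*4 Bytes.&lt;br /&gt;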
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and just read the parameters of the current one, as most users run 1920x1080 at 32 Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest Bit of the start address. Executables in Risc OS have a load address and a start address stored in the filesystem as attributes, and &amp;amp;8000 is the general start address for executables in Risc OS. This is done by the following command on the command line in Risc OS. The best way is to put it into a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet.net
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
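&lt;br /&gt;
To try out a bytebeat formula before wiring it into the interrupt routine, it can be evaluated offline. The following is a sketch in Python with a made-up function name; it assumes t is a non-negative integer and keeps only the lowest 8 bits of the result, as the classic bytebeat programs do. Shifts and masks are written as integer division and modulo so each step reads as plain arithmetic.&lt;br /&gt;

```python
# Hypothetical offline model of the bytebeat formula quoted in the listing above.
def bytebeat(t):
    shift = ((t // 512) % 16) // 2 * 2        # t shifted right by 9, masked with 14
    factor = (0xCA98 // 2 ** shift) % 16      # 0xca98 shifted right by that amount, masked with 15
    return (t * factor | t // 256) % 256      # multiply, or with t shifted right by 8, keep 8 bits

samples = bytes(bytebeat(t) for t in range(8000))
print(len(samples))
```

If t advances once per sample at the 8000 Hz requested above, these 8000 values correspond to one second of sound.&lt;br /&gt;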
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (Link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). Check out the Sound SWI calls in detail here [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=607</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=607"/>
				<updated>2020-06-18T19:46:47Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in mobile devices of any kind. The ARM architecture was developed by Acorn in the 1980s to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had to carry any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are fully mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS that takes care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it is not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4  (r3 is only shifted for the add and remains unchanged)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue endlessly with conditional instructions until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When it comes to THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, without the 16-Bit screen bank limits known from DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address via OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to kind of upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the current one, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
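&lt;br /&gt;
A minimal sketch of that idea, not taken from any of the linked intros. It assumes that the variable numbers 11 (XWindLimit), 12 (YWindLimit) and 148 (ScreenStart) can be read in a single OS_ReadVduVariables call, that the current mode has 32 bits per pixel and that each screen row is exactly width*4 Bytes long. The labels &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;fill&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;wait&amp;lt;/code&amp;gt; are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                    //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block        //input block: list of variable numbers&lt;br /&gt;
movs  r1,r0               //output block overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables //no OS_ScreenMode call, the current mode is kept&lt;br /&gt;
ldmia r1!,{r4,r5,r7}      //r4 = width-1, r5 = height-1, r7 = screen address&lt;br /&gt;
adds  r4,r4,#1            //r4 = width in pixels&lt;br /&gt;
adds  r5,r5,#1            //r5 = height in pixels&lt;br /&gt;
muls  r5,r4,r5            //r5 = number of pixels on screen&lt;br /&gt;
fill:&lt;br /&gt;
   stmia r7!,{r5}         //store counter as pixel value (assumes 32 bits per pixel)&lt;br /&gt;
   subs  r5,r5,#1         //dec pixel counter&lt;br /&gt;
bne fill&lt;br /&gt;
wait:&lt;br /&gt;
swi OS_ReadEscapeState    //ESC pressed ?&lt;br /&gt;
bcc wait&lt;br /&gt;
swi OS_Exit               //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                  //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                  //XWindLimit = width in pixels - 1 (assumed variable number)&lt;br /&gt;
.word 12                  //YWindLimit = height in pixels - 1 (assumed variable number)&lt;br /&gt;
.word 148                 //ScreenStart = screen address&lt;br /&gt;
.word -1                  //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
As the output overwrites the request block, the block can only be used once. If the row length of the mode differs from width*4 Bytes, the fill loop would have to step by the real line length instead.&lt;br /&gt;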
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
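&lt;br /&gt;
As a small sketch, the exit sequence for an intro that kept the current screen mode could look like this. OS_NewLine is not part of the SWI list defined earlier, so its number (0x03) is defined here. That the newline is enough to make the desktop redraw is taken from the text above and not verified separately.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_NewLine, 0x03    //SWI number of OS_NewLine&lt;br /&gt;
.set OS_Exit, 0x11       //SWI number of OS_Exit&lt;br /&gt;
swi OS_NewLine           //print a newline to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Both SWI numbers fit into 8 bits, so each call is a single 2 Byte THUMB instruction.&lt;br /&gt;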
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on pouet.net&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only has an 8-bit number field. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=606</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=606"/>
				<updated>2020-06-18T19:45:27Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was founded to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These are overlapping views of the same register file (e.g. q0 covers d0 and d1, and d0 covers s0 and s1)&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in Risc OS that takes care of that, but it's rather slow as it is implemented in software, and it's not available for THUMB/THUMB-2. For now though it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes. Only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding size coding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4 without changing r3&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
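&lt;br /&gt;
A short made-up illustration of such a chain (the register use is arbitrary and not taken from any intro): one &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; controls three following instructions, because none of them carries the &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; suffix.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.arm                   //assemble using ARM mode&lt;br /&gt;
cmp   r0,#0            //compare r0 with 0, this sets the flags&lt;br /&gt;
rsblt r0,r0,#0         //if r0 was negative then r0 = 0 - r0 (absolute value)&lt;br /&gt;
addlt r1,r1,#1         //flags are untouched by rsblt, so this still depends on the cmp&lt;br /&gt;
movge r2,#0            //only executed if r0 was zero or positive at the cmp&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;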
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center explains this using the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
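&lt;br /&gt;
To give an idea of how such a SWI call looks in practice, here is a minimal sketch that prints a text with OS_Write0 (SWI 0x02, r0 points to a zero-terminated string) and exits. Parameters are passed in registers and the SWI number is part of the instruction. The label &amp;lt;code&amp;gt;message&amp;lt;/code&amp;gt; and the text are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_Write0, 0x02     //SWI number of OS_Write0&lt;br /&gt;
.set OS_Exit, 0x11       //SWI number of OS_Exit&lt;br /&gt;
adr.n r0,message         //r0 = pointer to zero-terminated string&lt;br /&gt;
swi OS_Write0            //print the string&lt;br /&gt;
swi OS_Exit              //exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align, adr.n needs a word aligned target&lt;br /&gt;
message:&lt;br /&gt;
.string &amp;quot;Hello sizecoding&amp;quot;  //terminated by 0&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;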
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* broad software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWIs for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes (up to e.g. 800x600 with 256 colours) by just choosing a mode number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the current one, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on pouet.net&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only has an 8-bit number field. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=605</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=605"/>
				<updated>2020-06-18T19:44:11Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in all kinds of mobile devices. Back in the 80's the ARM architecture was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is cheap and especially recommended. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These are views of the same register file: s0...s31 overlap d0...d15, and d0...d31 overlap q0...q15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it's rather slow as it is implemented in software, and it's not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
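&lt;br /&gt;
To double check what the loop computes, here is a minimal Python model of the four instructions (a sketch only, not ARM code; the function name &amp;lt;code&amp;gt;gcd_model&amp;lt;/code&amp;gt; is made up for this illustration and positive inputs are assumed). Note that &amp;lt;code&amp;gt;suble&amp;lt;/code&amp;gt; also executes on the final pass where both registers are equal, so r1 ends up as 0 and the result is left in r0.&lt;br /&gt;

```python
def gcd_model(r0, r1):
    """Model of cmp/subgt/suble/bne; assumes r0 and r1 are positive integers."""
    while True:
        equal = (r0 == r1)   # cmp r0,r1 sets the flags once per pass
        if r0 > r1:          # subgt r0,r0,r1
            r0 = r0 - r1
        else:                # suble r1,r1,r0 (also taken when r0 == r1)
            r1 = r1 - r0
        if equal:            # bne gcd falls through when cmp saw equality
            return r0


print(gcd_model(48, 18))     # prints 6
```

The subtractions do not touch the flags, which is why the single compare can steer both conditional instructions and the branch.&lt;br /&gt;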
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
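&lt;br /&gt;
The pixel formula of the inner loop can be tried out on any machine with a small Python model (a sketch, not Risc OS code; &amp;lt;code&amp;gt;render_frame&amp;lt;/code&amp;gt; and the two constants are names invented for this illustration). It walks y from 255 down to 0 and x from 320 down to 1, just like the two loops above, and keeps only the low byte as &amp;lt;code&amp;gt;strb&amp;lt;/code&amp;gt; does.&lt;br /&gt;

```python
WIDTH = 320    # pixels per line in screen mode 13
HEIGHT = 256   # lines in screen mode 13


def render_frame(timer):
    """Return one frame as bytes, in the order the assembly stores them."""
    frame = bytearray()
    for y in range(HEIGHT - 1, -1, -1):       # movs r2,#255 ... bge yloop
        for x in range(WIDTH, 0, -1):         # movs r1,#320 ... bne xloop
            frame.append(((x + timer) ^ y) % 256)   # strb stores the low byte
    return bytes(frame)


frame = render_frame(timer=0)
print(len(frame), frame[0], frame[-1])        # prints 81920 191 1
```

Calling it with increasing timer values shows how the pattern scrolls from frame to frame.&lt;br /&gt;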
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the small old school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPI4 shows different results than an RPI3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' for RGB from timer (bits 8-15)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' for RGB from y (bits 16-23)&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' (bits 0-7) as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
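&lt;br /&gt;
How the three shifted values end up in one 32 bit pixel word can be modelled in Python (a sketch; &amp;lt;code&amp;gt;pixel_word&amp;lt;/code&amp;gt; is a hypothetical helper name). The model only shows the bit positions. Which byte is displayed as which colour channel depends on the pixel format of the mode, so that is left open here. The left shifts are written as multiplications by 0x100 and 0x10000, which give the same result.&lt;br /&gt;

```python
def pixel_word(x, y, timer):
    """Combine x, y and the timer into one 32-bit pixel word."""
    mid = (timer % 256) * 0x100    # ands r0,r0,r2 + lsls r0,r0,#8 -> bits 8-15
    high = y * 0x10000             # lsls r4,r2,#16                -> bits 16-23
    low = x >> 1                   # lsrs r3,r1,#1                 -> bits 0-7
    return high | mid | low        # the two orrs instructions


print(hex(pixel_word(x=320, y=255, timer=0x1234)))   # prints 0xff34a0
```

As x never exceeds 320, x shifted right by one always fits into the lowest byte, so the three fields never overlap.&lt;br /&gt;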
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the properties of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit (bit 0) of the start address. Executables in Risc OS have a load address and a start address stored in the filesystem as attributes, and &amp;amp;8000 is the general start address for executables in Risc OS. The following command on the Risc OS command line sets both; the best way is to put it into a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of that can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
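&lt;br /&gt;
The sound routine can be checked against a Python model of the bytebeat formula and of the fixed point timer. This is a sketch under assumptions: the names &amp;lt;code&amp;gt;bytebeat&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fill_buffer&amp;lt;/code&amp;gt; are invented here, the step is taken to arrive in r6 in 8.24 format as the comment in the code states, and only the 8 bit sample value is modelled, not the 16 bit stereo word. The bitwise AND steps are written with &amp;lt;code&amp;gt;operator.and_&amp;lt;/code&amp;gt;.&lt;br /&gt;

```python
from operator import and_ as bit_and   # bit_and(a, b) is the bitwise AND of a and b


def bytebeat(t):
    """8-bit sample of t*(0xca98>>(t>>9 AND 14) AND 15)|t>>8."""
    shift = bit_and(t >> 9, 14)             # lsrs r4,r5,#9 / and r4,r4,#14
    factor = bit_and(0xCA98 >> shift, 15)   # lsrs r7,r7,r4 / and r7,r7,#15
    return (t * factor | t >> 8) % 256      # muls, orr, keep the low byte


def fill_buffer(timer, step_8_24, count):
    """Produce `count` samples; returns (samples, updated timer)."""
    step = step_8_24 >> 8                   # lsrs r6,r6,#8 -> 16.16 format
    samples = []
    for _ in range(count):
        t = timer >> 16                     # lsrs r5,r0,#16 -> integer part
        samples.append(bytebeat(t))
        timer = (timer + step) % 2**32      # adds r0,r0,r6 wraps at 32 bits
    return samples, timer


samples, timer = fill_buffer(timer=0, step_8_24=0x1000000, count=4)
print(samples, hex(timer))                  # prints [0, 8, 16, 24] 0x40000
```

A step of 0x1000000 is 1.0 in 8.24 format, so t advances by one per sample; with half that step every value of t is used for two samples in a row.&lt;br /&gt;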
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=604</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=604"/>
				<updated>2020-06-18T19:43:05Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: More stuff on conditional execution in ARM and THUMB-2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in all kinds of mobile devices. Back in the 80's the ARM architecture was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is cheap and especially recommended. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These are views of the same register file: s0...s31 overlap d0...d15, and d0...d31 overlap q0...q15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86...some things might be familiar, some not at all...over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it's rather slow as it is implemented in software, and it's not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can continue with conditional instructions for as long as you like, until the code executes an instruction that updates the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In THUMB mode unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center illustrates this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0,r1&lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0, r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (only partly relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* an FPU (like x87) with trigonometric or logarithmic functions&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old-school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old-school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8-15 of the pixel word&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16-23 of the pixel word&lt;br /&gt;
   orrs r4,r4,r0         //combine the y and timer components&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0-7, as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all but to read the parameters of the current one, as most users run the desktop in 1920x1080 with 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
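&lt;br /&gt;
A minimal sketch of that approach, reusing OS_ReadVduVariables from the examples above. The VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit), the label vdu_block and the register assignment are assumptions taken from the RISC OS documentation and not from any of the linked intros, so double-check them before relying on this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //input block: list of VDU variable numbers&lt;br /&gt;
movs r1,r0               //output block overwrites the input block&lt;br /&gt;
swi OS_ReadVduVariables  //read the values for the current screen mode&lt;br /&gt;
ldmia r0!,{r5,r6,r7}     //r5 = width-1, r6 = height-1, r7 = screen address&lt;br /&gt;
//...drawing loops using r5, r6 and r7 go here, before the data...&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:               //hypothetical label name&lt;br /&gt;
.word 11                 //assumed: XWindLimit = width in pixels minus 1&lt;br /&gt;
.word 12                 //assumed: YWindLimit = height in pixels minus 1&lt;br /&gt;
.word 148                //screen address, as in the examples above&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
As both limits are stored minus 1, the loop counters have to be adjusted accordingly. The snippet is only a fragment: the drawing code has to sit between the ldmia and the data block.&lt;br /&gt;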
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which reduced the code size by around 44 bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of its start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes; &amp;amp;8000 is the general start address for executables in RISC OS). The following command, written as a BBC BASIC SYS call, does that. The best way is to put it into a batch file, as shown in most of the above-mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or from a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of that can be omitted if your tiny intro compo rules allow it.&lt;br /&gt;
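&lt;br /&gt;
A minimal sketch of such an exit sequence for an intro that did not change the screen mode. It assumes that OS_NewLine has the SWI number 0x03 (taken from the RISC OS documentation, please double-check) and that OS_Exit is defined as in the list above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03    //assumed SWI number of OS_NewLine&lt;br /&gt;
swi OS_NewLine           //print a newline to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
In THUMB code the additional SWI should cost 2 bytes.&lt;br /&gt;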
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
Work in progress. For now, check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball].&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only encodes an 8-bit number. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be placed in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC BASIC also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=603</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=603"/>
				<updated>2020-06-18T19:35:11Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in all kinds of mobile devices. Back in the 80's ARM was created to power the successor of the BBC Micro. The most popular and best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16-bit age. It was purely RISC and 32-bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These registers overlay each other: s0...s31 map onto d0...d15, and q0...q15 map onto d0...d31&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge: nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in RISC OS to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 bytes in size; only THUMB offers a limited instruction set with a length of 2 bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
The conditional execution in ARM mode isn't limited to the next instruction. You can basically continue endlessly to make the following instructions conditional as well, until you have an instruction that changes the flags, e.g. &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; or an instruction with the suffix &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; added, e.g. &amp;lt;code&amp;gt;adds r0,r1,r2&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction, with which up to 4 following instructions can be made conditional. Some code from the ARM Information Center explains this with the GCD (Greatest Common Divisor) algorithm.&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0, r1&lt;br /&gt;
   subgt r0, r0, r1&lt;br /&gt;
   suble r1, r1, r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0, r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
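For comparison, here is a hedged sketch of how the compare-and-add example from above might look in THUMB-2 using the &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; instruction. It assumes unified syntax, and the size is an estimate rather than a measured result: 2 bytes each for &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;it&amp;lt;/code&amp;gt; plus 4 bytes for the add with a shifted operand, so there would be no saving over the 8 bytes of the ARM version here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
it    hi               //make the next instruction conditional on 'hi'&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0 is higher than r1 (unsigned) then r0 = r2 + (r3 shifted right by 4)&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;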
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, with no 16-bit bank switching limit as on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so-called SWIs (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* an up-to-date 16-bit sound system, e.g. for generating bytebeat-based sound&lt;br /&gt;
* built-in BBC BASIC including an assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* multicore support&lt;br /&gt;
* shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* active software development in general, so e.g. web browsing is possible but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWIs for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB/THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old-school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old-school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8-15 of the pixel word&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16-23 of the pixel word&lt;br /&gt;
   orrs r4,r4,r0         //combine the y and timer components&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0-7, as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all but to read the parameters of the current one, as most users run the desktop in 1920x1080 with 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which reduced the code size by around 44 bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of its start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes; &amp;amp;8000 is the general start address for executables in RISC OS). The following command, written as a BBC BASIC SYS call, does that. The best way is to put it into a batch file, as shown in most of the above-mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or from a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of that can be omitted if your tiny intro compo rules allow it.&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
Work in progress. For now, check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball].&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only encodes an 8-bit number. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be placed in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding or subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC BASIC also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=602</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=602"/>
				<updated>2020-06-18T19:31:11Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was developed to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never had to carry any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are views of the same register bank and overlap each other (e.g. q0 = d0:d1 = s0...s3)&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the BASIC Assembler from Risc OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When it comes to THUMB mode, unfortunately only branches are conditional. But THUMB-2 introduced the &amp;lt;code&amp;gt;IT&amp;lt;/code&amp;gt; instruction, which makes up to 4 following instructions conditional. Some code from the ARM Information Center explains this with the GCD algorithm (Greatest Common Divisor).&lt;br /&gt;
&lt;br /&gt;
ARM (16 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0, r1&lt;br /&gt;
   subgt r0, r0, r1&lt;br /&gt;
   suble r1, r1, r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
THUMB-2 (10 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;gcd:&lt;br /&gt;
   cmp   r0, r1&lt;br /&gt;
   ite   gt &lt;br /&gt;
   subgt r0,r0,r1&lt;br /&gt;
   suble r1,r1,r0&lt;br /&gt;
   bne gcd&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
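&lt;br /&gt;
As a further minimal sketch of the &amp;lt;code&amp;gt;IT&amp;lt;/code&amp;gt; instruction (not taken from the ARM documentation), a single conditional instruction is enough for a signed maximum of two registers. Assuming unified syntax (&amp;lt;code&amp;gt;.syntax unified&amp;lt;/code&amp;gt;), all three instructions should use 2 Byte encodings, which gives 6 Bytes compared to 8 Bytes for the equivalent &amp;lt;code&amp;gt;cmp&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;movlt&amp;lt;/code&amp;gt; pair in ARM mode:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1            //compare r0 with r1 (signed)&lt;br /&gt;
it    lt               //make the next instruction conditional on 'lt'&lt;br /&gt;
movlt r0,r1            //if r0&amp;lt;r1 then r0 = r1, so r0 = max(r0,r1)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;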
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to read the parameters of the current screen mode and not set a screen mode at all, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
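&lt;br /&gt;
A minimal sketch of that approach could look like the following. It is an assumption-based fragment and not taken from the intro mentioned below: the label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; is made up for illustration, the VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit) are assumed next to the 148 already used in the examples above, and the output block overwrites the input block just like in those examples. Both limits are returned as size minus one:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //input block with the VDU variable numbers (hypothetical label)&lt;br /&gt;
movs  r1,r0              //output block, overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r1!,{r4-r6}        //r4 = width-1, r5 = height-1, r6 = screen address&lt;br /&gt;
adds  r4,r4,#1           //r4 = width in pixels&lt;br /&gt;
adds  r5,r5,#1           //r5 = height in pixels&lt;br /&gt;
//...drawing loops using r4, r5 and r6 would follow here...&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit (assumed number, width-1)&lt;br /&gt;
.word 12                 //YWindLimit (assumed number, height-1)&lt;br /&gt;
.word 148                //screen address, as in the examples above&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
From here the loops of the examples above could use r4 and r5 instead of the fixed 320 and 255. The length of one screen line in memory may not match the width in every mode, so treat this only as a starting point.&lt;br /&gt;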
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set bit 0 of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
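&lt;br /&gt;
Until this section is filled in, here is a small sketch of the register aliasing mentioned in the overview. It assumes that VFP/NEON is already usable by the program (on Risc OS this probably needs some extra setup, which is not shown here) and that the assembler is otherwise invoked as for the other examples; the &amp;lt;code&amp;gt;.fpu neon&amp;lt;/code&amp;gt; directive and the chosen registers are picked for illustration only:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
.fpu neon               //allow VFP/NEON instructions (assumed setting)&lt;br /&gt;
vmov.i32 q0,#0          //clear q0, which is the same storage as d0:d1 and s0...s3&lt;br /&gt;
vmov.f32 s1,#1.0        //write the upper half of d0 through its single precision name&lt;br /&gt;
vadd.f32 d2,d0,d0       //d0 = {s0,s1} = {0.0,1.0} =&amp;gt; d2 = {s4,s5} = {0.0,2.0}&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;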
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on Pouet&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which is fine for normal ARM code but not for THUMB, where the SWI instruction only has an 8-Bit immediate. So here we've got to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are close together, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC BASIC also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=601</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=601"/>
				<updated>2020-06-18T19:19:29Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was developed to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never had to carry any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are views of the same register bank and overlap each other (e.g. q0 = d0:d1 = s0...s3)&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the BASIC Assembler from Risc OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode every instruction is 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsr#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;gt;&amp;gt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shr eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB/THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
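&lt;br /&gt;
For prototyping the effect away from the target machine, the pixel loop above can be modelled in a few lines of Python. This is only a sketch under the assumptions stated in the listing (mode 13 with 320x256 pixels at one byte each); the name xor_frame is made up for this illustration and the timer value is passed in by hand instead of being read from the OS.&lt;br /&gt;

```python
WIDTH, HEIGHT = 320, 256   # screen mode 13 as set in the listing


def xor_frame(timer):
    """Return one frame as bytes, in the order the assembly loop stores them."""
    frame = bytearray()
    for y in range(HEIGHT - 1, -1, -1):      # r2 counts from 255 down to 0
        for x in range(WIDTH, 0, -1):        # r1 counts from 320 down to 1
            # strb only stores the low byte of (x + timer) xor y
            frame.append(((x + timer) ^ y) % 256)
    return bytes(frame)


frame = xor_frame(timer=0)
print(len(frame), frame[0], frame[-1])
```

Because both counters run downwards while the store address runs upwards, the first byte in memory belongs to x=320, y=255 and the last one to x=1, y=0.&lt;br /&gt;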
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you have set the screen mode you have to read its start address via OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it gets a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes via a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and to read the current one instead, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To make the resulting executable start in THUMB mode you can conveniently set the lowest Bit (bit 0) of the start address (executables in RISC OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there is one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit shown above. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The number of the SWI to be called has to be placed in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset for the later calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
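&lt;br /&gt;
Before squeezing a bytebeat into the interrupt routine it can help to try the formula on a desktop machine first. The following Python sketch models the formula from the listing and the fixed point timer of the sound loop. The names bytebeat and fill_buffer are made up for this illustration, the bit masks are written as modulo operations (x % 16 gives the same result as x AND 15 for non-negative integers), and the step values passed in are assumptions, as the real step is handed over by the sound system in r6.&lt;br /&gt;

```python
def bytebeat(t):
    """8-bit sample of the formula t*(0xca98>>(t>>9 AND 14) AND 15)|t>>8."""
    shift = ((t >> 9) % 16) // 2 * 2         # same as (t>>9) AND 14
    factor = (0xCA98 >> shift) % 16          # same as (...) AND 15
    return ((t * factor) | (t >> 8)) % 256   # keep the low byte only


def fill_buffer(acc, step, count):
    """Mimic soundloop: acc is the 16.16 timer (r0), step the increment (r6 after lsrs #8)."""
    samples = []
    for _ in range(count):
        samples.append(bytebeat(acc >> 16))  # integer part of the timer is t
        acc = (acc + step) % 2**32           # adds wraps around at 32 bits
    return samples, acc


# A step of 0x10000 (1.0) advances t by one per sample, 0x8000 (0.5) by one every second sample.
print(fill_buffer(0, 0x10000, 4))
print(fill_buffer(0, 0x8000, 4))
```

The returned accumulator corresponds to the value the routine writes back to soundtimer, so that the next buffer continues where the previous one stopped.&lt;br /&gt;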
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
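&lt;br /&gt;
Coming back to the OS_CallASWI detour of the sound example: the SWI number arithmetic can be double checked with a small Python sketch. The helper names movw_movt and fits_thumb_swi are made up for this illustration. The chunk base 0x4B440 is an assumption derived from the number 0x6b440 used in the listing minus the X bit (bit 17), which selects the error-returning form of a SWI.&lt;br /&gt;

```python
X_BIT = 0x20000              # bit 17 marks the error-returning X form of a SWI
SHAREDSOUND_CHUNK = 0x4B440  # assumed SharedSound base, see the note above


def movw_movt(low16, high16):
    """Value left in a register by movw #low16 followed by movt #high16."""
    return high16 * 0x10000 + low16


def fits_thumb_swi(number):
    """A 16-bit THUMB swi encoding only carries an 8-bit SWI number."""
    return number in range(0x100)


install = movw_movt(0xB440, 0x6)   # XSharedSound install handler
sample_rate = install + 6          # add r10,r10,#6
remove = sample_rate - 5           # sub r10,r10,#5

assert install == SHAREDSOUND_CHUNK | X_BIT
assert fits_thumb_swi(0x6F) and not fits_thumb_swi(install)
print(hex(install), hex(sample_rate), hex(remove))
```

This only confirms the numbers used in the comments of the listing. It says nothing about the register interface of the SWI's themselves, for that refer to the linked documentation.&lt;br /&gt;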
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=600</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=600"/>
				<updated>2020-06-18T12:27:54Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's the ARM architecture was developed to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The cheap Raspberry Pi range is especially recommended. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are fully mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
The individual instructions are of course very different from x86: some will look familiar, others not at all. Over the years the ARM instruction set has become quite large. Nowadays there is a hardware integer divide and there are various SIMD approaches, either in the ARM or in the NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. Risc OS has a so-called FPEmulator to take care of that, but it is rather slow because it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the BASIC Assembler from Risc OS.&lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long. Only THUMB offers a limited instruction set with a length of 2 Bytes, and THUMB-2 mixes 2 Byte and 4 Byte encodings.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks it definitely is. For others it is not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#4   //if r0&amp;gt;r1 (unsigned) then r0 = r2 + (r3&amp;lt;&amp;lt;4)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB/THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you have set the screen mode you have to read its start address via OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it gets a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes via a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and to read the current one instead, as most users would run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To make the resulting executable start in THUMB mode you can conveniently set the lowest Bit (bit 0) of the start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there is one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit shown above. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of the timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The number of the SWI to be called has to be placed in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset for the later calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWIs are documented in detail at [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=599</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=599"/>
				<updated>2020-06-18T12:15:14Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80s ARM was developed to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi-purpose or double precision registers (d0...d31), and 16 128-Bit multi-purpose registers for SIMD (q0...q15). These registers overlap: each q register consists of two d registers, and d0...d15 each consist of two s registers&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in Risc OS that takes care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the Basic Assembler from Risc OS.&lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2-Byte and 4-Byte encodings).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#4   //if r0&amp;gt;r1 (unsigned) then r0 = r2 + r3&amp;lt;&amp;lt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB/THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old-school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 returns a different address than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old-school screen modes by passing a mode string with reason code 15 of OS_ScreenMode ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to read the current screen mode instead of setting one, as most users run in 1920x1080 at 32 Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
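&lt;br /&gt;
A minimal sketch of reading the current screen mode, assuming the VDU variable numbers 11 (XWindLimit), 12 (YWindLimit) and 148 (screen start address) as documented for OS_ReadVduVariables. The label vdu_block and the register choice are only for illustration, and the code assumes that the SWI preserves r0. The input block is overwritten by the results, just as in the examples above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers&lt;br /&gt;
movs r1,r0               //output block = input block, results overwrite the numbers&lt;br /&gt;
swi OS_ReadVduVariables  //read all three variables in one call&lt;br /&gt;
ldmia r0!,{r4,r5,r7}     //r4 = XWindLimit, r5 = YWindLimit, r7 = screen address (assumes r0 is preserved)&lt;br /&gt;
adds r4,r4,#1            //screen width in pixels = XWindLimit+1&lt;br /&gt;
adds r5,r5,#1            //screen height in pixels = YWindLimit+1&lt;br /&gt;
//...main loop using r4, r5 and r7 instead of fixed values...&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit (number of pixel columns minus 1)&lt;br /&gt;
.word 12                 //YWindLimit (number of pixel rows minus 1)&lt;br /&gt;
.word 148                //screen start address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Depending on the mode, a screen row may be longer in memory than its visible pixels. If that matters, VDU variable 6 (LineLength, the number of bytes per row) could be added to the block in the same way.&lt;br /&gt;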
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also combines THUMB-2 and NEON code, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To make the resulting executable start in THUMB mode you can conveniently set the lowest Bit of its start address (executables in Risc OS have a load address and a start address stored in the filesystem as attributes) with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above-mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or from a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the screen mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of that can be omitted if your tiny intro compo rules allow it...&lt;br /&gt;
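&lt;br /&gt;
A minimal sketch of such an exit sequence, assuming SWI number 0x03 for OS_NewLine (it is not part of the definition list above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03     //assumed SWI number of OS_NewLine&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
swi OS_NewLine           //output a newline to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In THUMB mode the additional SWI costs 2 Bytes compared to a plain OS_Exit.&lt;br /&gt;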
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...by now check out the source code of Kuemmels's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on Pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only holds an 8-bit number. So here we've got to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding or subtracting an offset in r10 for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWIs are documented in detail at [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=598</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=598"/>
				<updated>2020-06-18T12:12:57Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80s ARM was developed to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi-purpose or double precision registers (d0...d31), and 16 128-Bit multi-purpose registers for SIMD (q0...q15). These registers overlap: each q register consists of two d registers, and d0...d15 each consist of two s registers&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in Risc OS that takes care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the Basic Assembler from Risc OS.&lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2-Byte and 4-Byte encodings).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#4   //if r0&amp;gt;r1 (unsigned) then r0 = r2 + r3&amp;lt;&amp;lt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB/THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to your device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' for the pixel from timer (bits 8-15)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' for the pixel from y (bits 16-23, pixel word is &amp;amp;BBGGRR)&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' (bits 0-7) as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and to read the parameters of the current one instead, as most users run in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
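&lt;br /&gt;
A minimal sketch of reading the current mode could look like the following. It is not taken from any of the intros mentioned on this page. It assumes the SWI definitions from above, the VDU variable numbers 148 (ScreenStart), 11 (XWindLimit) and 12 (YWindLimit), a desktop mode with 32 bits per pixel and no padding at the end of a screen line. The label vdu_block and the fill value are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
adr.n r0,vdu_block       //address of input block&lt;br /&gt;
movs  r1,r0              //output block overwrites the input block&lt;br /&gt;
swi   OS_ReadVduVariables&lt;br /&gt;
ldmia r0!,{r5,r6,r7}     //r5 = screen address, r6 = width-1, r7 = height-1&lt;br /&gt;
adds  r6,r6,#1           //r6 = width in pixels&lt;br /&gt;
adds  r7,r7,#1           //r7 = height in pixels&lt;br /&gt;
muls  r6,r7,r6           //r6 = number of pixels on screen&lt;br /&gt;
movs  r3,#255            //fill value (sets the lowest byte of each pixel word)&lt;br /&gt;
fill:&lt;br /&gt;
   stmia r5!,{r3}        //store one 32-bit pixel and increment address&lt;br /&gt;
   subs  r6,r6,#1        //dec pixel counter&lt;br /&gt;
bne fill&lt;br /&gt;
swi OS_Exit              //exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 148                //ScreenStart&lt;br /&gt;
.word 11                 //XWindLimit = width-1&lt;br /&gt;
.word 12                 //YWindLimit = height-1&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;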
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode for the resulting executable you can conveniently set the lowest bit of its start address (executables in RISC OS have a load address and a start address; &amp;8000 is the general start address for executables in RISC OS) with the following command on the command line in RISC OS. The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
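&lt;br /&gt;
As a sketch, the exit sequence for an intro that did not change the screen mode could look like this. The definition of OS_NewLine with the number 0x03 is an addition to the SWI list from above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03&lt;br /&gt;
&lt;br /&gt;
swi OS_NewLine           //print a new line to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;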
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC BASIC also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=597</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=597"/>
				<updated>2020-06-18T12:06:17Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and CPUs based on the ARM architecture are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's ARM was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 bit age. It was purely RISC and 32 bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-bit single precision registers (s0...s31), 32 64-bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-bit multi purpose registers for SIMD (q0...q15). These registers overlap: each q register maps onto two d registers, and s0...s31 map onto d0...d15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge: nowadays there is hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in RISC OS for taking care of that, but it is rather slow, as it is implemented in software, and it is not available for THUMB/THUMB-2. For now it might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
An ARM instruction is always 4 bytes long; only THUMB offers a limited instruction set with 2-byte instructions (THUMB-2 mixes 2-byte and 4-byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks it definitely is. For others it is not, because even a single instruction can do a lot (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#4   //if r0&amp;gt;r1 (unsigned) then r0 = r2 + r3&amp;lt;&amp;lt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-bit screen bank limits like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWIs (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* up-to-date 16-bit sound system, e.g. for generating bytebeat based stuff&lt;br /&gt;
* built-in BBC BASIC including an assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB/THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette) and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to your device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' for the pixel from timer (bits 8-15)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' for the pixel from y (bits 16-23, pixel word is &amp;amp;BBGGRR)&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' (bits 0-7) as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and to read the parameters of the current one instead, as most users run in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode in general you can conveniently set the lowest bit of the start address with the following command on the command line in RISC OS (&amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so called timer based bytebeat can be used. For further reference check out this thread on pouet:
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The number of the SWI to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and their numbers are within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail at [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=596</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=596"/>
				<updated>2020-06-18T11:59:46Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and CPUs based on the ARM architecture are the two major CPU architectures of modern times, the latter one especially for any kind of mobile device. Back in the 80's ARM was developed to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm actually not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are fully mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in Risc OS taking care of that, but it's rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;4&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,4&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
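&lt;br /&gt;
As a hedged sketch (not part of the original comparison, the byte count is derived from the instruction encodings): in THUMB-2 the same conditional add needs an explicit IT instruction, and the add with a shifted operand only exists as a 32-Bit encoding, so this variant should also come out at 8 Bytes.&lt;br /&gt;
&lt;br /&gt;
THUMB-2 (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1            //compare r0 with r1 (16-Bit encoding)&lt;br /&gt;
it    hi               //make the next instruction conditional (16-Bit encoding)&lt;br /&gt;
addhi r0,r2,r3,lsl#4   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;4 (32-Bit encoding)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;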
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to your device (e.g. a RPI4 shows different results than a RPI3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade those old school screen modes by a mode string using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the properties of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
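&lt;br /&gt;
A minimal sketch of that approach, assuming the VDU variable numbers 11 (XWindLimit, width in pixels minus 1) and 12 (YWindLimit, height in pixels minus 1) and reusing the request block as output block like in the examples above. The label vdu_block and the register choice are only for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers&lt;br /&gt;
movs r1,r0               //results overwrite the input block&lt;br /&gt;
swi OS_ReadVduVariables  //read the values for the current screen mode&lt;br /&gt;
ldmia r0!,{r5,r6,r7}     //r5 = width-1, r6 = height-1, r7 = screen address&lt;br /&gt;
//...the pixel loops would now run over r5+1 columns and r6+1 rows...&lt;br /&gt;
swi OS_Exit              //exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit (assumed variable number)&lt;br /&gt;
.word 12                 //YWindLimit (assumed variable number)&lt;br /&gt;
.word 148                //screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;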
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode in general you can conveniently set the lowest Bit of the start address with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or by a batch file as shown in the intro links from above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
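&lt;br /&gt;
As a short sketch of such an exit without a mode change (OS_NewLine is SWI number 0x03 and is not part of the list of definitions from above, so it is defined here):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03&lt;br /&gt;
swi OS_NewLine           //print a new line to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;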
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet:
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler to take care of a timed output to the system's sound buffer. Here comes a bit of an obstacle. The SWI's for that purpose have a number that exceeds 0xff, which would be fine for normal ARM code but not for THUMB. So here we've got to use the &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and those SWI's are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on Risc OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The Sound SWI calls are documented in detail at [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=595</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=595"/>
				<updated>2020-06-17T19:22:28Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and CPUs based on the ARM architecture are the two major CPU architectures of modern times, the latter one especially for any kind of mobile device. Back in the 80's ARM was developed to power the successor of the BBC Micro. The most popular and best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, due to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. Especially recommended and cheap is the Raspberry Pi range. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm actually not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never had any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers are fully mapped on each other&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so-called FPEmulator in Risc OS taking care of that, but it's rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the Basic Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even one instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the gnu assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that address is not fixed. On one specific device you could read that address once and hardcode it, but then of course your intro would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all and instead read the parameters of the current one. Most users run in 1920x1080x32Bit anyway, and this also makes the intro independent of the resolution.&lt;br /&gt;
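&lt;br /&gt;
A minimal sketch of how reading the current mode could look. It is not taken from any of the linked intros and rests on a few assumptions: the VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit) are quoted from memory of the RISC OS documentation and should be verified, the label &amp;lt;code&amp;gt;vdu_block&amp;lt;/code&amp;gt; is just illustrative, and r1 is assumed to be preserved by the SWI. If the rows of the mode are padded you would additionally need to read the line length.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //input block with the VDU variable numbers to read&lt;br /&gt;
movs r1,r0               //output block overwrites the input block&lt;br /&gt;
swi OS_ReadVduVariables  //read the values for the current screen mode&lt;br /&gt;
ldm r1!,{r5-r7}          //r5 = screen address, r6 = width-1, r7 = height-1&lt;br /&gt;
//...plot using r5-r7 instead of a hardcoded resolution...&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 148                //screen start address&lt;br /&gt;
.word 11                 //XWindLimit = width-1 in pixels (assumed number, verify)&lt;br /&gt;
.word 12                 //YWindLimit = height-1 in pixels (assumed number, verify)&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;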
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later upgrade of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode you can conveniently set the lowest bit (bit 0) of the start address with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To date that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
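&lt;br /&gt;
A minimal sketch of that exit path, assuming the intro left the desktop screen mode untouched. The label &amp;lt;code&amp;gt;exit&amp;lt;/code&amp;gt; is just illustrative, and the SWI number 0x03 for OS_NewLine is quoted from memory of the RISC OS SWI list, so double-check it against the documentation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03     //assumed SWI number, verify before use&lt;br /&gt;
exit:&lt;br /&gt;
swi OS_NewLine           //write a new line, which re-triggers the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;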
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...for now check out the source code of Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only holds an 8-bit number. So here we've got to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes the GNU assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=594</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=594"/>
				<updated>2020-06-17T19:19:22Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in any kind of mobile device. Back in the 80's ARM was founded to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or some kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These are views of the same register file: each q register overlays two d registers, and s0...s31 overlay d0...d15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large: nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so-called FPEmulator to take care of that, but it is rather slow, being implemented in software, and not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the BASIC assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
The size of the instructions is always 4 Bytes, only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that address is not fixed. On one specific device you could read that address once and hardcode it, but then of course your intro would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is not to set a screen mode at all and instead read the parameters of the current one. Most users run in 1920x1080x32Bit anyway, and this also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later upgrade of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode you can conveniently set the lowest bit (bit 0) of the start address with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To date that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer based bytebeat can be used. For further reference check out this thread on pouet:&lt;br /&gt;
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of a timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWI's for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB, where the SWI instruction only holds an 8-bit number. So here we've got to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWI's indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWI's in total (install handler, sample rate, remove handler) and their numbers lie within a short range, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14)&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;((t&amp;gt;&amp;gt;9)&amp;amp;14))&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC Basic also has ways to create sounds by note or frequency (link is [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail here: [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights on the sound system can be found here: [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The Risc OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi, including !GCC (includes the GNU assembler) and !StrongED (the most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=593</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=593"/>
				<updated>2020-06-17T19:17:53Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: Risc OS ByteBeat added&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 1980s the ARM architecture was created by Acorn to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These register banks overlap: s0...s31 alias d0...d15, and q0...q15 alias d0...d31&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
Of course the individual instructions are in general very different from x86: some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there is hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so-called FPEmulator to take care of that, but it is rather slow because it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit bank-switching limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines, so-called SWIs (SoftWare Interrupts), comparable to 'int' on x86&lt;br /&gt;
* up-to-date 16-Bit sound system, e.g. for generating bytebeat-based stuff&lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
.set OS_CallASWI, 0x6f&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
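&lt;br /&gt;
The pattern this loop writes can be previewed off-target with a small Python model. It is a sketch only: the name xor_frame is made up here, and the timer argument stands for whatever OS_ReadMonotonicTime returned in r0 for that frame.&lt;br /&gt;
&lt;br /&gt;
```python
WIDTH, HEIGHT = 320, 256   # screen mode 13 as used in the listing


def xor_frame(timer):
    """Return the bytes one pass of yloop/xloop writes, in memory order."""
    frame = bytearray()
    for y in range(HEIGHT - 1, -1, -1):             # y counts 255 down to 0
        for x in range(WIDTH, 0, -1):               # x counts 320 down to 1
            frame.append(((x + timer) ^ y) % 256)   # strb keeps only the low byte
    return bytes(frame)
```
Because both counters run downwards, the first byte in memory belongs to x = 320 and y = 255, so the pattern appears mirrored compared to counting upwards, which does not matter for this effect.&lt;br /&gt;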
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
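&lt;br /&gt;
How the loop packs its three counters into one 32-Bit pixel word can be modelled in Python as follows. This is a sketch with a made-up function name; it only describes bit positions and makes no claim about which colour channel the screen mode assigns to them.&lt;br /&gt;
&lt;br /&gt;
```python
def true_colour_word(x, y, timer):
    """32-bit word the loop stores for counters x (320..1) and y (255..0)."""
    low = x >> 1                    # bits 0-7: x halved so that 320 fits in a byte
    mid = (timer % 256) * 0x100     # bits 8-15: low byte of the OS timer
    high = y * 0x10000              # bits 16-23: the row counter
    return low | mid | high
```
For the first pixel written (x = 320, y = 255) with a timer byte of zero, only the top and bottom bytes of the 24-Bit colour value are set.&lt;br /&gt;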
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and just read the properties of the current one, as most users run in 1920x1080 at 32 Bit anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode in general you can conveniently set the lowest bit (bit 0) of the execution address of your file with the following command on the command line in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intro links above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
For basic sound output the principle of a so-called timer-based bytebeat can be used. For further reference check out this thread on Pouet:
[https://www.pouet.net/topic.php?which=8357&amp;amp;page=1 Experimental music from very short C programs]. I took an example bytebeat from rrrola (shortened by ryg).&lt;br /&gt;
&lt;br /&gt;
To achieve that we need to set up an interrupt handler that takes care of timed output to the system's sound buffer. Here comes a bit of an obstacle: the SWIs for that purpose have numbers that exceed 0xff, which would be fine for normal ARM code but not for THUMB. So here we have to use &amp;lt;code&amp;gt;SWI OS_CallASWI&amp;lt;/code&amp;gt; to call those SWIs indirectly. The SWI number to be called has to be set in r10. As we need 3 different SWIs in total (install handler, sample rate, remove handler) and those SWIs are within a short range of numbers, we can save some bytes by just adding/subtracting an offset for the other calls. Check out the code here:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
//--- set up shared sound interrupt handler ---------------&lt;br /&gt;
adr.w r0,soundcode+1    //+1 as code address for interrupt routine needs to be in thumb state also&lt;br /&gt;
movs  r2,#0             //immediate handler&lt;br /&gt;
adr.n r3,soundhandler_title&lt;br /&gt;
str   r2,[r3]           //dummy title string&lt;br /&gt;
movw  r10,#0xb440&lt;br /&gt;
movt  r10,#0x6          //install XSharedSound handler (SWI 0x6b440)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
mov   r4,r0             //backup handler number (r0 gets corrupted by SharedSound_SampleRate)&lt;br /&gt;
mov   r1,#8000*1024     //sample rate *1024&lt;br /&gt;
add   r10,r10,#6        //XSharedSound_SampleRate (SWI 0x6b446)&lt;br /&gt;
swi   OS_CallASWI&lt;br /&gt;
sub   r10,r10,#5        //prepare r10 for XSharedSound_RemoveHandler (SWI 0x6b441) on exit later&lt;br /&gt;
//--- main intro loop -------------------------------------&lt;br /&gt;
mainloop:&lt;br /&gt;
//any graphics code or whatever would be here&lt;br /&gt;
swi OS_ReadEscapeState&lt;br /&gt;
bcc mainloop&lt;br /&gt;
mov r0,r4               //restore handler number&lt;br /&gt;
swi OS_CallASWI         //Remove XSharedSound handler&lt;br /&gt;
swi OS_Exit&lt;br /&gt;
//--- interrupt routine/sound generation ------------------&lt;br /&gt;
// r1 -&amp;gt; base of buffer, r2 -&amp;gt; end of buffer, r6 = 8.24 fractional step&lt;br /&gt;
// ByteBeat formula is = t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
soundcode:&lt;br /&gt;
push {r0-r7,LR}&lt;br /&gt;
lsrs  r6,r6,#8          //adjust fractional step&lt;br /&gt;
ldr.n r0,soundtimer     //t = soundtimer&lt;br /&gt;
soundloop:&lt;br /&gt;
   lsrs r5,r0,#16       //adjust timer for bytebeat&lt;br /&gt;
   movw r7,#0xca98      //bytebeat multi constant&lt;br /&gt;
   lsrs r4,r5,#9        //t&amp;gt;&amp;gt;9&lt;br /&gt;
   and  r4,r4,#14       //(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   lsrs r7,r7,r4        //0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14&lt;br /&gt;
   and  r7,r7,#15       //(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15&lt;br /&gt;
   muls r7,r5,r7        //t*((0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9)&amp;amp;14)&amp;amp;15)&lt;br /&gt;
   orr  r7,r7,r5,lsr#8  //t*(0xca98&amp;gt;&amp;gt;(t&amp;gt;&amp;gt;9&amp;amp;14)&amp;amp;15)|t&amp;gt;&amp;gt;8&lt;br /&gt;
   lsls r7,r7,#8        //8Bit =&amp;gt; 16Bit sound&lt;br /&gt;
   orr  r7,r7,r7,lsl#16 //mono =&amp;gt; stereo copy&lt;br /&gt;
   stm  r1!,{r7}        //store sound word&lt;br /&gt;
   adds r0,r0,r6        //inc timer by fractional step&lt;br /&gt;
   cmp  r1,r2           //check if buffer filled&lt;br /&gt;
bne soundloop&lt;br /&gt;
adr.n r4,soundtimer&lt;br /&gt;
str r0,[r4]             //save timer...no pc relative str in Thumb...&lt;br /&gt;
pop {r0-r7,PC}&lt;br /&gt;
//--- data ----------------------------------------------&lt;br /&gt;
.align 2&lt;br /&gt;
soundhandler_title:&lt;br /&gt;
soundtimer:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
This assembles to 96 Bytes. &lt;br /&gt;
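&lt;br /&gt;
The SWI number juggling in the setup code above can be checked with a few lines of Python. This is a sketch only: the constant and function names are made up here, the SWI numbers are the ones from the comments in the listing, and the only outside fact used is that bit 17 (0x20000) marks the error-returning X form of a SWI.&lt;br /&gt;
&lt;br /&gt;
```python
X_BIT = 0x20000                   # bit 17 selects the error-returning "X" form
INSTALL = 0x6B440                 # XSharedSound install-handler number from the listing
PLAIN_INSTALL = INSTALL - X_BIT   # 0x4b440, the same SWI without the X bit

SAMPLE_RATE = INSTALL + 6         # 0x6b446, reached with 'add r10,r10,#6'
REMOVE = SAMPLE_RATE - 5          # 0x6b441, reached with 'sub r10,r10,#5'
RATE_ARGUMENT = 8000 * 1024       # value loaded into r1: sample rate * 1024


def fits_thumb_swi(number):
    """True if the number fits the 8-bit immediate of a 16-bit Thumb swi."""
    return number in range(0x100)


def movw_movt(low16, high16):
    """Value left in a register by movw low16 followed by movt high16."""
    return high16 * 0x10000 + low16
```
Here fits_thumb_swi(INSTALL) is false while fits_thumb_swi(0x6f) is true, which is exactly why the listing routes the three sound calls through OS_CallASWI, and movw_movt(0xb440, 0x6) reproduces the value placed in r10.&lt;br /&gt;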
&lt;br /&gt;
There are other ways to do sound on RISC OS, but those were not evaluated at the time of writing. BBC BASIC also has ways to create sounds by note or frequency (see [http://www.riscos.com/support/developers/bbcbasic/part2/sound.html here]). The sound SWI calls are documented in detail at [https://www.riscosopen.org/wiki/documentation/show/Sound%20SWI%20Calls Sound SWI Calls]. Some further insights into the sound system can be found in [http://jcgl.orpheusweb.co.uk/temp/ROSSDocument.pdf The RISC OS sound system by J. Lesurf].&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for RISC OS and all needed sizecoding tools for your Raspberry Pi, including !GCC (includes the GNU assembler) and !StrongED (the most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=592</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=592"/>
				<updated>2020-06-16T19:34:56Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: /* Links on the OS */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 1980s the ARM architecture was created by Acorn to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These register banks overlap: s0...s31 alias d0...d15, and q0...q15 alias d0...d31&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
Of course the individual instructions are in general very different from x86: some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large. Nowadays there is hardware integer divide and various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so-called FPEmulator to take care of that, but it is rather slow because it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes long; only THUMB offers a limited instruction set with a length of 2 Bytes (THUMB-2 mixes 2 and 4 Byte instructions).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit bank-switching limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines, so-called SWIs (SoftWare Interrupts), comparable to 'int' on x86&lt;br /&gt;
* up-to-date 16-Bit sound system, e.g. for generating bytebeat-based stuff&lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the smaller old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by passing a mode string to OS_ScreenMode with reason code 15 ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode see the OS_ScreenMode documentation for further information]). That looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8-15 (green channel)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16-23 (blue in the default &amp;amp;BBGGRR layout)&lt;br /&gt;
   orrs r4,r4,r0         //combine the y and timer channels&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0-7 (red), halved as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize colour value&lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and instead read the parameters of the current one, as most users run the desktop in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To make RISC OS start your code in THUMB mode you can conveniently set the lowest bit of the execution address with the following SYS call (&amp;amp;8000 is the general start address for executables in RISC OS, so the execution address becomes &amp;amp;8001). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is one small thing to take care of when using THUMB mode on RISC OS: a very ancient module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it is removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the screen mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow it...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ RISC OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=591</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=591"/>
				<updated>2020-06-16T19:34:36Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: /* Links on the OS */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in all kinds of mobile devices. Back in the 80's the ARM was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements were added. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These registers are mapped onto each other: s0...s31 overlay d0...d15, and q0...q15 overlay d0...d31&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite huge: nowadays there's a hardware integer divide and various SIMD approaches in both ARM and NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so called FPEmulator to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
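&lt;br /&gt;
The overlap of the VFP/NEON registers mentioned in the list above can be illustrated with a short sketch. It assumes the GNU assembler with NEON enabled via &amp;quot;.fpu neon&amp;quot; and is only a fragment, not a complete RISC OS program, as the requirements for using VFP/NEON under RISC OS are not covered here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.fpu neon                //assumption: assembler is set up for VFPv3/NEON&lt;br /&gt;
vmov.f32 s0,#1.0         //s0 is the lower half of d0&lt;br /&gt;
vmov.f32 s1,#2.0         //s1 is the upper half of d0&lt;br /&gt;
vadd.f32 d1,d0,d0        //NEON adds both lanes at once: d1 = {2.0, 4.0}&lt;br /&gt;
vmov r0,s3               //s3 is the upper half of d1, so r0 = bit pattern of 4.0&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;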
&lt;br /&gt;
ARM instructions are always 4 Bytes in size; only THUMB offers a limited instruction set with 2-Byte instructions (THUMB-2 mixes 2-Byte and 4-Byte encodings).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
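&lt;br /&gt;
For comparison, a sketch of how the same operation could look in THUMB-2, assuming the GNU assembler in unified syntax. Conditional execution needs an explicit IT instruction there, and an add with a shifted operand only exists as a 4-Byte encoding, so this variant is not shorter than the ARM one:&lt;br /&gt;
&lt;br /&gt;
THUMB-2 (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb&lt;br /&gt;
cmp   r0,r1            //compare r0 with r1 (2 Bytes)&lt;br /&gt;
it    hi               //make the next instruction conditional (2 Bytes)&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2 (4 Bytes)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;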
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit screen bank limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWIs (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, to set the screen mode you can rely on the smaller old school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course your intro would be restricted to that device (e.g. an RPi4 reports a different address than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes by passing a mode string to OS_ScreenMode with reason code 15 ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode see the OS_ScreenMode documentation for further information]). That looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //move timer byte to bits 8-15 (green channel)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //move y to bits 16-23 (blue in the default &amp;amp;BBGGRR layout)&lt;br /&gt;
   orrs r4,r4,r0         //combine the y and timer channels&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for bits 0-7 (red), halved as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize colour value&lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and instead read the parameters of the current one, as most users run the desktop in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
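&lt;br /&gt;
A minimal sketch of this approach, assuming the current mode uses 32 bits per pixel with no padding between rows. The VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit) are not used in the examples above; they are taken from the OS_ReadVduVariables documentation and both hold the size in pixels minus one. The label vdu_block is just a name chosen for this sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers&lt;br /&gt;
movs r1,r0               //output block overwrites the input block&lt;br /&gt;
swi OS_ReadVduVariables  //read the values of the current screen mode&lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
adr.n r6,vdu_block       //pointer to the values read above&lt;br /&gt;
ldmia r6!,{r4,r5,r7}     //r4 = width-1, r5 = height-1, r7 = screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,r4            //screen x, counting down from width-1&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r5      //p = (x+timer) xor y&lt;br /&gt;
      stmia r7!,{r3}     //store 32-bit pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x&lt;br /&gt;
   bge xloop             //loop while x&amp;gt;=0&lt;br /&gt;
   subs r5,r5,#1         //dec y&lt;br /&gt;
bge yloop                //loop while y&amp;gt;=0&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit = width in pixels - 1&lt;br /&gt;
.word 12                 //YWindLimit = height in pixels - 1&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Both loops count down to zero inclusive, so exactly width x height pixels are written. If the mode pads its rows, the line length (VDU variable 6) would have to be taken into account as well.&lt;br /&gt;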
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To make RISC OS start your code in THUMB mode you can conveniently set the lowest bit of the execution address with the following SYS call (&amp;amp;8000 is the general start address for executables in RISC OS, so the execution address becomes &amp;amp;8001). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is one small thing to take care of when using THUMB mode on RISC OS: a very ancient module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it is removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the screen mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow it...&lt;br /&gt;
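&lt;br /&gt;
A sketch of such an exit sequence for an intro that kept the current screen mode. It is a fragment meant to replace the last three instructions of a main loop like the ones above (so the label mainloop is assumed to exist). OS_NewLine is not part of the SWI list above, so its number (0x03) is defined here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.set OS_NewLine, 0x03&lt;br /&gt;
&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop             //if not, draw the next frame&lt;br /&gt;
swi OS_NewLine           //re-trigger desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;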
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ Risc OS Open - Home of the current OS version, documentation on the OS (e.g. SWI's) and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=590</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=590"/>
				<updated>2020-06-16T19:33:25Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially in all kinds of mobile devices. Back in the 80's the ARM was created to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any inherited obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements were added. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These registers are mapped onto each other: s0...s31 overlay d0...d15, and q0...q15 overlay d0...d31&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite huge: nowadays there's a hardware integer divide and various SIMD approaches in both ARM and NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so called FPEmulator to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. For now, though, it might be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.&lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes in size; only THUMB offers a limited instruction set with 2-Byte instructions (THUMB-2 mixes 2-Byte and 4-Byte encodings).&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, because of how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:&lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;cmp   r0,r1            //compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit screen bank limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWIs (SoftWare Interrupts), comparable to 'int' on x86)&lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
A basic intro loop in THUMB-2 would then look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, to set the screen mode you can rely on the smaller old school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course your intro would be restricted to that device (e.g. an RPi4 reports a different address than an RPi3 for the same screen mode).&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old-school screen modes with a mode string, using reason code 15 of the SWI OS_ScreenMode ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' from timer (bits 8-15 of &amp;amp;BBGGRR)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' from y (bits 16-23)&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' (bits 0-7) as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and instead read the parameters of the current one, as most users run in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
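&lt;br /&gt;
A minimal sketch of that approach could look like the following snippet. It is not taken from the intro mentioned below. It assumes the VDU variable numbers 11 (XWindLimit) and 12 (YWindLimit) next to the already used 148, that OS_ReadVduVariables preserves r0, and that the output block may overwrite the input block as in the examples above. The label vdu_block is made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers&lt;br /&gt;
movs r1,r0               //output block overwrites the input block&lt;br /&gt;
swi OS_ReadVduVariables  //read the values for the current screen mode&lt;br /&gt;
ldmia r0!,{r4,r5,r7}     //r4 = width-1, r5 = height-1, r7 = screen address&lt;br /&gt;
//...main loop using r4, r5 and r7 goes here...&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit = screen width in pixels - 1&lt;br /&gt;
.word 12                 //YWindLimit = screen height in pixels - 1&lt;br /&gt;
.word 148                //screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;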
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode you can conveniently set the lowest bit (bit 0) of the execution address with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above-mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or from a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
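&lt;br /&gt;
As a small sketch, such an exit sequence for an intro that kept the desktop screen mode could look like this, assuming the SWI number 0x03 for OS_NewLine (it is not part of the list of SWI definitions above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;gas&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_NewLine, 0x03&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
swi OS_NewLine           //output a new line to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;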
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ Risc OS Open - Home of the current OS version and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=589</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=589"/>
				<updated>2020-06-16T19:31:28Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's the ARM architecture was developed to power the successor of the BBC Micro. The most popular and best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi-purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi-purpose registers for SIMD (q0...q15). These registers overlap each other: every q register covers two d registers, and s0...s31 cover d0...d15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite large. Nowadays there is hardware integer divide and there are various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. Risc OS has a so-called FPEmulator to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes in size; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, due to how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;arm&amp;quot;&amp;gt;cmp   r0,r1            ;compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   ;if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
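&lt;br /&gt;
For comparison, here is a sketch of the same operation in THUMB-2, assuming the GNU assembler with unified syntax and a CPU that supports THUMB-2. Conditional execution needs an additional IT instruction there, and the ADD with a shifted operand is only available as a 4 Byte instruction, so this should end up at 8 Bytes as well:&lt;br /&gt;
&lt;br /&gt;
THUMB-2 (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;arm&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                  //assemble using thumb mode&lt;br /&gt;
cmp   r0,r1             //compare r0 with r1 (2 Bytes)&lt;br /&gt;
it    hi                //make the next instruction conditional (2 Bytes)&lt;br /&gt;
addhi r0,r2,r3,lsl #2   //if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2 (4 Bytes)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;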
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit screen bank limit as on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so-called SWIs (SoftWare Interrupts), comparable to 'int' on x86).  &lt;br /&gt;
* up-to-date 16-Bit sound system, e.g. for generating bytebeat-based stuff &lt;br /&gt;
* built-in BBC BASIC including an assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWIs for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;arm&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],1     //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, to set the screen mode you can rely on the small old-school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old-school screen modes with a mode string, using reason code 15 of the SWI OS_ScreenMode ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode Check out this link for further information]). That would look like this code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;arm&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'G' from timer (bits 8-15 of &amp;amp;BBGGRR)&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'B' from y (bits 16-23)&lt;br /&gt;
   orrs r4,r4,r0         //combine 'B' and 'G'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'R' (bits 0-7) as x can exceed 255&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set a screen mode at all and instead read the parameters of the current one, as most users run in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. A later update of that intro also shows the combined use of THUMB-2 and NEON, which reduced the code size by around 44 Bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode you can conveniently set the lowest bit (bit 0) of the execution address with the following command on the command line in Risc OS (&amp;amp;8000 is the general start address for executables in Risc OS). The best way to do so is to use a batch file, as shown in most of the above-mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on Risc OS there's one small thing to address. A very old module has to be removed from the OS, otherwise it crashes your code. As of today that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it needs to be removed with &amp;quot;rmkill SpecialFX&amp;quot; on the command line or from a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit as shown. If you didn't change the mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ Risc OS Open - Home of the current OS version and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=588</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=588"/>
				<updated>2020-06-16T17:32:25Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on Risc OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 80's the ARM architecture was developed to power the successor of the BBC Micro. The most popular and best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as that architecture never inherited any obstacles from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi-purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi-purpose registers for SIMD (q0...q15). These registers overlap each other: every q register covers two d registers, and s0...s31 cover d0...d15&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite large. Nowadays there is hardware integer divide and there are various SIMD approaches in either ARM or NEON instructions. The FPU, however, still lacks trigonometric and other fancy instructions compared to x87. Risc OS has a so-called FPEmulator to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
ARM instructions are always 4 Bytes in size; only THUMB offers a limited instruction set with a length of 2 Bytes.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap for sizecoding, and for some tasks that is definitely true. For others it's not, due to how much a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp   r0,r1            ;compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   ;if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip&lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
=== What does Risc OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16-Bit screen bank limit as on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so-called SWIs (SoftWare Interrupts), comparable to 'int' on x86).  &lt;br /&gt;
* up-to-date 16-Bit sound system, e.g. for generating bytebeat-based stuff &lt;br /&gt;
* built-in BBC BASIC including an assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWIs for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So for a basic intro loop in THUMB-2 this would look like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],1     //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, to set the screen mode you can rely on the small old-school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of OS_ScreenMode ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That looks like the following code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the parameters of the current one, as most users run in 1920x1080 with 32 Bit colour anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
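&lt;br /&gt;
A minimal sketch of reading the current mode could look like the following. It is only an illustration and not taken from any of the linked intros. The label vdu_block is made up for this example, VDU variables 11 (XWindLimit) and 12 (YWindLimit) are assumed to hold width-1 and height-1 in pixels, and the input block is reused as output block like in the examples above, assuming that this also works for more than one variable (use a separate output block if in doubt).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&lt;br /&gt;
adr.n r0,vdu_block       //address of input block with the variable numbers to read&lt;br /&gt;
movs r1,r0               //address of output block (results overwrite the input block)&lt;br /&gt;
swi OS_ReadVduVariables  //read the variables, r0 and r1 are preserved&lt;br /&gt;
ldmia r0!,{r5,r6,r7}     //r5 = width-1, r6 = height-1, r7 = screen start address&lt;br /&gt;
//...draw here, using r5, r6 and r7 instead of fixed values...&lt;br /&gt;
swi OS_Exit              //exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
vdu_block:&lt;br /&gt;
.word 11                 //XWindLimit = screen width in pixels minus 1&lt;br /&gt;
.word 12                 //YWindLimit = screen height in pixels minus 1&lt;br /&gt;
.word 148                //screen start address, same variable as in the examples above&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Note that this sketch assumes that each screen row directly follows the previous one in memory. If that is not the case for your mode, VDU variable 6 (LineLength) gives the number of Bytes per row.&lt;br /&gt;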
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements regarding the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode in general you can conveniently set the lowest Bit of the execution address with the following command in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS, so the execution address becomes &amp;amp;8001). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to take care of. A very ancient module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it is removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit shown above. If you didn't change the screen mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
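&lt;br /&gt;
As a rough sketch, the exit sequence for an intro that kept the desktop screen mode could look like this. OS_NewLine is SWI number 0x03. It is defined here because it is not part of the SWI list used for the examples above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
.set OS_NewLine, 0x03&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&lt;br /&gt;
swi OS_NewLine           //output a new line to re-trigger the desktop redraw&lt;br /&gt;
swi OS_Exit              //exit to OS&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
In THUMB code this costs 2 extra Bytes compared to a plain OS_Exit.&lt;br /&gt;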
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ Risc OS Open - Home of the current OS version and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=587</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=587"/>
				<updated>2020-06-16T17:24:12Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Welcome to SizeCoding.org! ==&lt;br /&gt;
&lt;br /&gt;
'''SizeCoding.org is a wiki dedicated to the art of creating very tiny programs for the 80x86 family of CPUs. As sizecoding is also popular on other hardware we recently opened the website for other platforms as well, check the links below.'''  By &amp;quot;very tiny programs&amp;quot;, we mean programs that are '''256 bytes or less in size''', typically created by members of the [https://en.wikipedia.org/wiki/Demoscene demoscene] as a show of programming skill.  The size of these tiny programs is measured by their total size in opcode bytes, and they are usually presented as executable .COM files to be run in pure DOS, a DOS VM running inside another operating system, or an emulator that can run DOS such as DOSBox.  &lt;br /&gt;
&lt;br /&gt;
'''Despite their tiny size, these programs are able to produce amazing graphical displays, playable games, and sometimes music.'''  There are even some surprisingly effective programs in under '''16 bytes''', such as [https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ a maze generator in 10 bytes], a [[Paint16b|mouse-driven paint program in 16 bytes]], and an [[M8trix_8b|&amp;quot;Enter the Matrix&amp;quot; screensaver]] in only 8 bytes. &lt;br /&gt;
&lt;br /&gt;
'''The intent of this wiki is to teach x86 assembler programmers the various techniques used to create tiny demoscene intros.'''  While these techniques can be used for other applications (boot sectors, BIOS and firmware code, etc.), the information presented here is firmly oriented towards the demoscene.  Practicality and common sense are sometimes thrown out the window just to shave a single byte -- consider yourself warned.&lt;br /&gt;
&lt;br /&gt;
This wiki is divided into the following major sections for x86:&lt;br /&gt;
&lt;br /&gt;
*'''[[Getting Started]]''' - What do you need to know before starting your first tinyprog?&lt;br /&gt;
&lt;br /&gt;
*'''[[Techniques|Tips, Tricks, and Techniques]]''' - The meat of the wiki.  Lists of small opcodes, default environment settings, size optimization basics, and a ton of random tips.&lt;br /&gt;
&lt;br /&gt;
*'''[[Case Studies]]''' - Analysis of existing award-winning tiny programs, with comments on what choices were made and why.&lt;br /&gt;
&lt;br /&gt;
*'''[[Resources|Additional Resources]]''' - Discussion threads, competitions, repositories, and other external references that can aid you in your quest.&lt;br /&gt;
&lt;br /&gt;
For other platforms check out the sections here:&lt;br /&gt;
&lt;br /&gt;
*'''[[Risc OS on ARM based CPUs]]''' - Tired of x86 ? Create your tiny intro on ARM hardware on Risc OS.&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=586</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Main_Page&amp;diff=586"/>
				<updated>2020-06-16T17:23:39Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Welcome to SizeCoding.org! ==&lt;br /&gt;
&lt;br /&gt;
'''SizeCoding.org is a wiki dedicated to the art of creating very tiny programs for the 80x86 family of CPUs. As sizecoding is also popular on other hardware we recently opened the website also for other platforms, check the links below.'''  By &amp;quot;very tiny programs&amp;quot;, we mean programs that are '''256 bytes or less in size''', typically created by members of the [https://en.wikipedia.org/wiki/Demoscene demoscene] as a show of programming skill.  The size of these tiny programs is measured by their total size in opcode bytes, and they are usually presented as executable .COM files to be run in pure DOS, a DOS VM running inside another operating system, or an emulator that can run DOS such as DOSBox.  &lt;br /&gt;
&lt;br /&gt;
'''Despite their tiny size, these programs are able to produce amazing graphical displays, playable games, and sometimes music.'''  There are even some surprisingly effective programs in under '''16 bytes''', such as [https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ a maze generator in 10 bytes], a [[Paint16b|mouse-driven paint program in 16 bytes]], and an [[M8trix_8b|&amp;quot;Enter the Matrix&amp;quot; screensaver]] in only 8 bytes. &lt;br /&gt;
&lt;br /&gt;
'''The intent of this wiki is to teach x86 assembler programmers the various techniques used to create tiny demoscene intros.'''  While these techniques can be used for other applications (boot sectors, BIOS and firmware code, etc.), the information presented here is firmly oriented towards the demoscene.  Practicality and common sense are sometimes thrown out the window just to shave a single byte -- consider yourself warned.&lt;br /&gt;
&lt;br /&gt;
This wiki is divided into the following major sections for x86:&lt;br /&gt;
&lt;br /&gt;
*'''[[Getting Started]]''' - What do you need to know before starting your first tinyprog?&lt;br /&gt;
&lt;br /&gt;
*'''[[Techniques|Tips, Tricks, and Techniques]]''' - The meat of the wiki.  Lists of small opcodes, default environment settings, size optimization basics, and a ton of random tips.&lt;br /&gt;
&lt;br /&gt;
*'''[[Case Studies]]''' - Analysis of existing award-winning tiny programs, with comments on what choices were made and why.&lt;br /&gt;
&lt;br /&gt;
*'''[[Resources|Additional Resources]]''' - Discussion threads, competitions, repositories, and other external references that can aid you in your quest.&lt;br /&gt;
&lt;br /&gt;
For other platforms check out the sections here:&lt;br /&gt;
&lt;br /&gt;
*'''[[Risc OS on ARM based CPUs]]''' - Tired of x86 ? Create your tiny intro on ARM hardware on Risc OS.&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=585</id>
		<title>Risc OS on ARM based CPUs</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Risc_OS_on_ARM_based_CPUs&amp;diff=585"/>
				<updated>2020-06-16T17:20:12Z</updated>
		
		<summary type="html">&lt;p&gt;Kuemmel: Added Risc OS section&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Why ARM and why on RISC OS ? ==&lt;br /&gt;
&lt;br /&gt;
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 1980s the ARM architecture was created by Acorn to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.&lt;br /&gt;
&lt;br /&gt;
Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run RISC OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.&lt;br /&gt;
&lt;br /&gt;
Actually I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.&lt;br /&gt;
&lt;br /&gt;
=== What does ARM offer compared to x86 ? ===&lt;br /&gt;
If you come from x86, coding on ARM will be a very different experience, as the architecture never had to carry any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:&lt;br /&gt;
* 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)&lt;br /&gt;
* VFP/NEON (SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). All those registers overlap: each q register maps onto two d registers, and d0...d15 map onto s0...s31&lt;br /&gt;
* THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)&lt;br /&gt;
&lt;br /&gt;
...and of course the individual instructions are in general very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge: nowadays there's hardware integer divide and various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. RISC OS has a so called FPEmulator to take care of that, but it's rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2 by now, though it might be an option for e.g. precalc when you use the BASIC Assembler from RISC OS.  &lt;br /&gt;
&lt;br /&gt;
In ARM mode the size of an instruction is always 4 Bytes. THUMB offers a limited instruction set with a length of 2 Bytes, and THUMB-2 mixes 2 Byte and 4 Byte instructions.&lt;br /&gt;
&lt;br /&gt;
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example: &lt;br /&gt;
&lt;br /&gt;
ARM (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp   r0,r1            ;compare r0 with r1&lt;br /&gt;
addhi r0,r2,r3,lsl#2   ;if r0&amp;gt;r1 then r0 = r2 + r3&amp;lt;&amp;lt;2&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
x86 (11 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;cmp eax,ebx&lt;br /&gt;
jna skip &lt;br /&gt;
   mov eax,edx&lt;br /&gt;
   shl eax,2&lt;br /&gt;
   add eax,ecx&lt;br /&gt;
skip:&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
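&lt;br /&gt;
For comparison, the same operation can be sketched in THUMB-2, where conditional execution needs an explicit IT instruction. This is only an illustration using the same registers as the ARM version. The Byte counts in the comments assume that the GNU assembler picks the 2 Byte encodings wherever they exist.&lt;br /&gt;
&lt;br /&gt;
THUMB-2 (8 Bytes)&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
cmp   r0,r1              //compare r0 with r1 (2 Bytes)&lt;br /&gt;
it    hi                 //make the next instruction conditional on 'hi' (2 Bytes)&lt;br /&gt;
addhi r0,r2,r3,lsl #2    //if r0 is higher than r1 then r0 = r2 + r3*4 (4 Bytes)&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So in this particular case THUMB-2 does not save anything over plain ARM code, as the IT instruction and the 4 Byte add with a shifted operand eat up the gain.&lt;br /&gt;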
&lt;br /&gt;
=== What does RISC OS offer for sizecoding ? ===&lt;br /&gt;
* more or less easy access to common screen modes&lt;br /&gt;
* all screen modes have a linear frame buffer, no 16Bit screen banks limit like on DOS&lt;br /&gt;
* convenient access to operating system/kernel routines (so called SWI's (SoftWare Interrupt), comparable to 'int' on x86).  &lt;br /&gt;
* up to date 16-Bit sound system, for e.g. generating bytebeat based stuff &lt;br /&gt;
* built in BBC Basic including an Assembler&lt;br /&gt;
&lt;br /&gt;
=== What does it lack (but mostly not relevant to tiny intro sizecoding) ? ===&lt;br /&gt;
* no multicore support&lt;br /&gt;
* no shader access or any kind of OpenGL or DirectX&lt;br /&gt;
* lack of software development in general, so web browsing is there but a bit limited&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Simple sizecoding framework and output to screen ==&lt;br /&gt;
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.&lt;br /&gt;
&lt;br /&gt;
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.set OS_ScreenMode, 0x65&lt;br /&gt;
.set OS_RemoveCursors, 0x36&lt;br /&gt;
.set OS_ReadVduVariables, 0x31&lt;br /&gt;
.set OS_ReadMonotonicTime, 0x42&lt;br /&gt;
.set OS_ReadEscapeState, 0x2c&lt;br /&gt;
.set OS_Exit, 0x11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
So a basic intro loop in THUMB-2 would look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#0               //reason code to set screen mode by number&lt;br /&gt;
movs r1,#13              //screen mode 13 = 320x256 256 colours&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
yloop:&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      adds r3,r1,r0      //p = x+timer&lt;br /&gt;
      eors r3,r3,r2      //p = (x+timer) xor y&lt;br /&gt;
      strb r3,[r7],#1    //plot result as byte (with standard palette)&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 52 Bytes. &lt;br /&gt;
&lt;br /&gt;
As you can see, for setting the screen mode you can rely on the compact old school modes with up to e.g. 800x600 in 256 colours by just choosing a mode by its number (listed here: [https://www.riscosopen.org/wiki/documentation/show/Screen%20Modes Screen Modes]). After setting the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 shows different results than an RPi3 for the same screen mode). &lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85682 Edgedancer]&lt;br /&gt;
&lt;br /&gt;
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of OS_ScreenMode ([https://www.riscosopen.org/wiki/documentation/show/OS_ScreenMode check out this link for further information]). That looks like the following code snippet:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;NASM&amp;quot;&amp;gt;.syntax unified&lt;br /&gt;
.thumb                   //assemble using thumb mode&lt;br /&gt;
movs r0,#15              //reason code to request screen mode by string     &lt;br /&gt;
adr.n r1,mode_string     //pointer to string&lt;br /&gt;
swi OS_ScreenMode        //set screen mode &lt;br /&gt;
adr.n r0,screen_address  //address of input block to read screen mode address&lt;br /&gt;
movs r1,r0               //address of output block where screen mode address is stored  &lt;br /&gt;
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks &lt;br /&gt;
&lt;br /&gt;
mainloop:&lt;br /&gt;
ldr.n r7,screen_address  //read screen address&lt;br /&gt;
swi OS_ReadMonotonicTime //get OS timer to r0&lt;br /&gt;
movs r2,#255             //screen y&lt;br /&gt;
ands r0,r0,r2            //get lowest byte of timer&lt;br /&gt;
lsls r0,r0,#8            //create 'B' for RGB from timer&lt;br /&gt;
yloop:&lt;br /&gt;
   lsls r4,r2,#16        //create 'R' for RGB from y&lt;br /&gt;
   orrs r4,r4,r0         //combine 'R' and 'B'&lt;br /&gt;
   movs r1,#320          //screen x&lt;br /&gt;
   xloop:&lt;br /&gt;
      lsrs r3,r1,#1      //x&amp;gt;&amp;gt;1 for 'G' as x&amp;gt;256&lt;br /&gt;
      orrs r3,r3,r4      //finalize RGB value &lt;br /&gt;
      stmia r7!,{r3}     //store true colour pixel and increment address&lt;br /&gt;
      subs r1,r1,#1      //dec x &lt;br /&gt;
   bne xloop&lt;br /&gt;
   subs r2,r2,#1         //dec y&lt;br /&gt;
bge yloop&lt;br /&gt;
swi OS_ReadEscapeState   //ESC pressed ?&lt;br /&gt;
bcc mainloop&lt;br /&gt;
swi OS_Exit              //if yes exit to OS&lt;br /&gt;
&lt;br /&gt;
.align 2                 //align&lt;br /&gt;
mode_string:&lt;br /&gt;
.string &amp;quot;13 C16M&amp;quot;        //screen mode string (terminated by 0) =&amp;gt; 13 = 320*256 C16M = true colour&lt;br /&gt;
screen_address:&lt;br /&gt;
.word 148                //input block to read screen address&lt;br /&gt;
.word -1                 //request block needs to be terminated by -1&amp;lt;/syntaxhighlight&amp;gt; &lt;br /&gt;
&lt;br /&gt;
This assembles to 68 Bytes. &lt;br /&gt;
 &lt;br /&gt;
An intro showing that technique is e.g. Exoticorn's [https://www.pouet.net/prod.php?which=85683 Elsecaller]&lt;br /&gt;
&lt;br /&gt;
Another approach is to not set the screen mode at all and just read the parameters of the current one, as most users run in 1920x1080 with 32 Bit colour anyway. This also makes the intro independent of the resolution.&lt;br /&gt;
&lt;br /&gt;
An intro showing that technique is e.g. Kuemmel's [https://www.pouet.net/prod.php?which=85284 Risc OS 3dball]. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements regarding the use of VFP/NEON check out the section below.&lt;br /&gt;
&lt;br /&gt;
To trigger THUMB mode in general you can conveniently set the lowest Bit of the execution address with the following command in RISC OS (&amp;amp;8000 is the general start address for executables in RISC OS, so the execution address becomes &amp;amp;8001). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;SYS &amp;quot;OS_File&amp;quot;,1,&amp;quot;filename&amp;quot;,&amp;amp;8000,&amp;amp;8001,,19&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Regarding THUMB mode on RISC OS in general there's one small thing to take care of. A very ancient module has to be removed from the OS, otherwise it crashes your code. At the time of writing that bug is still not fixed. The module's name is &amp;quot;SpecialFX&amp;quot; and it is removed by &amp;quot;rmkill SpecialFX&amp;quot; on the command line or in a batch file, as shown in the intros linked above.&lt;br /&gt;
&lt;br /&gt;
To exit your intro and go back to the desktop you simply use the SWI OS_Exit shown above. If you didn't change the screen mode you have to use e.g. the SWI &amp;quot;OS_NewLine&amp;quot; to re-trigger a desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow you to...&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Using VFP/NEON code ==&lt;br /&gt;
&lt;br /&gt;
...work in progress&lt;br /&gt;
&lt;br /&gt;
== Code Examples - Sound output by interrupt driven bytebeat ==&lt;br /&gt;
&lt;br /&gt;
...work in progress...&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&lt;br /&gt;
=== Links on the OS ===&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosopen.org/content/ Risc OS Open - Home of the current OS version and discussion forum]&lt;br /&gt;
&lt;br /&gt;
[https://www.riscosdev.com/direct/ RISC OS Direct - Easy installation package for Risc OS and all needed sizecoding tools for your Raspberry Pi including !GCC (includes gnu assembler) and !StrongED (most popular text editor)]&lt;br /&gt;
&lt;br /&gt;
=== Links on ARM coding ===&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0006e/QRC0006_UAL16.pdf Thumb 16-bit Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf ARM and Thumb-2 Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[http://infocenter.arm.com/help/topic/com.arm.doc.qrc0007e/QRC0007_VFP.pdf Vector Floating Point Instruction Set Quick Reference Card]&lt;br /&gt;
&lt;br /&gt;
[https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf NEON Programmer's Guide]&lt;br /&gt;
&lt;br /&gt;
[http://www.keil.com/support/man/docs/armasm/armasm_dom1361289850039.htm Instruction Set Overview]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/03/17/coding-for-neon--part-1-load-and-stores Coding for NEON - Part 1 - load and stores]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/05/10/coding-for-neon--part-2-dealing-with-leftovers Coding for NEON - Part 2 - dealing with leftovers]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/06/28/coding-for-neon--part-3-matrix-multiplication Coding for NEON - Part 3 - matrix multiplication]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right Coding for NEON - Part 4 - shifting left and right]&lt;br /&gt;
&lt;br /&gt;
[http://community.arm.com/groups/processors/blog/2012/03/13/coding-for-neon--part-5-rearranging-vectors Coding for NEON - Part 5 - rearranging vectors]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes Condition Codes 1: Condition Flags and Codes]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-2-conditional-execution Condition Codes 2: Conditional Execution]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-3-conditional-execution-in-thumb-2 Condition Codes 3: Conditional Execution in Thumb-2]&lt;br /&gt;
&lt;br /&gt;
[https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-4-floating-point-comparisons-using-vfp Condition Codes 4: Floating-Point Comparisons Using VFP]&lt;/div&gt;</summary>
		<author><name>Kuemmel</name></author>	</entry>

	</feed>