<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://www.sizecoding.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Pestis</id>
		<title>SizeCoding - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://www.sizecoding.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Pestis"/>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/wiki/Special:Contributions/Pestis"/>
		<updated>2026-05-03T22:12:19Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.27.0</generator>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1813</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1813"/>
				<updated>2025-10-09T08:57:55Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Vector arithmetic (examples) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, which uses up almost the entire stack.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| sign(x) || 6 || Computed as x/abs(x)&amp;lt;br/&amp;gt;fld st0; fabs; fdivp st1, st0&amp;lt;br/&amp;gt;Note that this does not handle the case x=0&lt;br /&gt;
|-&lt;br /&gt;
| mix(x,y,a) || 10 || Stack in: a x y. Stack out: x*(1-a)+y*a.&amp;lt;br/&amp;gt;fmul st2, st0; fld1; fsubrp st1, st0; fmulp st1, st0; faddp st1, st0&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1, st0&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| 2*clamp(x,-1,1) || 14 || Computed as abs(1+x) - abs(1-x) i.e.&amp;lt;br/&amp;gt;fld1; fadd st0, st1; fabs; fld1; fsub st0, st2; fabs; fsubp st1, st0&amp;lt;br/&amp;gt;You can use other constants in place of fld1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
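&lt;br /&gt;
The exp2 row above splits its argument into a fractional and an integer part. The following is a minimal Python sketch of that decomposition, not code from any actual intro: the function name exp2_x87 is hypothetical, and each line only mimics what the corresponding x87 instruction is documented to do.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def exp2_x87(y):
    # fprem with 1.0 as the divisor: remainder that keeps the sign of y
    frac = math.fmod(y, 1.0)
    # f2xm1 yields 2**frac - 1; faddp with the loaded 1.0 restores 2**frac
    mant = (2.0 ** frac - 1.0) + 1.0
    # fscale multiplies by 2 raised to the truncated value of y
    return math.ldexp(mant, math.trunc(y))

for y in [-2.5, 0.0, 0.75, 3.25]:
    print(y, exp2_x87(y), 2.0 ** y)
```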
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
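&lt;br /&gt;
Before restructuring a shader around the branch-free forms from the table, it may help to convince yourself that the identities hold. The following Python sketch checks them on a few sample values; the function names are made up for this illustration, and the samples are chosen to be exactly representable so that plain equality works.&lt;br /&gt;
&lt;br /&gt;
```python
def twice_min(x, y):
    # x + y - |x - y| equals 2*min(x, y)
    return x + y - abs(x - y)

def twice_max(x, y):
    # x + y + |x - y| equals 2*max(x, y)
    return x + y + abs(x - y)

def twice_clamp(x, c=1.0):
    # |c + x| - |c - x| equals 2*clamp(x, -c, c) for non-negative c
    return abs(c + x) - abs(c - x)

samples = [-3.0, -1.0, -0.25, 0.0, 0.5, 1.0, 2.5]
for x in samples:
    for y in samples:
        assert twice_min(x, y) == 2 * min(x, y)
        assert twice_max(x, y) == 2 * max(x, y)
    assert twice_clamp(x) == 2 * max(-1.0, min(1.0, x))
print("identities hold on all samples")
```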
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
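| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not a modulo: fprem truncates the quotient, so it matches mod only for positive values of x, while fprem1 rounds the quotient to nearest, so its result can be negative even then.&lt;br /&gt;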
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 10, round up), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 01, round down), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| trunc(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 11, round towards zero), followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
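&lt;br /&gt;
As a rough illustration of why this works, the Python sketch below maps a few values through x-round(x); every result lands between -0.5 and 0.5, so space repeats with period 1. The helper names repeat and fold are hypothetical. Python's round() rounds halves to even, which is also what frndint does in the default rounding mode.&lt;br /&gt;
&lt;br /&gt;
```python
def repeat(x):
    # distance to the nearest integer, with sign: the repeated cell coordinate
    return x - round(x)

def fold(x):
    # abs() mirrors the negative half of each cell onto the positive half
    return abs(repeat(x))

for x in [-1.75, -0.5, 0.25, 0.5, 1.5, 2.25]:
    print(x, repeat(x), fold(x))
```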
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later (stack in: b.x b.y b.z a.x a.y a.z): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|-&lt;br /&gt;
|| mix(a,b,k) || 21-22 || Stack in: k a.x a.y a.z b.x b.y b.z&amp;lt;br/&amp;gt;fmul st4, st0; fmul st5, st0; fmul st6, st0; fld1; fsubrp st1, st0; fmul st1, st0; fmul st2, st0; fmulp st3, st0; faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;For the last three repeating instructions, you might be able to use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
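&lt;br /&gt;
Stack shuffles like the ones in the table are easy to get wrong, so it can be worth tracing them on paper or with a toy model first. The Python sketch below is one such model, assuming only register-to-register instructions of 2 bytes each; the class X87 and its methods are hypothetical helpers, not an existing library. It replays the a.xyz = a.yzx and length(a) sequences from the table.&lt;br /&gt;
&lt;br /&gt;
```python
import math

class X87:
    """Toy model of the x87 register stack; st[0] plays the role of st0."""
    def __init__(self, *values):
        self.st = list(values)
        self.size = 0  # bytes emitted so far, 2 per modelled instruction

    def fxch(self, i):            # fxch st0, st(i)
        self.st[0], self.st[i] = self.st[i], self.st[0]
        self.size += 2

    def fmul(self, dst, src):     # fmul st(dst), st(src)
        self.st[dst] *= self.st[src]
        self.size += 2

    def faddp(self, dst):         # faddp st(dst), st0: add, then pop st0
        self.st[dst] += self.st[0]
        self.st.pop(0)
        self.size += 2

    def fsqrt(self):
        self.st[0] = math.sqrt(self.st[0])
        self.size += 2

def swizzle_yzx(x, y, z):
    fpu = X87(x, y, z)
    fpu.fxch(2)
    fpu.fxch(1)
    return fpu

def length(x, y, z):
    fpu = X87(x, y, z)
    fpu.fmul(0, 0)   # x*x
    fpu.fxch(2)
    fpu.fmul(0, 0)   # z*z
    fpu.faddp(2)     # x*x + z*z
    fpu.fmul(0, 0)   # y*y
    fpu.faddp(1)     # sum of squares
    fpu.fsqrt()
    return fpu

print(swizzle_yzx(1, 2, 3).st, length(3.0, 4.0, 12.0).st)
```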
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes even a single byte suffices. The most significant byte holds the sign and the top 7 of the 8 exponent bits, so with it alone the order of magnitude is already roughly correct. You can then try to place this byte somewhere in your code or data where the preceding bytes happen to supply the right first few bits of the mantissa, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
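&lt;br /&gt;
To get a feel for how much precision each extra byte buys, here is a small Python sketch. The helper name f32_from_top_bytes is made up for this example, and it assumes the missing low bytes are zero, whereas in a real intro they would be whatever code or data happens to precede the constant. Bytes are listed most significant first, i.e. in the reverse of the order a db line stores them in memory.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

def f32_from_top_bytes(*top):
    # Pad the given most significant bytes with zero low bytes (an assumption)
    raw = bytes(top) + bytes(4 - len(top))
    # "!" selects big-endian, so the first byte is the most significant one
    return struct.unpack("!f", raw)[0]

# Approaching 36.7 with one, two and three bytes
print(f32_from_top_bytes(0x42))
print(f32_from_top_bytes(0x42, 0x12))
print(f32_from_top_bytes(0x42, 0x12, 0xcc))
```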
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though t is a vector, the code mostly does scalar math on t.x, and then uses coordinate shuffling (t.xyz = t.yzx) so that the next iteration operates on the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:                ; for(int j=0;j&amp;lt;ITERS;j++) {&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop     ; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same (c_glowamount and c_colorscale), many consist of only the exponent byte (a single db), and two needed as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that, when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1812</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1812"/>
				<updated>2025-10-09T08:48:09Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, which uses up almost the entire stack.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| sign(x) || 6 || Computed as x/abs(x)&amp;lt;br/&amp;gt;fld st0; fabs; fdivp st1, st0&amp;lt;br/&amp;gt;Note that this does not handle the case x=0&lt;br /&gt;
|-&lt;br /&gt;
| mix(x,y,a) || 10 || Stack in: a x y. Stack out: x*(1-a)+y*a.&amp;lt;br/&amp;gt;fmul st2, st0; fld1; fsubrp st1, st0; fmulp st1, st0; faddp st1, st0&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1, st0&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| 2*clamp(x,-1,1) || 14 || Computed as abs(1+x) - abs(1-x) i.e.&amp;lt;br/&amp;gt;fld1; fadd st0, st1; fabs; fld1; fsub st0, st2; fabs; fsubp st1, st0&amp;lt;br/&amp;gt;You can use other constants in place of fld1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
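| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not a modulo: fprem truncates the quotient, so it matches mod only for positive values of x, while fprem1 rounds the quotient to nearest, so its result can be negative even then.&lt;br /&gt;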
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 10, round up), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 01, round down), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| trunc(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 11, round towards zero), followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later (stack in: b.x b.y b.z a.x a.y a.z): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes even a single byte suffices. The most significant byte holds the sign and the top 7 of the 8 exponent bits, so with it alone the order of magnitude is already roughly correct. You can then try to place this byte somewhere in your code or data where the preceding bytes happen to supply the right first few bits of the mantissa, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though t is a vector, the code mostly does scalar math on t.x, and then uses coordinate shuffling (t.xyz = t.yzx) so that the next iteration operates on the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:                ; for(int j=0;j&amp;lt;ITERS;j++) {&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop     ; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same (c_glowamount and c_colorscale), many consist of only the exponent byte (a single db), and two needed as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that, when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1811</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1811"/>
				<updated>2025-10-09T08:19:31Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, which uses up almost the entire stack.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| mix(x,y,a) || 10 || Stack in: a x y. Stack out: x*(1-a)+y*a.&amp;lt;br/&amp;gt;fmul st2, st0; fld1; fsubrp st1, st0; fmulp st1, st0; faddp st1, st0&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| 2*clamp(x,-1,1) || 14 || Computed as abs(1+x) - abs(1-x) i.e.&amp;lt;br/&amp;gt;fld1; fadd st0, st1; fabs; fld1; fsub st0, st2; fabs; fsubp st1, st0&amp;lt;br/&amp;gt;You can use other constants in place of fld1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
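&lt;br /&gt;
The branch-free identities in the table are easy to sanity-check outside the assembler. The following Python sketch is not part of the original article, and the helper names twice_min, twice_max and twice_clamp are made up for illustration. It compares the identities against the ordinary min, max and clamp on a grid of exactly representable values:&lt;br /&gt;
&lt;br /&gt;
```python
import itertools

def twice_min(x, y):
    # x + y - |x - y|, the branch-free form from the table
    return x + y - abs(x - y)

def twice_max(x, y):
    # Same sequence with the subtraction replaced by an addition
    return x + y + abs(x - y)

def twice_clamp(x):
    # |1 + x| - |1 - x| equals 2*clamp(x, -1, 1)
    return abs(1 + x) - abs(1 - x)

# Multiples of 0.25 keep every intermediate result exact in binary floating point
samples = [i / 4 for i in range(-12, 13)]
for x, y in itertools.product(samples, repeat=2):
    assert twice_min(x, y) == 2 * min(x, y)
    assert twice_max(x, y) == 2 * max(x, y)
for x in samples:
    assert twice_clamp(x) == 2 * max(-1.0, min(1.0, x))
print("identities hold on the sample grid")
```
&lt;br /&gt;
The factor of 2 is left in on purpose: it can often be folded into a constant that the shader multiplies by anyway.&lt;br /&gt;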
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
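| mod(x,y) || 2 || fprem or fprem1. Notice they compute a remainder, not the modulo: fprem truncates the quotient towards zero, so it matches mod(x,y) when x and y have the same sign, while fprem1 rounds the quotient to nearest, so its result can be negative even for positive x.&lt;br /&gt;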
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 10, round up), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 01, round down), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| trunc(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 11, round towards zero), followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
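&lt;br /&gt;
The difference between the remainder flavours, and the domain repetition trick, can be tried out on a few numbers. This is a small Python sketch under the assumption that math.fmod stands in for fprem (truncated quotient) and math.remainder for fprem1 (quotient rounded to nearest); glsl_mod and repeat_domain are hypothetical helper names:&lt;br /&gt;
&lt;br /&gt;
```python
import math

def glsl_mod(x, y):
    # GLSL defines mod(x, y) as x - y * floor(x / y)
    return x - y * math.floor(x / y)

def repeat_domain(x):
    # x - round(x); Python's round() rounds halves to even,
    # like frndint in its default rounding mode
    return x - round(x)

print(glsl_mod(-0.25, 1.0))        # 0.75
print(math.fmod(-0.25, 1.0))       # -0.25, truncated quotient like fprem
print(math.remainder(0.75, 1.0))   # -0.25, nearest quotient like fprem1
print(repeat_domain(2.75))         # -0.25
print(abs(repeat_domain(2.75)))    # 0.25, repetition followed by folding
```
&lt;br /&gt;
The result of x-round(x) always lies between -0.5 and 0.5, so taking its absolute value folds every cell onto the range 0 to 0.5.&lt;br /&gt;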
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
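&lt;br /&gt;
When juggling stack positions by hand, it can help to replay an instruction sequence in a toy model first. The sketch below is a hypothetical Python helper: the class X87Stack and its method names are illustrative assumptions, and it models only the handful of instructions used in the table. It replays the a.xyz = a.yzx and length(a) sequences:&lt;br /&gt;
&lt;br /&gt;
```python
import math

class X87Stack:
    """Toy model of the x87 register stack; st[0] is the top of the stack."""

    def __init__(self, *values):
        self.st = list(values)

    def fld(self, i):
        # fld st(i): push a copy of st(i); the real FPU has only 8 slots
        self.st.insert(0, self.st[i])
        assert len(self.st) in range(9), "x87 stack overflow"

    def fxch(self, i):
        # fxch st0, st(i): swap the top with st(i)
        self.st[0], self.st[i] = self.st[i], self.st[0]

    def fmul(self, dst, src):
        # fmul st(dst), st(src)
        self.st[dst] *= self.st[src]

    def faddp(self, dst):
        # faddp st(dst), st0: add the top into st(dst), then pop
        self.st[dst] += self.st[0]
        self.st.pop(0)

    def fsqrt(self):
        self.st[0] = math.sqrt(self.st[0])

# a.xyz = a.yzx
s = X87Stack(1.0, 2.0, 3.0)        # a.x, a.y, a.z
s.fxch(2)
s.fxch(1)
print(s.st)                        # [2.0, 3.0, 1.0]

# length(a), destroying a
s = X87Stack(2.0, 3.0, 6.0)
s.fmul(0, 0)
s.fxch(2)
s.fmul(0, 0)
s.faddp(2)
s.fmul(0, 0)
s.faddp(1)
s.fsqrt()
print(s.st)                        # [7.0]
```
&lt;br /&gt;
The model says nothing about byte counts, but it catches off-by-one errors in the st(i) indices, which shift after every push and pop.&lt;br /&gt;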
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. The most significant byte of a 32-bit float holds the sign and the top 7 of the 8 exponent bits, so with that byte alone the order of magnitude is already roughly correct. You can then try to place this somewhere in your code or data where the preceding bytes happen to supply the remaining exponent bit and the first few bits of the mantissa, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
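&lt;br /&gt;
To get a feel for how much precision the top bytes carry, one can decode truncated floats offline. The following Python sketch is an illustration only: float_from_top_bytes is a made-up helper, and the filler parameter is an assumption standing in for whatever code or data happens to occupy the low bytes:&lt;br /&gt;
&lt;br /&gt;
```python
import struct

def float_from_top_bytes(top_bytes, filler=0x00):
    """Decode a little-endian float32 whose highest bytes are top_bytes
    (given in memory order) and whose remaining low bytes all equal filler."""
    raw = bytes([filler]) * (4 - len(top_bytes)) + bytes(top_bytes)
    bits = int.from_bytes(raw, "little")
    return struct.unpack("!f", struct.pack("!I", bits))[0]

print(float_from_top_bytes([0x40]))          # 2.0
print(float_from_top_bytes([0x1C, 0x3F]))    # 0.609375
print(float_from_top_bytes([0x61, 0x3D]))    # 0.054931640625
# Whatever lands in the two low bytes, the value stays close to 0.61:
print(round(float_from_top_bytes([0x1C, 0x3F], filler=0xFF), 4))  # 0.6133
```
&lt;br /&gt;
With two bytes fixed, the low bytes can move the value by less than one percent, which is often good enough for a camera position or a glow amount.&lt;br /&gt;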
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though the code uses vectors, it mostly does scalar math and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:                ; for(int j=0;j&amp;lt;ITERS;j++) {&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop     ; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
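&lt;br /&gt;
The overlap in the listing above can be checked by laying the bytes out in memory order and reading four bytes at each label. The Python sketch below does that under stated assumptions: each checked label is taken to be a little-endian 32-bit float starting at the label's address, the TABLE structure and the function name are invented for this illustration, and labels that reach back into bytes before the listing are skipped because those bytes are not shown:&lt;br /&gt;
&lt;br /&gt;
```python
import math
import struct

# Transcribed from the listing: (labels defined as "equ $-back", bytes emitted next).
# dw values are stored low byte first.
TABLE = [
    ([("c_mindist", 3)], [0x38]),
    ([("c_glowamount", 2), ("c_colorscale", 2)], [0x61, 0x3D]),
    ([("c_stepsizediv", 1)], [0x03]),
    ([("c_stepsizediv_z", 3)], [0x40]),
    ([("c_glowdecay", 2)], [0x1C, 0x46]),
    ([("c_rscale", 2)], [0xA1, 0x3F]),
    ([("c_rdiv", 2)], [0x4B, 0x43]),
    ([("c_camz", 1)], [0xCC, 0x12, 0x42]),
    ([("c_xdiv", 1)], [0x09, 0x00, 0x40]),
    ([("c_xmult", 2)], [0x2A, 0x3F]),
    ([("c_camy", 2)], [0x1C, 0x3F]),
]

def decode_overlapping_floats(table):
    """Return {label: float32 value} for labels whose 4 bytes lie inside the table."""
    stream = bytearray()
    offsets = {}
    for labels, data in table:
        for name, back in labels:
            offsets[name] = len(stream) - back   # models "equ $-back"
        stream.extend(data)
    values = {}
    for name, start in offsets.items():
        if start in range(len(stream) - 3):      # all four bytes are known
            chunk = bytes(stream[start:start + 4])
            values[name] = struct.unpack("!f", chunk[::-1])[0]
    return values

# Intended values, copied from the comments in the listing
INTENDED = {"c_glowdecay": 1e4, "c_rscale": 1.2599210498948732,
            "c_rdiv": 203.18733465192963, "c_camz": 36.7,
            "c_xdiv": 2.0006, "c_camy": 0.61}

values = decode_overlapping_floats(TABLE)
for name, target in INTENDED.items():
    print(name, values[name], "intended", target)
    assert math.isclose(values[name], target, rel_tol=1e-3)
```
&lt;br /&gt;
For these six labels the decoded value lands within 0.1 % of the value given in the comment, even though each constant contributes only one to three bytes of its own.&lt;br /&gt;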
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), many consist of only the exponent byte (a single db), and two needed as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that, when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1810</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1810"/>
				<updated>2025-10-09T08:12:40Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Scalar functions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the WebGL shading language), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| 2*clamp(x,-1,1) || 14 || Computed as abs(1+x) - abs(1-x) i.e.&amp;lt;br/&amp;gt;fld1; fadd st0, st1; fabs; fld1; fsub st0, st2; fabs; fsubp st1, st0&amp;lt;br/&amp;gt;You can use other constants in place of fld1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
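| mod(x,y) || 2 || fprem or fprem1. Notice they compute a remainder, not the modulo: fprem truncates the quotient towards zero, so it matches mod(x,y) when x and y have the same sign, while fprem1 rounds the quotient to nearest, so its result can be negative even for positive x.&lt;br /&gt;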
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 10, round up), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 01, round down), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| trunc(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 11, round towards zero), followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. The most significant byte of a 32-bit float holds the sign and the top 7 of the 8 exponent bits, so with that byte alone the order of magnitude is already roughly correct. You can then try to place this somewhere in your code or data where the preceding bytes happen to supply the remaining exponent bit and the first few bits of the mantissa, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though the code uses vectors, it mostly does scalar math and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:                ; for(int j=0;j&amp;lt;ITERS;j++) {&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop     ; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), many consist of only the exponent byte (a single db), and two needed as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that, when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1809</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1809"/>
				<updated>2025-10-09T07:56:57Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Rounding and remainders */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the WebGL shading language), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
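| mod(x,y) || 2 || fprem or fprem1. Notice they compute a remainder, not the modulo: fprem truncates the quotient towards zero, so it matches mod(x,y) when x and y have the same sign, while fprem1 rounds the quotient to nearest, so its result can be negative even for positive x.&lt;br /&gt;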
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 10, round up), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 01, round down), followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| trunc(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw (RC field = 11, round towards zero), followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually to be avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44269... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| ln(2) || 0.69314... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32192... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul st0, dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset is short and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number: sometimes even a single byte suffices. The most significant byte of a float holds the sign and the upper seven bits of the exponent, so a single byte already gets the order of magnitude roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes give at least the first few bits of the mantissa correctly, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see how floating-point values are encoded and what different byte patterns represent as floating-point constants.&lt;br /&gt;
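&lt;br /&gt;
How much a truncated constant can drift is easy to bound: fix the bytes you define and let the remaining low bytes take their extreme values. The sketch below does this in Python; &amp;lt;code&amp;gt;float_range&amp;lt;/code&amp;gt; is a hypothetical helper, and the hex string lists the defined bytes most significant first (so &amp;lt;code&amp;gt;dw 0x3d61&amp;lt;/code&amp;gt; is written as 3d61).&lt;br /&gt;
&lt;br /&gt;
```python
import struct


def float_range(top_bytes_hex):
    # Smallest and largest float32 whose most significant bytes are fixed,
    # with the undefined low bytes set to all zeros and all ones respectively.
    top = bytes.fromhex(top_bytes_hex)
    free = 4 - len(top)
    lowest = struct.unpack('!f', top + bytes([0x00]) * free)[0]
    highest = struct.unpack('!f', top + bytes([0xff]) * free)[0]
    return lowest, highest


print(float_range('38'))    # one byte: sign and seven exponent bits are fixed
print(float_range('3d61'))  # two bytes: full exponent and seven mantissa bits
```
&lt;br /&gt;
With one byte the value can vary by a factor of almost four, which is why the bytes that happen to precede it matter; with two bytes the relative error is already below one percent.&lt;br /&gt;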
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though the code uses vectors, it mostly does scalar math and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:                ; for(int j=0;j&amp;lt;ITERS;j++) {&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop     ; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to x87 instructions.&lt;br /&gt;
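&lt;br /&gt;
One way to gain confidence in such a port is to replay the instruction sequence on a toy stack model and compare it with the shader loop. The Python sketch below does that under several assumptions that are not stated in the listing: the stack holds t.x, t.y, t.z and r in st0 .. st3 on entry, &amp;lt;code&amp;gt;[si]&amp;lt;/code&amp;gt; points to o, and the test values are arbitrary. The class &amp;lt;code&amp;gt;X87&amp;lt;/code&amp;gt; and both functions are hypothetical helpers, and Python doubles stand in for the 80-bit x87 registers.&lt;br /&gt;
&lt;br /&gt;
```python
# Toy model of the x87 register stack: st[0] is st0, st[1] is st1, and so on.
class X87:
    def __init__(self, values):
        self.st = list(values)

    def fld(self, i):            # fld st(i): push a copy of st(i)
        self.st.insert(0, self.st[i])

    def fld_mem(self, value):    # fld dword [mem]: push a memory operand
        self.st.insert(0, value)

    def fxch(self, i):           # fxch st(i): swap st0 and st(i)
        self.st[0], self.st[i] = self.st[i], self.st[0]

    def frndint(self):           # round st0 to nearest, ties to even
        self.st[0] = float(round(self.st[0]))

    def fabs(self):
        self.st[0] = abs(self.st[0])

    def fadd(self, i):           # fadd st0, st(i)
        self.st[0] += self.st[i]

    def fmul(self, i):           # fmul st0, st(i)
        self.st[0] *= self.st[i]

    def fmul_mem(self, value):   # fmul dword [mem]
        self.st[0] *= value

    def _pop_into(self, i, op):  # st(i) = op(st(i), st0), then pop st0
        self.st[i] = op(self.st[i], self.st[0])
        self.st.pop(0)

    def faddp(self, i):
        self._pop_into(i, lambda a, b: a + b)

    def fmulp(self, i):
        self._pop_into(i, lambda a, b: a * b)

    def fsubp(self, i):
        self._pop_into(i, lambda a, b: a - b)


def asm_loop(t, r, o, rscale, iters):
    fpu = X87([t[0], t[1], t[2], r])  # assumed layout: st0..st3 = t.x, t.y, t.z, r
    for _ in range(iters):
        fpu.fld(0); fpu.frndint(); fpu.fsubp(1); fpu.fabs()  # t.x = abs(t.x - round(t.x))
        fpu.fadd(0)                                          # t.x += t.x
        fpu.fld_mem(rscale); fpu.fmulp(4)                    # r *= RSCALE
        fpu.fld(0); fpu.fmul(0); fpu.faddp(4)                # r += t.x*t.x
        fpu.fxch(2); fpu.fxch(1)                             # t.xyz = t.yzx
        fpu.fld(2); fpu.fmul_mem(o); fpu.faddp(1)            # t.x += t.z * o
        fpu.fld(0); fpu.fmul_mem(o); fpu.fsubp(3)            # t.z -= t.x * o
    return fpu.st


def shader_loop(t, r, o, rscale, iters):
    x, y, z = t
    for _ in range(iters):
        x = abs(x - round(x))
        x += x
        r *= rscale
        r += x * x
        x, y, z = y, z, x
        x += z * o
        z -= x * o
    return [x, y, z, r]


args = ((0.3, 1.7, -2.2), 0.0, 0.1, 1.26, 8)  # arbitrary test values
print(asm_loop(*args) == shader_loop(*args))
```
&lt;br /&gt;
The model also confirms that the stack has the same depth at the end of each iteration as at the start, which is what allows the loop to repeat.&lt;br /&gt;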
&lt;br /&gt;
Later, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
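&lt;br /&gt;
The overlapping layout in the listing above can be checked by decoding the bytes the same way the FPU would read them. The Python sketch below is only an illustration: the offsets were derived by hand from the &amp;lt;code&amp;gt;equ $-n&amp;lt;/code&amp;gt; lines, labels whose dword starts before the first db (c_mindist, c_glowamount, c_colorscale) are left out because their low bytes are not part of this listing, and c_stepsizediv is skipped because how it is loaded is not shown here.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Bytes emitted by the db/dw lines, in memory order (dw stores the low byte first).
TABLE = bytes.fromhex('38 613d 03 40 1c46 a13f 4b43 cc1242 090040 2a3f 1c3f')

# Label name and the offset within TABLE where its dword starts.
OFFSETS = {
    'c_stepsizediv_z': 1,
    'c_glowdecay': 3,
    'c_rscale': 5,
    'c_rdiv': 7,
    'c_camz': 10,
    'c_xdiv': 13,
    'c_xmult': 15,
    'c_camy': 17,
}


def float_at(start):
    # Decode 4 bytes as a little-endian float32; '!f' is big-endian,
    # so the bytes are reversed first.
    return struct.unpack('!f', TABLE[start:start + 4][::-1])[0]


DECODED = {name: float_at(offset) for name, offset in OFFSETS.items()}
for name, value in DECODED.items():
    print(name, value)
```
&lt;br /&gt;
Comparing the printed values with the comments in the listing shows how close each truncated constant lands to its target.&lt;br /&gt;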
&lt;br /&gt;
Two of the constants ended up being the same (c_glowamount and c_colorscale), several have only the exponent byte (a single db), and two of the constants required as much as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that, when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1808</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1808"/>
				<updated>2025-10-07T17:51:45Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (running on WebGL), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2-4 || fptan, which also pushes 1.0 on the stack; add fstp st0 (2 bytes) unless you have a use for the 1.0&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(y) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
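&lt;br /&gt;
The exp2(y) sequence in the table works by splitting y into a fractional and an integer part, because f2xm1 only accepts arguments between -1 and 1. The Python sketch below mirrors the steps one by one; &amp;lt;code&amp;gt;exp2_like_x87&amp;lt;/code&amp;gt; is a hypothetical name, and double precision stands in for the x87 registers.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def exp2_like_x87(y):
    frac = math.fmod(y, 1.0)                    # fprem: y rem 1, keeps the sign of y
    two_frac = (2.0 ** frac - 1.0) + 1.0        # f2xm1, then faddp with the 1.0 from fld1
    return math.ldexp(two_frac, math.trunc(y))  # fscale: multiply by 2**trunc(y)


for y in (3.0, 2.5, -1.0, -2.5):
    print(y, exp2_like_x87(y), 2.0 ** y)
```
&lt;br /&gt;
Because fscale truncates its exponent towards zero and fprem keeps the sign of y, the two parts recombine correctly for negative y as well.&lt;br /&gt;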
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute a remainder, not the modulo: fprem truncates the quotient towards zero, so it matches mod(x,y) only when x and y have the same sign, while fprem1 rounds the quotient to nearest and can return a negative result even for positive x.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually to be avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44269... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| ln(2) || 0.69314... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32192... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul st0, dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset is short and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number: sometimes even a single byte suffices. The most significant byte of a float holds the sign and the upper seven bits of the exponent, so a single byte already gets the order of magnitude roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes give at least the first few bits of the mantissa correctly, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see how floating-point values are encoded and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though the code uses vectors, it mostly does scalar math and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:                ; for(int j=0;j&amp;lt;ITERS;j++) {&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop     ; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Later, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same (c_glowamount and c_colorscale), several have only the exponent byte (a single db), and two of the constants required as much as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that, when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1807</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1807"/>
				<updated>2025-10-07T17:50:35Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (running on WebGL), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2-4 || fptan, which also pushes 1.0 on the stack; add fstp st0 (2 bytes) unless you have a use for the 1.0&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(y) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of fcomi and fcmovcc instructions on older processors. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute a remainder, not the modulo: fprem truncates the quotient towards zero, so it matches mod(x,y) only when x and y have the same sign, while fprem1 rounds the quotient to nearest and can return a negative result even for positive x.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually to be avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44269... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| ln(2) || 0.69314... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32192... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul st0, dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset is short and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number: sometimes even a single byte suffices. The most significant byte of a float holds the sign and the upper seven bits of the exponent, so a single byte already gets the order of magnitude roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes give at least the first few bits of the mantissa correctly, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see how floating-point values are encoded and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though the code uses vectors, it mostly does scalar math and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to a short group of x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Further on, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
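&lt;br /&gt;
To see how the overlapping works, the listing can be decoded with a few lines of Python. This is only a sketch: it assumes that each label is read as a 32-bit little-endian float (the listing does not show the instructions that consume the constants), it skips c_mindist and c_glowamount because they reach back into bytes that precede the table, and it leaves out the two stepsize constants and the uncommented c_xmult. The names &amp;lt;code&amp;gt;table&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;labels&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;decode&amp;lt;/code&amp;gt; are invented for the example.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Bytes emitted by the db/dw lines, in memory order (dw is little-endian).
table = bytes([
    0x38,              # db 0x38
    0x61, 0x3D,        # dw 0x3d61
    0x03,              # db 0x03
    0x40,              # db 0x40
    0x1C, 0x46,        # dw 0x461c
    0xA1, 0x3F,        # db 0xa1, 0x3f
    0x4B, 0x43,        # dw 0x434b
    0xCC, 0x12, 0x42,  # db 0xcc, 0x12, 0x42
    0x09, 0x00, 0x40,  # db 0x09, 0x00, 0x40
    0x2A, 0x3F,        # dw 0x3f2a
    0x1C, 0x3F,        # dw 0x3f1c
])

# label: (offset of the label inside the table, value from the comment)
labels = {
    "c_glowdecay": (3, 1e4),
    "c_rscale": (5, 1.2599210498948732),
    "c_rdiv": (7, 203.18733465192963),
    "c_camz": (10, 36.7),
    "c_xdiv": (13, 2.0006),
    "c_camy": (17, 0.61),
}

def decode(offset):
    # Read four bytes starting at the label and reinterpret them as a float.
    bits = int.from_bytes(table[offset:offset + 4], "little")
    return struct.unpack("=f", struct.pack("=I", bits))[0]

for name, (offset, target) in labels.items():
    value = decode(offset)
    error = abs(value - target) / target
    print(f"{name}: {value:.7g} (target {target}, relative error {error:.1e})")
```
&lt;br /&gt;
Under these assumptions every decoded value lands within a fraction of a percent of its comment, even though none of the constants owns all four of its bytes.&lt;br /&gt;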
&lt;br /&gt;
Two of the constants ended up sharing the same bytes (c_glowamount and c_colorscale), several consist of nothing but the exponent byte (a single db), and two required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the resulting value is still roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1806</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1806"/>
				<updated>2025-10-07T17:46:34Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, Shadertoy shaders are written in GLSL and run through WebGL, and GLSL is a relatively powerful language with native support for vectors, matrices, many built-in functions, and arithmetic operations on them. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y), i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1, st0&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)), i.e. fyl2x followed by the exp2 code above&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
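&lt;br /&gt;
The exp2 sequence in the table is long because the argument has to be split first: f2xm1 only accepts arguments between -1 and 1 (on the 387 and later), so fprem against 1.0 peels off the fractional part and fscale supplies the integer power of two afterwards. The Python sketch below mirrors those steps one by one; the function name &amp;lt;code&amp;gt;exp2_x87_style&amp;lt;/code&amp;gt; is made up, and Python doubles stand in for the x87 registers.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def exp2_x87_style(y):
    # fld1; fld st1; fprem: remainder of y against 1.0, keeping the sign of y
    frac = math.fmod(y, 1.0)
    # f2xm1 computes 2**frac - 1; faddp then adds the 1.0 back
    mantissa = (2.0 ** frac - 1.0) + 1.0
    # fscale multiplies by 2**trunc(y), truncating toward zero like int()
    return math.ldexp(mantissa, int(y))

for y in (3.75, 0.5, -2.25):
    print(y, exp2_x87_style(y), 2.0 ** y)
```
&lt;br /&gt;
If your prototype only ever raises 2 to an integer, or to a value already known to lie between -1 and 1, part of the sequence can be dropped.&lt;br /&gt;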
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
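&lt;br /&gt;
The branch-free 2*min and 2*max rows rest on a simple identity, which can be checked quickly in Python. This is just a sanity check of the arithmetic, with hypothetical helper names, and not a statement about x87 rounding behaviour.&lt;br /&gt;
&lt;br /&gt;
```python
def twice_min(x, y):
    # Mirrors: fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0
    return (x - abs(x - y)) + y

def twice_max(x, y):
    # Same sequence with fsubp replaced by faddp.
    return (x + abs(x - y)) + y

for x, y in [(1.5, 4.0), (4.0, 1.5), (-2.0, 3.0), (0.25, 0.25)]:
    assert twice_min(x, y) == 2 * min(x, y)
    assert twice_max(x, y) == 2 * max(x, y)
    print(x, y, twice_min(x, y), twice_max(x, y))
```
&lt;br /&gt;
The result is twice the minimum, so when prototyping it is worth writing the shader with that factor of two in place and checking whether it can be absorbed into a constant you already multiply by.&lt;br /&gt;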
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not modulo, so they are the same only for positive values of x.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
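&lt;br /&gt;
The difference between the GLSL modulo and the two x87 remainders is easiest to see on a few sample values. In the Python sketch below, &amp;lt;code&amp;gt;math.fmod&amp;lt;/code&amp;gt; follows the truncating convention of fprem and &amp;lt;code&amp;gt;math.remainder&amp;lt;/code&amp;gt; follows the round-to-nearest convention of fprem1; &amp;lt;code&amp;gt;glsl_mod&amp;lt;/code&amp;gt; is a hypothetical helper that spells out the GLSL definition.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def glsl_mod(x, y):
    # GLSL defines mod(x, y) as x - y * floor(x / y).
    return x - y * math.floor(x / y)

for x in (1.25, 1.75, -1.25):
    print(
        x,
        glsl_mod(x, 1.0),        # floor-based modulo, never negative for y = 1
        math.fmod(x, 1.0),       # truncating remainder (fprem convention)
        math.remainder(x, 1.0),  # nearest remainder (fprem1 convention)
        x - round(x),            # the x - round(x) domain repetition
    )
```
&lt;br /&gt;
For these inputs the last two columns agree, and the fprem1 convention can give a negative result even for a positive x such as 1.75. If the prototype uses x - round(x) from the start, there is no such mismatch to port around.&lt;br /&gt;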
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later: fadd st3; fld st1; faddp st5; fld st2; faddp st6&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(a)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;a/=length(a)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul st0, dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset is short and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes even a single byte suffices. The most significant byte holds the sign and the top seven bits of the exponent, so a single byte already gets the order of magnitude roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes happen to supply the last exponent bit and the first few bits of the mantissa, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though t is a vector, the loop body mostly does scalar math on t.x and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates on later iterations. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to a short group of x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Further on, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same bytes (c_glowamount and c_colorscale), several consist of nothing but the exponent byte (a single db), and two required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the resulting value is still roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1805</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1805"/>
				<updated>2025-10-07T17:42:48Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, Shadertoy shaders are written in GLSL and run through WebGL, and GLSL is a relatively powerful language with native support for vectors, matrices, many built-in functions, and arithmetic operations on them. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y), i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1, st0&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)), i.e. fyl2x followed by the exp2 code above&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not modulo, so they are the same only for positive values of x.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later: fadd st3; fld st1; faddp st5; fld st2; faddp st6&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 14 || If a is not needed later: fmul st0, st0; fxch st0, st2; fmul st0, st0; faddp st2, st0; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul st0, dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset is short and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes even a single byte suffices. The most significant byte holds the sign and the top seven bits of the exponent, so a single byte already gets the order of magnitude roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes happen to supply the last exponent bit and the first few bits of the mantissa, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though t is a vector, the loop body mostly does scalar math on t.x and relies on coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates on later iterations. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to a short group of x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Further on, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same bytes (c_glowamount and c_colorscale), several consist of nothing but the exponent byte (a single db), and two required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the resulting value is still roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1804</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1804"/>
				<updated>2025-10-07T09:53:28Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, Shadertoy shaders are written in GLSL and run through WebGL, and GLSL is a relatively powerful language with native support for vectors, matrices, many built-in functions, and arithmetic operations on them. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
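| tan(x) || 2-4 || fptan. It also pushes 1.0 on top of the result, so add 2 bytes (fstp st0) unless you can make use of that 1.0&lt;br /&gt;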
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code above&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
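&lt;br /&gt;
Spelled out, the pow(x,y) row looks roughly like the sketch below (NASM syntax). It assumes x is positive, as fyl2x requires, and that two stack slots are free; it is 8 instructions of 2 bytes each, i.e. the 16 bytes from the table:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    ; stack: x y            (st0 = x, st1 = y)&lt;br /&gt;
    fyl2x                ; t = y*log2(x)               stack: t&lt;br /&gt;
    fld1                 ;                             stack: 1 t&lt;br /&gt;
    fld     st1          ;                             stack: t 1 t&lt;br /&gt;
    fprem                ; f = fractional part of t    stack: f 1 t&lt;br /&gt;
    f2xm1                ; 2^f - 1                     stack: 2^f-1 1 t&lt;br /&gt;
    faddp   st1, st0     ; 2^f                         stack: 2^f t&lt;br /&gt;
    fscale               ; 2^f * 2^trunc(t) = 2^t      stack: 2^t t&lt;br /&gt;
    fstp    st1          ; drop t                      stack: 2^t&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The split into fractional and integer parts is needed because f2xm1 only accepts arguments between -1 and 1; fscale then supplies the integer part of the exponent.&lt;br /&gt;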
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not the modulo: fprem truncates the quotient, so it matches mod(x,y) only when x and y have the same sign (e.g. both positive), while fprem1 rounds the quotient to nearest, so its result lies in -y/2 .. y/2.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
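&lt;br /&gt;
A minimal sketch of the floor(x) row in NASM syntax, assuming the control word still holds the usual default 0x037f and that constants are addressed relative to bp; the names &amp;lt;code&amp;gt;c_cw_down&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;BASE&amp;lt;/code&amp;gt; are hypothetical:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    fldcw   [c_cw_down+bp-BASE] ; 3 bytes with an 8-bit displacement&lt;br /&gt;
    frndint                     ; 2 bytes: st0 = floor(st0)&lt;br /&gt;
    ; ... elsewhere among the data, 2 more bytes unless existing bytes can be reused:&lt;br /&gt;
c_cw_down:&lt;br /&gt;
    dw      0x077f              ; 0x037f with rounding control = 01 (round down)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use 0x0b7f (rounding control = 10, round up) for ceil(x). The mode stays in effect until the next fldcw, so every later frndint, fist, and fistp rounds the same way; this is why the setup cost can drop to zero when a single rounding mode suits the whole effect.&lt;br /&gt;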
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
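&lt;br /&gt;
The dot(a,b) row can be sketched as follows (NASM syntax, 10 bytes, assuming b sits on top of a on the stack and neither vector is needed afterwards; p denotes the component-wise product):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    ; stack: b.x b.y b.z a.x a.y a.z&lt;br /&gt;
    fmulp   st3, st0     ; a.x *= b.x   stack: b.y b.z p.x a.y a.z&lt;br /&gt;
    fmulp   st3, st0     ; a.y *= b.y   stack: b.z p.x p.y a.z&lt;br /&gt;
    fmulp   st3, st0     ; a.z *= b.z   stack: p.x p.y p.z&lt;br /&gt;
    faddp   st1, st0     ;              stack: p.x+p.y p.z&lt;br /&gt;
    faddp   st1, st0     ;              stack: dot(a,b)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because each fmulp pops the stack, the same operand pair st3, st0 works for all three components, which is what makes the three identical instructions loopable.&lt;br /&gt;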
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, it is usually best to avoid normalizing your raymarcher's rays. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
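&lt;br /&gt;
To put a number on it, here is one straightforward way to write normalize(a) in NASM syntax. It is only a sketch, not necessarily the shortest form: 12 instructions, 24 bytes, and two extra stack slots.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    ; stack: x y z&lt;br /&gt;
    fld     st0          ; stack: x x y z&lt;br /&gt;
    fmul    st0, st0     ; stack: x*x x y z&lt;br /&gt;
    fld     st2          ; stack: y x*x x y z&lt;br /&gt;
    fmul    st0, st0     ; stack: y*y x*x x y z&lt;br /&gt;
    faddp   st1, st0     ; stack: x*x+y*y x y z&lt;br /&gt;
    fld     st3          ; stack: z x*x+y*y x y z&lt;br /&gt;
    fmul    st0, st0     ; stack: z*z x*x+y*y x y z&lt;br /&gt;
    faddp   st1, st0     ; stack: x*x+y*y+z*z x y z&lt;br /&gt;
    fsqrt                ; stack: len x y z&lt;br /&gt;
    fdiv    st1, st0     ; x /= len&lt;br /&gt;
    fdiv    st2, st0     ; y /= len&lt;br /&gt;
    fdivp   st3, st0     ; z /= len, pop   stack: x/len y/len z/len&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That is roughly a tenth of a 256-byte budget for a single built-in call.&lt;br /&gt;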
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number—sometimes even a single byte suffices. With a single byte, you can already define the exponent of a float, so the order of magnitude is already correct. You can then try to place this somewhere in your code or data where at least the first few bits of the mantissa are correct, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), many consist of only the exponent byte (a single db), and two of the constants required as much as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1803</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1803"/>
				<updated>2025-10-07T09:52:36Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the shading language of WebGL), which is a relatively powerful language with native support for vectors, matrices, many built-in functions, and arithmetic operations on them. Most of these features are not available in x86 assembly, so it is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS. Furthermore, the x87's limit of 8 stack items severely constrains the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional slot as scratch space, so almost the entire stack is already in use.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, avoid operations that will be costly (in terms of bytes) and use only ones that are cheap in assembly. Below you’ll find some size estimates for GLSL shader code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
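| tan(x) || 2-4 || fptan. It also pushes 1.0 on top of the result, so add 2 bytes (fstp st0) unless you can make use of that 1.0&lt;br /&gt;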
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code above&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not the modulo: fprem truncates the quotient, so it matches mod(x,y) only when x and y have the same sign (e.g. both positive), while fprem1 rounds the quotient to nearest, so its result lies in -y/2 .. y/2.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, it is usually best to avoid normalizing your raymarcher's rays. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number—sometimes even a single byte suffices. With a single byte, you can already define the exponent of a float, so the order of magnitude is already correct. You can then try to place this somewhere in your code or data where at least the first few bits of the mantissa are correct, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), many consist of only the exponent byte (a single db), and two of the constants required as much as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1802</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1802"/>
				<updated>2025-10-07T09:51:23Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the shading language of WebGL), which is a relatively powerful language with native support for vectors, matrices, many built-in functions, and arithmetic operations on them. Most of these features are not available in x86 assembly, so it is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS. Furthermore, the x87's limit of 8 stack items severely constrains the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional slot as scratch space, so almost the entire stack is already in use.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, avoid operations that will be costly (in terms of bytes) and use only ones that are cheap in assembly. Below you’ll find some size estimates for GLSL shader code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
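| tan(x) || 2-4 || fptan. It also pushes 1.0 on top of the result, so add 2 bytes (fstp st0) unless you can make use of that 1.0&lt;br /&gt;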
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&amp;lt;br/&amp;gt;Replace fsubp with faddp for 2*max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code above&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not the modulo: fprem truncates the quotient, so it matches mod(x,y) only when x and y have the same sign (e.g. both positive), while fprem1 rounds the quotient to nearest, so its result lies in -y/2 .. y/2.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, it is usually best to avoid normalizing your raymarcher's rays. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision float; sometimes even a single byte suffices. The most significant byte holds the sign and the top seven bits of the exponent, so it already fixes the order of magnitude. You can then try to place this byte somewhere in your code or data where the preceding bytes, which become the low exponent bit and the mantissa, happen to have roughly the right top bits, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
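&lt;br /&gt;
As a rough illustration of the paragraph above, the following Python sketch decodes a float of which only the most significant bytes are chosen. It is not part of any DOS intro; the helper name float_from_top_bytes and its filler parameter are made up for this example, and the filler stands in for whatever bytes happen to sit in front of the constant.&lt;br /&gt;
&lt;br /&gt;
```python
import struct


def float_from_top_bytes(top_bytes, filler=0x00):
    # Pad the chosen leading bytes with filler bytes up to four bytes.
    padded = bytes(top_bytes) + bytes([filler]) * (4 - len(top_bytes))
    # '!f' is a big-endian single, so the first byte is the most significant.
    return struct.unpack('!f', padded)[0]


# One byte: only the order of magnitude is fixed.
lowest = float_from_top_bytes([0x40], filler=0x00)   # 2.0
highest = float_from_top_bytes([0x40], filler=0xFF)  # just below 8.0

# Two bytes: the top seven mantissa bits are fixed as well.
rscale = float_from_top_bytes([0x3F, 0xA1])          # 1.2578125

print(lowest, highest, rscale)
```
&lt;br /&gt;
With one byte the value can land anywhere within a factor of four, while a second byte narrows it to within about one percent.&lt;br /&gt;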
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to one or more x87 instructions.&lt;br /&gt;
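&lt;br /&gt;
One way to check a port like this before assembling it is to replay both versions in a scripting language. The Python sketch below is such a check, written for this page rather than taken from Balrog: glsl_iteration transcribes one pass of the ShaderToy loop, x87_iteration replays the instruction listing on a plain list whose index 0 plays the role of st0, and the starting values and constants are arbitrary. It assumes the stack holds t.x, t.y, t.z, r from the top down on entry, which is what the st4 operands in the listing suggest.&lt;br /&gt;
&lt;br /&gt;
```python
def glsl_iteration(t, r, rscale, o):
    # One pass of the ShaderToy loop body.
    x, y, z = t
    x = abs(x - round(x))   # folding and domain repetition
    x += x                  # domain scaling
    r *= rscale
    r += x * x
    x, y, z = y, z, x       # t.xyz = t.yzx
    x += z * o              # cheap approximate rotation
    z -= x * o
    return [x, y, z], r


def x87_iteration(stack, rscale, o):
    # Replay the listing; st[0] is st0, st[1] is st1, and so on.
    st = list(stack)
    st.insert(0, st[0])            # fld st0
    st[0] = float(round(st[0]))    # frndint (round to nearest even)
    st[1] -= st[0]                 # fsubp st1, st0
    st.pop(0)
    st[0] = abs(st[0])             # fabs
    st[0] += st[0]                 # fadd st0
    st.insert(0, rscale)           # fld dword [c_rscale+bp-BASE]
    st[4] *= st[0]                 # fmulp st4, st0
    st.pop(0)
    st.insert(0, st[0])            # fld st0
    st[0] *= st[0]                 # fmul st0
    st[4] += st[0]                 # faddp st4, st0
    st.pop(0)
    st[0], st[2] = st[2], st[0]    # fxch st2, st0
    st[0], st[1] = st[1], st[0]    # fxch st1, st0
    st.insert(0, st[2])            # fld st2
    st[0] *= o                     # fmul dword [si]
    st[1] += st[0]                 # faddp st1, st0
    st.pop(0)
    st.insert(0, st[0])            # fld st0
    st[0] *= o                     # fmul dword [si]
    st[3] -= st[0]                 # fsubp st3, st0
    st.pop(0)
    return st


# Arbitrary test values, not Balrog's real constants.
t, r, rscale, o = [0.3, 1.7, -2.4], 0.5, 1.26, 0.1

expected_t, expected_r = glsl_iteration(t, r, rscale, o)
result = x87_iteration(t + [r], rscale, o)
print(result == expected_t + [expected_r])
```
&lt;br /&gt;
Real x87 hardware computes in 80-bit extended precision, so the last digits on DOS can differ from this double-precision replay; the sketch only checks that the stack bookkeeping matches the shader.&lt;br /&gt;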
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
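&lt;br /&gt;
To see the overlap at work, the listing above can be decoded offline. The Python sketch below is an illustration written for this page, not a tool from the Balrog repository: it lays out the 21 bytes in the order the db and dw lines emit them, takes each label's offset from its equ expression (the position of $ minus N), and reads four little-endian bytes from there. Only a few labels whose four bytes fall entirely inside the listing are included, since the bytes in front of the table are not shown here.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# The bytes emitted by the db/dw lines, in order (dw is little-endian).
BLOB = bytes([
    0x38,              # db 0x38
    0x61, 0x3D,        # dw 0x3d61
    0x03,              # db 0x03
    0x40,              # db 0x40
    0x1C, 0x46,        # dw 0x461c
    0xA1, 0x3F,        # db 0xa1, 0x3f
    0x4B, 0x43,        # dw 0x434b
    0xCC, 0x12, 0x42,  # db 0xcc, 0x12, 0x42
    0x09, 0x00, 0x40,  # db 0x09, 0x00, 0x40
    0x2A, 0x3F,        # dw 0x3f2a
    0x1C, 0x3F,        # dw 0x3f1c
])

# Offset of each label into BLOB, i.e. the position of '$' minus N.
OFFSETS = {
    'c_glowdecay': 3,
    'c_rscale': 5,
    'c_rdiv': 7,
    'c_camz': 10,
    'c_xdiv': 13,
    'c_camy': 17,
}


def read_float(offset):
    # Reverse the four bytes so the big-endian '!f' format can decode them.
    chunk = BLOB[offset:offset + 4]
    return struct.unpack('!f', chunk[::-1])[0]


decoded = {name: read_float(offset) for name, offset in OFFSETS.items()}
for name, value in decoded.items():
    print(name, value)
```
&lt;br /&gt;
The decoded values are close to the targets in the comments but not identical: c_rscale, for example, comes out near 1.25995 against a target of 1.25992, because its two low bytes are really the bytes of c_glowdecay.&lt;br /&gt;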
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), many define only the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1801</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1801"/>
				<updated>2025-10-07T07:27:04Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: added 2*min(x,y)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the shading language of WebGL), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| 2*min(x,y) || 10 || Computed as x+y-abs(x-y) i.e.&amp;lt;br/&amp;gt;fld st0; fsub st0, st2; fabs; fsubp st1, st0; faddp st1, st0&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
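&lt;br /&gt;
The identity behind the 2*min(x,y) row is easy to sanity-check outside the shader. The Python sketch below is only an illustration; the function names are made up here, and twice_max is an assumed companion obtained by flipping the sign of the abs term, not something taken from the table.&lt;br /&gt;
&lt;br /&gt;
```python
def twice_min(x, y):
    # x + y - |x - y| equals 2 * min(x, y).
    return x + y - abs(x - y)


def twice_max(x, y):
    # Flipping the sign of the abs term gives 2 * max(x, y) instead.
    return x + y + abs(x - y)


for x, y in [(3.0, 5.0), (5.0, 3.0), (-1.5, 2.0), (4.0, 4.0)]:
    assert twice_min(x, y) == 2 * min(x, y)
    assert twice_max(x, y) == 2 * max(x, y)
    print(x, y, twice_min(x, y), twice_max(x, y))
```
&lt;br /&gt;
Whether the extra factor of two is acceptable depends on the effect; it can sometimes be absorbed into a scale constant that is applied anyway.&lt;br /&gt;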
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute remainders, not the floored modulo of ShaderToy: fprem truncates the quotient towards zero, so it matches mod(x,y) only when x and y have the same sign, while fprem1 rounds the quotient to nearest, so its result can be negative even for positive x and y.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
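&lt;br /&gt;
For the mod(x,y) row above, the difference between the remainders is easiest to see with numbers. In the Python sketch below, which is an illustration rather than an emulation of the FPU, math.fmod stands in for a completed fprem (truncated quotient) and math.remainder for fprem1 (quotient rounded to nearest), while glsl_mod follows the x - y*floor(x/y) definition of mod in GLSL. The function names are chosen for this example only.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def glsl_mod(x, y):
    # GLSL defines mod(x, y) as x - y * floor(x / y).
    return x - y * math.floor(x / y)


def fprem_like(x, y):
    # Truncated-quotient remainder: the result takes the sign of x.
    return math.fmod(x, y)


def fprem1_like(x, y):
    # IEEE remainder: the quotient is rounded to the nearest integer.
    return math.remainder(x, y)


for x in (5.5, -5.5):
    print(x, glsl_mod(x, 2.0), fprem_like(x, 2.0), fprem1_like(x, 2.0))
```
&lt;br /&gt;
For x = 5.5 the three results are 1.5, 1.5 and -0.5; for x = -5.5 they are 0.5, -1.5 and 0.5.&lt;br /&gt;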
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
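&lt;br /&gt;
A few numbers make the repetition visible. The Python sketch below is illustrative only (the names repeat and fold are made up here); it relies on Python's round using round-half-to-even, the same rule frndint follows in the default x87 rounding mode.&lt;br /&gt;
&lt;br /&gt;
```python
def repeat(x):
    # Maps x into [-0.5, 0.5] with period 1, like fld st0; frndint; fsubp.
    return x - round(x)


def fold(x):
    # Adding abs mirrors each cell, giving a value in [0, 0.5].
    return abs(repeat(x))


for x in (0.25, 1.25, 2.25, -1.75, 3.75):
    print(x, repeat(x), fold(x))
```
&lt;br /&gt;
All five inputs fold to 0.25; only 3.75 has a negative value (-0.25) before the abs.&lt;br /&gt;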
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later (b on top of the stack, a below it): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision float; sometimes even a single byte suffices. The most significant byte holds the sign and the top seven bits of the exponent, so it already fixes the order of magnitude. You can then try to place this byte somewhere in your code or data where the preceding bytes, which become the low exponent bit and the mantissa, happen to have roughly the right top bits, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to one or more x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), many define only the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1800</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1800"/>
				<updated>2025-10-07T05:32:05Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Scalar functions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the shading language of WebGL), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
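&lt;br /&gt;
The exp2(y) sequence in the table works by splitting y into a fractional and an integer part, because f2xm1 only accepts arguments between -1 and 1. The Python sketch below mirrors that split with standard library calls; it is a double-precision approximation of the idea, not an emulation of the x87 instructions, and the function name exp2_split is made up for this example.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def exp2_split(y):
    # fld1; fld st1; fprem: fractional part of y, with the sign of y.
    frac = math.fmod(y, 1.0)
    # f2xm1: 2**frac - 1, valid because frac lies strictly between -1 and 1.
    two_frac_m1 = math.expm1(frac * math.log(2.0))
    # faddp st1, st0: add back the 1.0 that fld1 pushed.
    two_frac = two_frac_m1 + 1.0
    # fscale: multiply by 2 raised to y truncated towards zero.
    return math.ldexp(two_frac, math.trunc(y))


for y in (3.0, 3.5, -2.25):
    print(y, exp2_split(y), 2.0 ** y)
```
&lt;br /&gt;
The final fstp st1 in the table has no counterpart here; on the FPU it only discards the leftover copy of y.&lt;br /&gt;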
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max operations, primarily due to the absence of the fcomi and fcmovcc instructions on processors older than the Pentium Pro. This can catch people off guard, as many shaders use min/max extensively. For example, raymarchers often rely on them to compute shape unions. Avoiding unnecessary use of min and max can prevent a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute remainders, not the floored modulo of ShaderToy: fprem truncates the quotient towards zero, so it matches mod(x,y) only when x and y have the same sign, while fprem1 rounds the quotient to nearest, so its result can be negative even for positive x and y.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later (b on top of the stack, a below it): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
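&lt;br /&gt;
When any constant in the right ballpark will do, it can help to see which built-in is nearest. The Python sketch below is a hypothetical helper written for this page (the table and function names are not from any existing tool); it lists the seven built-ins from the table above and picks the one closest to a target value.&lt;br /&gt;
&lt;br /&gt;
```python
import math

# The seven constants the x87 can load with a two-byte instruction.
X87_CONSTANTS = {
    'fldz': 0.0,
    'fld1': 1.0,
    'fldpi': math.pi,
    'fldl2e': math.log2(math.e),
    'fldln2': math.log(2.0),
    'fldl2t': math.log2(10.0),
    'fldlg2': math.log10(2.0),
}


def nearest_builtin(target):
    # Return the mnemonic whose constant has the smallest absolute error.
    return min(X87_CONSTANTS, key=lambda name: abs(X87_CONSTANTS[name] - target))


for target in (3.0, 0.7, 1.5):
    name = nearest_builtin(target)
    print(target, name, X87_CONSTANTS[name])
```
&lt;br /&gt;
Whether the nearest built-in is close enough is a visual judgment; trying the candidate value in the ShaderToy prototype first is the cheapest way to find out.&lt;br /&gt;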
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision float; sometimes even a single byte suffices. The most significant byte holds the sign and the top seven bits of the exponent, so it already fixes the order of magnitude. You can then try to place this byte somewhere in your code or data where the preceding bytes, which become the low exponent bit and the mantissa, happen to have roughly the right top bits, to slightly increase the accuracy. You can use tools like the [https://www.h-schmidt.net/FloatConverter/IEEE754.html IEEE-754 Floating Point Converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
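&lt;br /&gt;
Going from a desired value to the bytes worth defining can be scripted as well. The Python sketch below is an illustration with a made-up helper name, top_bytes; it packs a value as a single-precision float and keeps only the most significant bytes, which is one way to arrive at entries such as dw 0x3d61 for 0.055 in the case study below.&lt;br /&gt;
&lt;br /&gt;
```python
import struct


def top_bytes(value, count):
    # '!f' packs a big-endian single, so the leading bytes are the most
    # significant ones: sign, exponent and the top of the mantissa.
    return struct.pack('!f', value)[:count]


for value, count in [(0.055, 2), (1e4, 2), (2.0, 1)]:
    print(value, top_bytes(value, count).hex())
```
&lt;br /&gt;
On the little-endian x86 these leading bytes are stored last, which is why the labels in the case study point a few bytes before the data they define.&lt;br /&gt;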
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to one or more x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), many define only the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1799</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1799"/>
				<updated>2025-10-07T05:30:18Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Scalar functions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL (the shading language of WebGL), which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once finally ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| min(x,y) || 11 || fcom st0, st1; fnstsw ax; sahf; jc .S; fxch st0, st1; .S: fstp st1&amp;lt;br/&amp;gt;Leaves the smaller value in st0 and pops the other. Replace jc with jnc for max(x,y)&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(y) code applied to the result&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
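&lt;br /&gt;
The 14-byte &amp;lt;code&amp;gt;exp2(y)&amp;lt;/code&amp;gt; sequence in the table splits y into an integer and a fractional part, because f2xm1 only accepts a limited argument range (-1 to 1 on the 387 and later). As a rough sketch of that data flow, here is the same computation in Python; the language is used only for illustration, the function name is made up, and nothing below is actual x87 code:&lt;br /&gt;
&lt;br /&gt;
```python
import math

def exp2_like_x87(y):
    """Mirror fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1."""
    frac = math.fmod(y, 1.0)              # fprem: y rem 1, keeps the sign of y
    pow_frac = (2.0 ** frac - 1.0) + 1.0  # f2xm1 gives 2^frac - 1, faddp adds the 1 back
    # fscale multiplies by 2^trunc(y); math.trunc also rounds toward zero
    return math.ldexp(pow_frac, math.trunc(y))

print(exp2_like_x87(3.5), 2.0 ** 3.5)
print(exp2_like_x87(-1.5), 2.0 ** -1.5)
```
&lt;br /&gt;
Each pair of printed numbers should agree to within rounding error, for negative exponents as well, because both fprem and fscale truncate toward zero.&lt;br /&gt;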
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
Notice the significant byte cost of min and max, which is due to the lack of the fcomi and fcmovcc instructions on older processors (they were introduced with the Pentium Pro). This can catch people off guard, as many shaders use min/max liberally (e.g., raymarchers taking unions of shapes). Avoiding unnecessary min and max generally saves a few headaches later.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not modulo: fprem truncates the quotient toward zero, so it matches mod only when x and y have the same sign, and fprem1 rounds the quotient to nearest, so its result can be negative even for positive x and y.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
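&lt;br /&gt;
To see where the remainder instructions and &amp;lt;code&amp;gt;mod&amp;lt;/code&amp;gt; part ways, the sketch below compares the three definitions. It is written in Python purely for illustration, on the assumption that math.fmod behaves like fprem (quotient truncated toward zero) and math.remainder like fprem1 (quotient rounded to nearest); glsl_mod is a hypothetical helper following the GLSL definition x - y*floor(x/y):&lt;br /&gt;
&lt;br /&gt;
```python
import math

def glsl_mod(x, y):
    # GLSL defines mod(x, y) as x - y * floor(x / y)
    return x - y * math.floor(x / y)

# Columns: x, GLSL mod, fprem-like result, fprem1-like result
for x in (2.5, 1.75, -0.5):
    print(x, glsl_mod(x, 1.0), math.fmod(x, 1.0), math.remainder(x, 1.0))
```
&lt;br /&gt;
The three columns agree for 2.5, but the fprem1-like result differs for 1.75 and the fprem-like result differs for -0.5, so a prototype that relies on mod with negative inputs may need an extra correction after porting.&lt;br /&gt;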
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
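&lt;br /&gt;
As a quick illustration of why this works (a Python sketch rather than shader or x87 code; repeat_domain is a made-up name): subtracting the nearest integer maps every x into the cell -0.5 .. 0.5, so the scene repeats with a period of 1. Python's round() rounds halfway cases to even, the same as frndint in its default mode.&lt;br /&gt;
&lt;br /&gt;
```python
def repeat_domain(x):
    # Signed distance from x to the nearest integer: always within -0.5 .. 0.5
    return x - round(x)

# Points one or more whole cells apart land on the same local coordinate
for x in (0.25, 1.25, 7.25, -2.75):
    print(x, repeat_domain(x))
```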
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that even a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes even a single byte suffices. The last (most significant) byte alone holds the sign and the upper seven bits of the exponent, so the order of magnitude is already roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes happen to supply a suitable lowest exponent bit and first few mantissa bits, to increase the accuracy. An [https://www.h-schmidt.net/FloatConverter/IEEE754.html online IEEE 754 converter] shows how floating-point values are encoded and what value a given byte pattern represents.&lt;br /&gt;
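&lt;br /&gt;
The effect of defining only one byte can also be checked offline. The following is a minimal Python sketch (not taken from any intro; float_from_bytes is a hypothetical helper) that decodes a 4-byte little-endian IEEE single and shows how far the value can drift when only the last byte is under your control:&lt;br /&gt;
&lt;br /&gt;
```python
import struct

def float_from_bytes(b0, b1, b2, b3):
    # Bytes are given in memory order (little-endian); "!f" is big-endian,
    # so the bytes are reversed before unpacking.
    return struct.unpack("!f", bytes([b3, b2, b1, b0]))[0]

# Only the last byte (0x40) is chosen; the other three are whatever precedes it.
print(float_from_bytes(0x00, 0x00, 0x00, 0x40))  # three zero bytes before it
print(float_from_bytes(0xff, 0xff, 0x7f, 0x40))  # bit 7 of the third byte clear, rest set
print(float_from_bytes(0xff, 0xff, 0xff, 0x40))  # all preceding bits set
```
&lt;br /&gt;
With the last byte fixed at 0x40, the value lies somewhere between 2 and 8 depending on the three bytes before it, because the lowest exponent bit lives in the third byte. Choosing what precedes the byte is therefore what narrows the constant down.&lt;br /&gt;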
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though vectors are involved, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
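&lt;br /&gt;
One way to double-check such a port before assembling it is to replay the instruction sequence on a simulated stack. The sketch below is Python written for illustration and is not part of Balrog: the input values, o and rscale are arbitrary assumptions, pop_into is a made-up helper, and only the instructions used in the loop body are modelled as list operations.&lt;br /&gt;
&lt;br /&gt;
```python
from operator import add, mul, sub

def shader_iteration(t, r, o, rscale):
    """One pass of the ShaderToy loop body."""
    x, y, z = t
    x = abs(x - round(x))
    x += x
    r *= rscale
    r += x * x
    x, y, z = y, z, x
    x += z * o
    z -= x * o
    return [x, y, z], r

def pop_into(st, i, op):
    """Model 'fXXXp st(i), st0': st(i) = st(i) op st0, then pop the stack."""
    top = st.pop(0)
    st[i - 1] = op(st[i - 1], top)  # after the pop, old st(i) sits at index i-1

def x87_iteration(t, r, o, rscale):
    """The same pass, replaying the listing; st[0] is the top of the stack."""
    st = [t[0], t[1], t[2], r]
    st.insert(0, st[0])            # fld st0
    st[0] = float(round(st[0]))    # frndint (nearest, ties to even)
    pop_into(st, 1, sub)           # fsubp st1, st0
    st[0] = abs(st[0])             # fabs
    st[0] += st[0]                 # fadd st0
    st.insert(0, rscale)           # fld dword [c_rscale+bp-BASE]
    pop_into(st, 4, mul)           # fmulp st4, st0
    st.insert(0, st[0])            # fld st0
    st[0] *= st[0]                 # fmul st0
    pop_into(st, 4, add)           # faddp st4, st0
    st[0], st[2] = st[2], st[0]    # fxch st2, st0
    st[0], st[1] = st[1], st[0]    # fxch st1, st0
    st.insert(0, st[2])            # fld st2
    st[0] *= o                     # fmul dword [si]
    pop_into(st, 1, add)           # faddp st1, st0
    st.insert(0, st[0])            # fld st0
    st[0] *= o                     # fmul dword [si]
    pop_into(st, 3, sub)           # fsubp st3, st0
    return st[:3], st[3]

t, r = [0.3, 1.7, -2.2], 1.0
print(shader_iteration(t, r, 0.25, 1.26))
print(x87_iteration(t, r, 0.25, 1.26))
```
&lt;br /&gt;
Both functions apply the same floating-point operations in the same order, so the two printed results should be identical. The simulation works in double precision, so it checks the stack bookkeeping rather than the exact values the real FPU would produce.&lt;br /&gt;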
&lt;br /&gt;
Later in its source, Balrog also exemplifies the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
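&lt;br /&gt;
The overlapping layout can be checked by writing out the last few data bytes of the listing above and decoding four bytes at each label. This is a Python sketch written for illustration, not part of Balrog; read_float is a hypothetical helper, and the dw values are assumed to be stored low byte first, as usual on x86:&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Memory image of the tail of the listing, in address order.
data = bytes([
    0x4b, 0x43,        # dw 0x434b          (the two bytes c_rdiv defines)
    0xcc, 0x12, 0x42,  # db 0xcc, 0x12, 0x42
    0x09, 0x00, 0x40,  # db 0x09, 0x00, 0x40
    0x2a, 0x3f,        # dw 0x3f2a
    0x1c, 0x3f,        # dw 0x3f1c
])

def read_float(offset):
    # Reverse the four little-endian bytes and unpack them as a big-endian single
    return struct.unpack("!f", data[offset:offset + 4][::-1])[0]

# Each 'equ $-n' label expressed as an offset into the data above
labels = {"c_camz": 1, "c_xdiv": 4, "c_xmult": 6, "c_camy": 8}
for name, offset in labels.items():
    print(name, read_float(offset))
```
&lt;br /&gt;
The decoded values should come out close to the comments in the listing (36.7, 2.0006 and 0.61), even though each constant borrows its low bytes from its neighbour.&lt;br /&gt;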
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), several define only the exponent byte (a single db), and two of the constants needed as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1798</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1798"/>
				<updated>2025-10-07T05:07:35Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is relatively powerful and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS. Furthermore, the x87's limit of 8 stack items imposes significant constraints on the number of temporary variables you can use: for example, two 3D vectors already occupy 6 stack slots, and you typically need at least one additional item as scratch space, so that's almost the entire stack already.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(y) code applied to the result&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not modulo: fprem truncates the quotient toward zero, so it matches mod only when x and y have the same sign, and fprem1 rounds the quotient to nearest, so its result can be negative even for positive x and y.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that even a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes of an IEEE single-precision number; sometimes even a single byte suffices. The last (most significant) byte alone holds the sign and the upper seven bits of the exponent, so the order of magnitude is already roughly right. You can then try to place this byte somewhere in your code or data where the preceding bytes happen to supply a suitable lowest exponent bit and first few mantissa bits, to increase the accuracy. An [https://www.h-schmidt.net/FloatConverter/IEEE754.html online IEEE 754 converter] shows how floating-point values are encoded and what value a given byte pattern represents.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though vectors are involved, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Later in its source, Balrog also exemplifies the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), several define only the exponent byte (a single db), and two of the constants needed as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1797</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1797"/>
				<updated>2025-10-07T05:00:59Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is relatively powerful and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2 || fptan&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(y) code applied to the result&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
| mod(x,y) || 2 || fprem or fprem1. Notice they compute the remainder, not modulo: fprem truncates the quotient toward zero, so it matches mod only when x and y have the same sign, and fprem1 rounds the quotient to nearest, so its result can be negative even for positive x and y.&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that even a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. The most significant byte of a 32-bit float holds the sign and the top seven of the eight exponent bits, so with a single byte the order of magnitude is already roughly correct. You can then try to place this byte directly after code or data whose bytes happen to supply the right low exponent bit and first few mantissa bits, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
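&lt;br /&gt;
As a rough illustration of how much precision each extra byte buys, the following Python sketch keeps only the most significant bytes of a 32-bit float and zeroes the rest. The helper name &amp;lt;code&amp;gt;keep_top_bytes&amp;lt;/code&amp;gt; and the sample value 0.055 are chosen here for illustration only; in a real intro the low bytes would not be zero but whatever code or data happens to precede the constant.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

def keep_top_bytes(value, count):
    """Return value as a float32 with only its count most significant bytes kept."""
    raw = struct.pack("!f", value)           # big-endian: sign/exponent byte comes first
    padded = raw[:count] + bytes(4 - count)  # zero the discarded low bytes
    return struct.unpack("!f", padded)[0]

if __name__ == "__main__":
    for count in (1, 2, 3, 4):
        print(count, "byte(s):", keep_top_bytes(0.055, count))
```
&lt;br /&gt;
With a single byte, 0.055 collapses to 0.03125; with two bytes it is already 0.054931640625.&lt;br /&gt;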
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math on t.x and then uses coordinate shuffling (t.xyz = t.yzx) so that the same math is applied to the other coordinates on later iterations. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to one or more x87 instructions.&lt;br /&gt;
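&lt;br /&gt;
The two lines marked as rotation in the loop are not an exact rotation. The Python sketch below is only an illustration and not part of Balrog: it compares the two-step update with an exact rotation, assuming that o is a small angle in radians. The function names are made up for this example.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def shear_rotate(x, z, o):
    """The two-step update from the loop: t.x += t.z * o; t.z -= t.x * o."""
    x = x + z * o
    z = z - x * o  # note: uses the already updated x
    return x, z

def exact_rotate(x, z, o):
    """Exact rotation by the angle o in the same direction."""
    c, s = math.cos(o), math.sin(o)
    return x * c + z * s, z * c - x * s

if __name__ == "__main__":
    for o in (0.01, 0.1, 0.5):
        print(o, shear_rotate(1.0, 0.0, o), exact_rotate(1.0, 0.0, o))
```
&lt;br /&gt;
For small o the two agree to first order. Because the second step uses the updated t.x, the combined map has determinant 1, so it preserves area even though it is not an exact rotation.&lt;br /&gt;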
&lt;br /&gt;
Later on, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
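To check a layout like the one above, it helps to decode the overlapping floats offline. The following Python sketch transcribes the data bytes of the listing and reads a few of the constants, assuming each one is loaded as a 4-byte little-endian IEEE 754 single starting at its label (so &amp;lt;code&amp;gt;equ $-2&amp;lt;/code&amp;gt; means the float starts two bytes before the next data byte). The table of offsets is derived by hand from the listing, and constants that reach back into the unknown bytes before the first db are left out.&lt;br /&gt;
&lt;br /&gt;
```python
import struct

# Data bytes in memory order; each dw is stored low byte first.
DATA = bytes([
    0x38,              # db 0x38
    0x61, 0x3D,        # dw 0x3d61
    0x03,              # db 0x03
    0x40,              # db 0x40
    0x1C, 0x46,        # dw 0x461c
    0xA1, 0x3F,        # db 0xa1, 0x3f
    0x4B, 0x43,        # dw 0x434b
    0xCC, 0x12, 0x42,  # db 0xcc, 0x12, 0x42
    0x09, 0x00, 0x40,  # db 0x09, 0x00, 0x40
    0x2A, 0x3F,        # dw 0x3f2a
    0x1C, 0x3F,        # dw 0x3f1c
])

# Offset of the first (lowest) byte of each float within DATA.
OFFSETS = {
    "c_glowdecay": 3,
    "c_rscale": 5,
    "c_rdiv": 7,
    "c_camz": 10,
    "c_xdiv": 13,
    "c_xmult": 15,
    "c_camy": 17,
}

def read_float(name):
    """Decode the 4 bytes at the label as a little-endian float32."""
    start = OFFSETS[name]
    chunk = DATA[start:start + 4]
    # Reversing the bytes and unpacking as big-endian is the same thing.
    return struct.unpack("!f", chunk[::-1])[0]

if __name__ == "__main__":
    for name in OFFSETS:
        print(name, read_float(name))
```
&lt;br /&gt;
Under these assumptions, c_camz decodes to roughly 36.6995 and c_xdiv to roughly 2.00057, close to the intended 36.7 and 2.0006.&lt;br /&gt;
&lt;br /&gt;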
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), several consist of only the exponent byte (a single db), and two required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1796</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1796"/>
				<updated>2025-10-07T04:59:07Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Rounding and remainders */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language used by WebGL, which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you'll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2-4 || fptan; it also pushes 1.0 on top of the result, so add fstp st0 (2 bytes) unless you can use the 1.0&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)), i.e. fyl2x followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87, and implementing them yourself is probably not worth the bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function for getting any number into the -1 .. 1 range.&lt;br /&gt;
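&lt;br /&gt;
The exp2 row of the table above packs a small algorithm into seven instructions: f2xm1 only accepts arguments between -1 and 1 (on the 387 and later), so the exponent is split into a fractional part (fprem against 1.0) and an integer part (applied by fscale, which truncates its scale operand toward zero). A minimal Python sketch of the same arithmetic, with a made-up function name, could look like this:&lt;br /&gt;
&lt;br /&gt;
```python
import math

def exp2_x87_style(y):
    """Mimic fld1; fld st1; fprem; f2xm1; faddp; fscale; fstp st1 in plain math."""
    frac = math.fmod(y, 1.0)              # fprem: remainder with the sign of y
    pow_frac = (2.0 ** frac - 1.0) + 1.0  # f2xm1 gives 2^frac - 1, faddp adds the 1 back
    return math.ldexp(pow_frac, math.trunc(y))  # fscale: multiply by 2^trunc(y)

if __name__ == "__main__":
    for y in (3.0, 0.5, -1.5):
        print(y, exp2_x87_style(y), 2.0 ** y)
```
&lt;br /&gt;
The split works for negative y as well, because both fprem and fscale round toward zero.&lt;br /&gt;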
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint with the default rounding mode, which is nearest&lt;br /&gt;
|-&lt;br /&gt;
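| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not the modulo: fprem truncates the quotient, so it matches mod only for non-negative x (with positive y), while fprem1 rounds the quotient to nearest, so its result lies between -y/2 and y/2.&lt;br /&gt;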
|-&lt;br /&gt;
| ceil(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2-7 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
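&lt;br /&gt;
To see where the x87 remainders differ from GLSL mod, the following Python sketch uses &amp;lt;code&amp;gt;math.fmod&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;math.remainder&amp;lt;/code&amp;gt; as stand-ins for fprem and fprem1. That correspondence (truncated quotient versus quotient rounded to nearest) is the assumption here, and the function names are illustrative.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def glsl_mod(x, y):
    """GLSL mod(x, y) = x - y * floor(x / y)."""
    return x - y * math.floor(x / y)

def fprem_like(x, y):
    """Remainder with a truncated quotient; the result has the sign of x."""
    return math.fmod(x, y)

def fprem1_like(x, y):
    """IEEE remainder: the quotient is rounded to the nearest integer."""
    return math.remainder(x, y)

if __name__ == "__main__":
    for x in (0.25, 0.75, -0.25):
        print(x, glsl_mod(x, 1.0), fprem_like(x, 1.0), fprem1_like(x, 1.0))
```
&lt;br /&gt;
For x = -0.25 and y = 1, mod gives 0.75 while both remainders give -0.25; for x = 0.75, the fprem1 stand-in gives -0.25, which is the same value as x-round(x).&lt;br /&gt;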
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3; fld st1; faddp st5; fld st2; faddp st6&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;: at least the 16 bytes of &amp;lt;code&amp;gt;length(x)&amp;lt;/code&amp;gt; plus three divisions. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. The most significant byte of a 32-bit float holds the sign and the top seven of the eight exponent bits, so with a single byte the order of magnitude is already roughly correct. You can then try to place this byte directly after code or data whose bytes happen to supply the right low exponent bit and first few mantissa bits, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math on t.x and then uses coordinate shuffling (t.xyz = t.yzx) so that the same math is applied to the other coordinates on later iterations. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to one or more x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Later on, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), several consist of only the exponent byte (a single db), and two required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1795</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1795"/>
				<updated>2025-10-07T04:57:21Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Scalar functions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language used by WebGL, which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you'll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2-4 || fptan; it also pushes 1.0 on top of the result, so add fstp st0 (2 bytes) unless you can use the 1.0&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(y) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)), i.e. fyl2x followed by the exp2 code&lt;br /&gt;
|-&lt;br /&gt;
| exp(y) || 18 || Computed as 2^(y*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(y) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87, and implementing them yourself is probably not worth the bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function for getting any number into the -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
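| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not the modulo: fprem truncates the quotient, so it matches mod only for non-negative x (with positive y), while fprem1 rounds the quotient to nearest, so its result lies between -y/2 and y/2.&lt;br /&gt;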
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
|| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
|| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1;&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0;&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
|| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
|| a+=b || 10 || If b is needed later: fadd st3; fld st1; faddp st5; fld st2; faddp st6&lt;br /&gt;
|-&lt;br /&gt;
|| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;: at least the 16 bytes of &amp;lt;code&amp;gt;length(x)&amp;lt;/code&amp;gt; plus three divisions. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. The most significant byte of a 32-bit float holds the sign and the top seven of the eight exponent bits, so with a single byte the order of magnitude is already roughly correct. You can then try to place this byte directly after code or data whose bytes happen to supply the right low exponent bit and first few mantissa bits, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math on t.x and then uses coordinate shuffling (t.xyz = t.yzx) so that the same math is applied to the other coordinates on later iterations. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show how each ShaderToy line maps to one or more x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Later on, the Balrog code also illustrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up sharing the same value (c_glowamount and c_colorscale), several consist of only the exponent byte (a single db), and two required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1794</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1794"/>
				<updated>2025-10-06T11:52:59Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Rounding and remainders */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language used by WebGL, which is a relatively powerful language and includes native support for vectors, matrices, many built-in functions, and arithmetic operations. Most of these features are not available in x86 assembly. It is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you'll find some size estimates for GLSL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
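| tan(x) || 2-4 || fptan. It also pushes 1.0 on top of the result, so add 2 bytes for fstp st0 unless you can make use of that 1.0.&lt;br /&gt;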
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
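&lt;br /&gt;
Putting the rows above together, &amp;lt;code&amp;gt;pow(x,y)&amp;lt;/code&amp;gt; could look roughly like the following sketch. It assumes that x is in st0, y is in st1, x is positive, and one more stack slot is free; this layout is an assumption of the example, not a requirement.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
                         ; stack on entry: x y (st0 = x, st1 = y)&lt;br /&gt;
    fyl2x                ; v = y*log2(x), pops one item&lt;br /&gt;
    fld1                 ; 1 v&lt;br /&gt;
    fld     st1          ; v 1 v&lt;br /&gt;
    fprem                ; st0 = v rem 1, the fractional part of v&lt;br /&gt;
    f2xm1                ; 2^frac - 1 (argument is within -1 .. 1)&lt;br /&gt;
    faddp   st1, st0     ; 2^frac v&lt;br /&gt;
    fscale               ; st0 = 2^frac * 2^trunc(v) = 2^v&lt;br /&gt;
    fstp    st1          ; drop v, leaving pow(x,y) in st0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That is 8 instructions of 2 bytes each, matching the 16 bytes in the table.&lt;br /&gt;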
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
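| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not a modulo: fprem truncates the quotient toward zero, so it matches mod only when x and y have the same sign, while fprem1 rounds the quotient to nearest, so it behaves like x-y*round(x/y).&lt;br /&gt;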
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
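&lt;br /&gt;
A minimal sketch of &amp;lt;code&amp;gt;floor(x)&amp;lt;/code&amp;gt; follows. The names cw_floor and BASE are hypothetical, bp is assumed to hold the address BASE, and the control word is assumed to be otherwise at its finit default of 0x037f.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    fldcw   [cw_floor+bp-BASE]  ; 3 bytes if the displacement fits in a signed byte&lt;br /&gt;
    frndint                     ; st0 = floor(st0) while rounding down is selected&lt;br /&gt;
    ; the rest of the code goes here; the mode stays in effect until the next fldcw&lt;br /&gt;
cw_floor:                       ; data, placed where execution never reaches it&lt;br /&gt;
    dw      0x077f              ; 0x037f with RC (bits 10-11) = 01, round toward -infinity&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For &amp;lt;code&amp;gt;ceil(x)&amp;lt;/code&amp;gt; the word would be 0x0b7f instead (RC = 10, round toward +infinity). Keep in mind that after changing the mode, frndint no longer behaves like &amp;lt;code&amp;gt;round(x)&amp;lt;/code&amp;gt;.&lt;br /&gt;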
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is on top of the stack and not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is on top of the stack and needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
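&lt;br /&gt;
As an illustration of the &amp;lt;code&amp;gt;dot(a,b)&amp;lt;/code&amp;gt; row, here is a sketch of the 10-byte variant. It assumes b sits on top of the stack with a directly below it, and that neither vector is needed afterwards; the stack layout is an assumption of this example.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
                         ; stack on entry: b.x b.y b.z a.x a.y a.z (st0 first)&lt;br /&gt;
    fmulp   st3, st0     ; a.x *= b.x&lt;br /&gt;
    fmulp   st3, st0     ; a.y *= b.y&lt;br /&gt;
    fmulp   st3, st0     ; a.z *= b.z, stack is now a.x a.y a.z&lt;br /&gt;
    faddp   st1, st0     ; a.y += a.x, stack is now a.y a.z&lt;br /&gt;
    faddp   st1, st0     ; a.z += a.y, st0 = dot(a,b)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;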
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. x86 is little-endian, so the last byte of a 32-bit float in memory is its most significant one, holding the sign and the top seven bits of the exponent. With that single byte, the order of magnitude is already roughly correct, and the three lower bytes are simply whatever precedes it in memory. You can then try to place the byte somewhere in your code or data where at least the first few bits of the mantissa are correct, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
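&lt;br /&gt;
A minimal sketch of a one-byte constant follows. The label c_scale is hypothetical, and plain absolute addressing is used only to keep the example short.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
org 0x100&lt;br /&gt;
    fmul    dword [c_scale]  ; st0 *= some value in the range 0.5 .. 2&lt;br /&gt;
    ret&lt;br /&gt;
c_scale equ $-3              ; the dword starts 3 bytes before the db below&lt;br /&gt;
    db      0x3f             ; top byte only: 0x3f000000 is 0.5, 0x3fffffff is just under 2.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the three low bytes of the float are the tail of the preceding code, so the exact value depends on what you place before the constant.&lt;br /&gt;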
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (&amp;lt;code&amp;gt;t.xyz = t.yzx&amp;lt;/code&amp;gt;) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Later on, the Balrog source also demonstrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), several consist of only the exponent byte (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1793</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1793"/>
				<updated>2025-10-06T10:33:00Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is relatively powerful and includes native support for vectors, matrices, many built-in functions, and arithmetic operations on all of them. Most of these features are not available in x86 assembly, so it is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math. The byte counts assume register operands, where each of the x87 instructions listed here encodes in 2 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
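| tan(x) || 2-4 || fptan. It also pushes 1.0 on top of the result, so add 2 bytes for fstp st0 unless you can make use of that 1.0.&lt;br /&gt;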
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
There are no &amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; instructions on x87 and implementing them yourself is probably not worth your bytes. This is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
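| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not a modulo: fprem truncates the quotient toward zero, so it matches mod only when x and y have the same sign, while fprem1 rounds the quotient to nearest, so it behaves like x-y*round(x/y).&lt;br /&gt;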
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is on top of the stack and not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is on top of the stack and needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. x86 is little-endian, so the last byte of a 32-bit float in memory is its most significant one, holding the sign and the top seven bits of the exponent. With that single byte, the order of magnitude is already roughly correct, and the three lower bytes are simply whatever precedes it in memory. You can then try to place the byte somewhere in your code or data where at least the first few bits of the mantissa are correct, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (&amp;lt;code&amp;gt;t.xyz = t.yzx&amp;lt;/code&amp;gt;) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
Later on, the Balrog source also demonstrates the floating-point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), several consist of only the exponent byte (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent byte of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1792</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1792"/>
				<updated>2025-10-05T19:49:35Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86/x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for creating such prototypes. However, the shaders are written in GLSL, the shading language of WebGL, which is relatively powerful and includes native support for vectors, matrices, many built-in functions, and arithmetic operations on all of them. Most of these features are not available in x86 assembly, so it is fairly easy to write tiny shaders in Shadertoy that end up well over 256 bytes once ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your Shadertoy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you’ll find some size estimates for GLSL code once ported to x87 math. The byte counts assume register operands, where each of the x87 instructions listed here encodes in 2 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
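| tan(x) || 2-4 || fptan. It also pushes 1.0 on top of the result, so add 2 bytes for fstp st0 unless you can make use of that 1.0.&lt;br /&gt;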
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; are probably not worth your time, which is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
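| mod(x,y) || 2 || fprem or fprem1. Notice that they compute a remainder, not a modulo: fprem truncates the quotient toward zero, so it matches mod only when x and y have the same sign, while fprem1 rounds the quotient to nearest, so it behaves like x-y*round(x/y).&lt;br /&gt;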
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is on top of the stack and not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is on top of the stack and needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
x87 has the following constants built-in and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need the full 4 bytes to define a single IEEE floating-point number; sometimes even a single byte suffices. x86 is little-endian, so the last byte of a 32-bit float in memory is its most significant one, holding the sign and the top seven bits of the exponent. With that single byte, the order of magnitude is already roughly correct, and the three lower bytes are simply whatever precedes it in memory. You can then try to place the byte somewhere in your code or data where at least the first few bits of the mantissa are correct, to slightly increase the accuracy. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this converter] to see what floating-point values encode to, and what different byte patterns represent as floating-point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the 256-byte executable graphics entry [https://github.com/vsariola/balrog/ Balrog] can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (&amp;lt;code&amp;gt;t.xyz = t.yzx&amp;lt;/code&amp;gt;) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), many define only the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1791</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1791"/>
				<updated>2025-10-05T19:46:32Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86 / x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for making such prototypes. However, ShaderToy shaders are written in WebGL's shading language, which is relatively powerful and includes native support for vectors, matrices, built-in functions, and arithmetic, most of which is not available in x86 assembly. Thus, it is fairly easy to write something that looks tiny in ShaderToy but ends up well over 256 bytes once finally ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your ShaderToy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you'll find some size estimates for WebGL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
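Notice the existence of the &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y-x)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes each, even if they look more complicated in ShaderToy.&lt;br /&gt;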
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
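| tan(x) || 2-4 || fptan, which also pushes 1.0 on the stack; if you cannot use it, popping it with fstp st0 costs 2 more bytes&lt;br /&gt;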
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; are probably not worth your time, which is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
| x % y || 2 || fprem or fprem1&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later (b on top of the stack): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
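&lt;br /&gt;
One way to keep the prototype honest is to write vector operations in the same scalar steps the port will use. A minimal sketch of the dot product row from the table above; the function name &amp;lt;code&amp;gt;dotPortable&amp;lt;/code&amp;gt; is hypothetical, and it assumes the caller no longer needs a or b:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
float dotPortable(vec3 a, vec3 b) {&lt;br /&gt;
    a *= b;      // three scalar multiplies in the port&lt;br /&gt;
    a.y += a.x;  // first scalar add&lt;br /&gt;
    a.z += a.y;  // second scalar add: a.z is now a.x*b.x + a.y*b.y + a.z*b.z&lt;br /&gt;
    return a.z;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Written this way, each ShaderToy line corresponds to a small, countable group of x87 instructions, which makes the byte estimate easier to trust.&lt;br /&gt;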
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
The x87 has the following constants built in, and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need all 4 bytes of an IEEE single-precision floating point number; sometimes even a single byte suffices. The top byte of a float holds the sign and the upper 7 bits of the 8-bit exponent, so with a single byte the order of magnitude is already roughly correct. You can then try to place this byte somewhere in your code/data where the preceding bytes make at least the first few bits of the mantissa correct, to increase the accuracy slightly. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating point values encode to, and what different byte patterns decode to as floating point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), many define only the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1790</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1790"/>
				<updated>2025-10-05T19:36:00Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86 / x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for making such prototypes. However, ShaderToy shaders are written in WebGL's shading language, which is relatively powerful and includes native support for vectors, matrices, built-in functions, and arithmetic, most of which is not available in x86 assembly. Thus, it is fairly easy to write something that looks tiny in ShaderToy but ends up well over 256 bytes once finally ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your ShaderToy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you'll find some size estimates for WebGL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
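Notice the existence of the &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y-x)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes each, even if they look more complicated in ShaderToy.&lt;br /&gt;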
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
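| tan(x) || 2-4 || fptan, which also pushes 1.0 on the stack; if you cannot use it, popping it with fstp st0 costs 2 more bytes&lt;br /&gt;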
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; are probably not worth your time, which is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
| x % y || 2 || fprem or fprem1&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later (b on top of the stack): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
The x87 has the following constants built in, and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need all 4 bytes of an IEEE single-precision floating point number; sometimes even a single byte suffices. The top byte of a float holds the sign and the upper 7 bits of the 8-bit exponent, so with a single byte the order of magnitude is already roughly correct. You can then try to place this byte somewhere in your code/data where the preceding bytes make at least the first few bits of the mantissa correct, to increase the accuracy slightly. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating point values encode to, and what different byte patterns decode to as floating point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256-byte executable graphics entry can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants ended up being the same constant (c_glowamount and c_colorscale), many define only the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen, so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1789</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1789"/>
				<updated>2025-10-05T19:23:44Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86 / x87 assembly. [https://shadertoy.com/ Shadertoy] is a popular choice for making such prototypes. However, ShaderToy shaders are written in WebGL's shading language, which is relatively powerful and includes native support for vectors, matrices, built-in functions, and arithmetic, most of which is not available in x86 assembly. Thus, it is fairly easy to write something that looks tiny in ShaderToy but ends up well over 256 bytes once finally ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your ShaderToy prototype is portable to DOS, you should avoid operations that are going to be costly (in terms of bytes) and only use ones that will be cheap in assembly. Below you'll find some size estimates for WebGL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
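Notice the existence of the &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y-x)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes each, even if they look more complicated in ShaderToy.&lt;br /&gt;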
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
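| tan(x) || 2-4 || fptan, which also pushes 1.0 on the stack; if you cannot use it, popping it with fstp st0 costs 2 more bytes&lt;br /&gt;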
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; are probably not worth your time, which is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function to get any number into -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
| x % y || 2 || fprem or fprem1&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent&lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a trashable register with suitable parity, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later (b on top of the stack): fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
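&lt;br /&gt;
When planning sequences like the ones in the table, it can help to track the stack on paper or with a tiny model. The Python sketch below is only such a model (the helper names &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;faddp&amp;lt;/code&amp;gt; mimic the instructions, but this is not an emulator); it assumes index 0 of the list is st0.&lt;br /&gt;
&lt;br /&gt;
```python
def fxch(stack, i):
    """Swap st0 with st(i), like the x87 fxch instruction."""
    stack[0], stack[i] = stack[i], stack[0]


def faddp(stack, i):
    """Add st0 into st(i), then pop st0, like faddp st(i), st0."""
    stack[i] += stack[0]
    stack.pop(0)


# a.xyz = a.yzx with st0, st1, st2 holding a.x, a.y, a.z
swizzle = ["x", "y", "z"]
fxch(swizzle, 2)  # fxch st0, st2
fxch(swizzle, 1)  # fxch st0, st1
print(swizzle)    # ['y', 'z', 'x']

# a += b with b = (1, 2, 3) on top of a = (10, 20, 30)
vectors = [1, 2, 3, 10, 20, 30]
for _ in range(3):
    faddp(vectors, 3)  # faddp st3, st0
print(vectors)         # [11, 22, 33]
```
&lt;br /&gt;
The second example shows why the same &amp;lt;code&amp;gt;faddp st3, st0&amp;lt;/code&amp;gt; can be repeated three times: each pop shifts the next component of a into the st3 slot.&lt;br /&gt;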
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
The x87 has the following constants built in, and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need all 4 bytes of an IEEE single-precision floating point number; sometimes even a single byte suffices. The most significant byte alone already defines the sign and most of the exponent of a float, so the order of magnitude is roughly correct. You can then try to place this byte somewhere in your code/data so that the bytes preceding it, which become the low-order bytes of the float, make at least the first few bits of the mantissa correct, improving the accuracy a little. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating point values encode to, and what different byte patterns decode to as floating point constants.&lt;br /&gt;
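&lt;br /&gt;
To get a feel for how much precision survives, the Python sketch below decodes a float from only its most significant bytes. It is an illustration, not part of any build process: the helper name &amp;lt;code&amp;gt;float_from_top_bytes&amp;lt;/code&amp;gt; is made up, and it assumes the missing low-order bytes are zero, whereas in a real intro they are whatever bytes happen to precede the constant in memory.&lt;br /&gt;
&lt;br /&gt;
```python
import struct


def float_from_top_bytes(hex_digits):
    """Decode a float32 from its most significant bytes.

    The missing low-order bytes are assumed to be zero here.
    """
    raw = bytes.fromhex(hex_digits).ljust(4, b"\x00")
    return struct.unpack("!f", raw)[0]  # "!" = big-endian, top byte first


one_byte = float_from_top_bytes("40")          # exponent only
two_bytes = float_from_top_bytes("3d61")       # exponent plus 7 mantissa bits
four_bytes = float_from_top_bytes("3d6147ae")  # full encoding of about 0.055

print(one_byte)    # 2.0
print(two_bytes)   # 0.054931640625
print(four_bytes)
```
&lt;br /&gt;
With two bytes the value is already within about 0.2 % of 0.055, which is often good enough for a visual effect.&lt;br /&gt;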
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256b executable graphics can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants turned out to be the same value (c_glowamount and c_colorscale), many consist only of the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1788</id>
		<title>Prototyping DOS effects with ShaderToy</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Prototyping_DOS_effects_with_ShaderToy&amp;diff=1788"/>
				<updated>2025-10-05T19:21:08Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Created page with &amp;quot;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing it in x86 assembly / x87 opcodes. [https://shadertoy.com/ Shadertoy] is a...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Sometimes it is useful to prototype ideas for DOS effects before going through the trouble of writing them in x86 assembly / x87 opcodes. [https://shadertoy.com/ Shadertoy] is a popular choice for making such prototypes. However, ShaderToy shaders are written in GLSL (the WebGL shading language), which is a relatively powerful language with native support for vectors, matrices, and lots of built-in functions and arithmetic, most of which x86 assembly does not have. Thus, it is fairly easy to write something that looks tiny in ShaderToy but goes way over 256 bytes when finally ported to DOS.&lt;br /&gt;
&lt;br /&gt;
To make sure your ShaderToy prototype is portable to DOS, you should avoid all operations that are going to be costly in terms of bytes, and only use ones that will be (fairly) cheap in x87. Below you will find some size estimates for WebGL code once ported to x87 math.&lt;br /&gt;
&lt;br /&gt;
== Scalar operators ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! Rough x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| x+=y || 2 || faddp st1, st0 &lt;br /&gt;
|-&lt;br /&gt;
| x+y || 4 || If both x and y are needed later:&amp;lt;br/&amp;gt;fld st0; fadd st0, st2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The cost for &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt; scalar operations is identical. A lot of this depends on how your x87 stack is organized (which variable is at the top of the stack at st0) and whether you need to keep copies of the variables for later use. In the last optimization phases, you can often save a few bytes by reorganizing your stack, to avoid unnecessary &amp;lt;code&amp;gt;fld&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;fxch&amp;lt;/code&amp;gt; instructions.&lt;br /&gt;
&lt;br /&gt;
Notice the existence of the &amp;lt;code&amp;gt;fsubr&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;fdivr&amp;lt;/code&amp;gt; instructions, so &amp;lt;code&amp;gt;x=(y-x)&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;x=(y/x)&amp;lt;/code&amp;gt; can still be just 2 bytes, even if it looks more complicated in ShaderToy.&lt;br /&gt;
&lt;br /&gt;
Also notice that operating on a single component of a vector (&amp;lt;code&amp;gt;b.x += a.x&amp;lt;/code&amp;gt;) is actually a scalar operation and thus takes the same 2-4 bytes.&lt;br /&gt;
&lt;br /&gt;
== Scalar functions ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| -x || 2 || fchs&lt;br /&gt;
|-&lt;br /&gt;
| abs(x) || 2 || fabs&lt;br /&gt;
|-&lt;br /&gt;
| sqrt(x) || 2 || fsqrt&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) || 2 || fsin&lt;br /&gt;
|-&lt;br /&gt;
| cos(x) || 2 || fcos&lt;br /&gt;
|-&lt;br /&gt;
| sin(x) ... cos(x) || 2 || fsincos&lt;br /&gt;
|-&lt;br /&gt;
| tan(x) || 2-4 || fptan (also pushes 1.0 on top of the result; add fstp st0 if that extra value is not wanted)&lt;br /&gt;
|-&lt;br /&gt;
| atan(y,x) || 2 || fpatan&lt;br /&gt;
|-&lt;br /&gt;
| log2(x) || 4 || fld1&amp;lt;br&amp;gt;...&amp;lt;br&amp;gt;fyl2x&lt;br /&gt;
|-&lt;br /&gt;
| exp2(x) || 14 || fld1; fld st1; fprem; f2xm1; faddp st1,st0; fscale; fstp st1&lt;br /&gt;
|-&lt;br /&gt;
| pow(x,y) || 16 || Computed as 2^(y*log2(x)) i.e. fyl2x, followed by the exp2(x) code&lt;br /&gt;
|-&lt;br /&gt;
| exp(x) || 18 || Computed as 2^(x*log2(e)) i.e. fldl2e and fmulp, followed by the exp2(x) code&lt;br /&gt;
|}&lt;br /&gt;
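&lt;br /&gt;
The &amp;lt;code&amp;gt;exp2(x)&amp;lt;/code&amp;gt; row is probably the least obvious one in the table. As a rough Python sketch (an illustration only; the function name &amp;lt;code&amp;gt;exp2_x87_style&amp;lt;/code&amp;gt; is made up), the sequence splits x into a fractional part that &amp;lt;code&amp;gt;f2xm1&amp;lt;/code&amp;gt; can handle and an integer part that &amp;lt;code&amp;gt;fscale&amp;lt;/code&amp;gt; applies as a power of two:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def exp2_x87_style(x):
    """Mirror the fprem / f2xm1 / faddp / fscale steps from the table."""
    frac = math.fmod(x, 1.0)                       # fprem against 1.0, keeps the sign of x
    two_frac = math.expm1(frac * math.log(2)) + 1  # f2xm1 gives 2^frac - 1, then add the 1 back
    return math.ldexp(two_frac, math.trunc(x))     # fscale multiplies by 2^trunc(x)


# Compare against Python's own power operator.
for x in (-3.7, 0.5, 5.25):
    print(x, exp2_x87_style(x), 2.0 ** x)
```
&lt;br /&gt;
The split is needed because &amp;lt;code&amp;gt;f2xm1&amp;lt;/code&amp;gt; only accepts arguments between -1 and 1.&lt;br /&gt;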
&lt;br /&gt;
&amp;lt;code&amp;gt;acos&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asin&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;sinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cosh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;asinh&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;acosh&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;atanh&amp;lt;/code&amp;gt; are probably not worth your time, which is a pity, as &amp;lt;code&amp;gt;tanh&amp;lt;/code&amp;gt; is a classic &amp;quot;squash&amp;quot; function for mapping any number into the -1 .. 1 range.&lt;br /&gt;
&lt;br /&gt;
== Rounding and remainders ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent &lt;br /&gt;
|-&lt;br /&gt;
| round(x) || 2 || frndint (the default rounding mode is to nearest)&lt;br /&gt;
|-&lt;br /&gt;
| x % y || 2 || fprem or fprem1&lt;br /&gt;
|-&lt;br /&gt;
| ceil(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|-&lt;br /&gt;
| floor(x) || 2 + up to 5 || Up to 5 bytes to set up the rounding mode with fldcw, followed by frndint&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Notice that &amp;lt;code&amp;gt;x-round(x)&amp;lt;/code&amp;gt; is a very compact way to do domain repetition for raymarchers.&lt;br /&gt;
&lt;br /&gt;
== Vector arithmetic (examples) ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! ShaderToy !! Bytes !! x87 equivalent&lt;br /&gt;
|-&lt;br /&gt;
| a.xy = a.yx || 2 || fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a.xyz = a.yzx || 4 || fxch st0, st2; fxch st0, st1&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 5-6 || Assuming b is not needed later.&amp;lt;br/&amp;gt;6 bytes: faddp st3, st0; faddp st3, st0; faddp st3, st0&amp;lt;br/&amp;gt;5 bytes: if you have a register with suitable parity available to trash, use the [[General_Coding_Tricks#Looping_three_times|looping three times trick]]&lt;br /&gt;
|-&lt;br /&gt;
| dot(a,b) || 9-10 || If neither a nor b is needed later, compute this as a*=b followed by a.z+=a.y+=a.x&lt;br /&gt;
|-&lt;br /&gt;
| a+=b || 10 || If b is needed later: fadd st3, st0; fld st1; faddp st5, st0; fld st2; faddp st6, st0&lt;br /&gt;
|-&lt;br /&gt;
| length(a) || 16 || If a is not needed later: fmul st0, st0; fld st1; fmul st0, st0; faddp st1, st0; fld st2; fmul st0, st0; faddp st1, st0; fsqrt&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
From this you can already see that a simple &amp;lt;code&amp;gt;normalize(x)&amp;lt;/code&amp;gt; is going to take a lot of bytes, as it has to be computed as &amp;lt;code&amp;gt;x/=length(x)&amp;lt;/code&amp;gt;. Therefore, normalizing your raymarcher's rays is usually best avoided. &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;reflect&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;refract&amp;lt;/code&amp;gt; are probably also too costly for sizecoding.&lt;br /&gt;
&lt;br /&gt;
== Floating point constants ==&lt;br /&gt;
&lt;br /&gt;
The x87 has the following constants built in, and loading each takes just 2 bytes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Constant !! Approximation !! Instruction&lt;br /&gt;
|-&lt;br /&gt;
| 0.0 || 0.0 || fldz&lt;br /&gt;
|-&lt;br /&gt;
| 1.0 || 1.0 || fld1&lt;br /&gt;
|-&lt;br /&gt;
| pi || 3.14159... || fldpi&lt;br /&gt;
|-&lt;br /&gt;
| log2(e) || 1.44270... || fldl2e&lt;br /&gt;
|-&lt;br /&gt;
| loge(2) || 0.69315... || fldln2&lt;br /&gt;
|-&lt;br /&gt;
| log2(10) || 3.32193... || fldl2t&lt;br /&gt;
|-&lt;br /&gt;
| log10(2)|| 0.30103... || fldlg2&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Thus, if you just need &amp;quot;some random constant&amp;quot; in your shader, using one of these can save bytes. Notice, however, that &amp;lt;code&amp;gt;fldpi; fmulp st1, st0&amp;lt;/code&amp;gt; is still 4 bytes, whereas &amp;lt;code&amp;gt;fmul dword [bp+offset]&amp;lt;/code&amp;gt; can be as little as 3 bytes, if the offset fits in a signed byte and you can reuse code or another value as the constant.&lt;br /&gt;
&lt;br /&gt;
Even if you need to define a new constant, you don't always need all 4 bytes of an IEEE single-precision floating point number; sometimes even a single byte suffices. The most significant byte alone already defines the sign and most of the exponent of a float, so the order of magnitude is roughly correct. You can then try to place this byte somewhere in your code/data so that the bytes preceding it, which become the low-order bytes of the float, make at least the first few bits of the mantissa correct, improving the accuracy a little. You can use tools like [https://www.h-schmidt.net/FloatConverter/IEEE754.html this] to see what floating point values encode to, and what different byte patterns decode to as floating point constants.&lt;br /&gt;
&lt;br /&gt;
== Case study: Balrog == &lt;br /&gt;
&lt;br /&gt;
With all of the above in mind, the [https://github.com/vsariola/balrog/ Balrog] 256b executable graphics can serve as a case study. Balrog is a fractal raymarcher whose innermost loop is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for(int j=0;j&amp;lt;ITERS;j++){                    &lt;br /&gt;
    t.x = abs(t.x - round(t.x)); // abs is folding, t.x - round(t.x) is domain repetition               &lt;br /&gt;
    t.x += t.x; // domain scaling&lt;br /&gt;
    r *= RSCALE;          &lt;br /&gt;
    r += t.x*t.x;&lt;br /&gt;
    t.xyz = t.yzx; // shuffle coordinates so next time we operate on previous y etc.&lt;br /&gt;
    t.x += t.z * o; // rotation, but using very poor math&lt;br /&gt;
    t.z -= t.x * o;               &lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Even though there are vectors, the code mostly does scalar math and then uses coordinate shuffling (t.xyz = t.yzx) to apply the same math to the other coordinates. That code ports to:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    mov     cl, ITERS&lt;br /&gt;
.maploop:&lt;br /&gt;
    fld     st0          ; t.x t.x&lt;br /&gt;
    frndint&lt;br /&gt;
    fsubp   st1, st0     ; t.x-round(t.x)&lt;br /&gt;
    fabs                 ; t.x = abs(t.x - round(t.x))&lt;br /&gt;
    fadd    st0          ; t.x += t.x;&lt;br /&gt;
    fld     dword [c_rscale+bp-BASE]&lt;br /&gt;
    fmulp   st4, st0     ; r *= RSCALE&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    st0&lt;br /&gt;
    faddp   st4, st0     ; r += t.x*t.x&lt;br /&gt;
    fxch    st2, st0&lt;br /&gt;
    fxch    st1, st0     ; t.xyz = t.yzx&lt;br /&gt;
    fld     st2&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    faddp   st1, st0     ; t.x += t.z * o;&lt;br /&gt;
    fld     st0&lt;br /&gt;
    fmul    dword [si]&lt;br /&gt;
    fsubp   st3, st0     ; t.z -= t.x * o&lt;br /&gt;
    loop    .maploop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The comments show exactly how each ShaderToy line maps to different x87 instructions.&lt;br /&gt;
&lt;br /&gt;
The Balrog code also later exemplifies the floating point truncation technique:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
c_mindist equ $-3&lt;br /&gt;
    db      0x38  ; 0.0001&lt;br /&gt;
c_glowamount equ $-2&lt;br /&gt;
c_colorscale equ $-2&lt;br /&gt;
    dw      0x3d61  ; 0.055&lt;br /&gt;
c_stepsizediv equ $-1&lt;br /&gt;
    db      0x03 ; 807&lt;br /&gt;
c_stepsizediv_z equ $-3&lt;br /&gt;
    db      0x40 ; 2.1006666666666662&lt;br /&gt;
c_glowdecay equ $-2&lt;br /&gt;
    dw      0x461c ; 1e4&lt;br /&gt;
c_rscale equ $-2&lt;br /&gt;
    db      0xa1, 0x3f  ; 1.2599210498948732&lt;br /&gt;
c_rdiv equ $-2&lt;br /&gt;
    dw      0x434b ; 203.18733465192963&lt;br /&gt;
c_camz equ $-1&lt;br /&gt;
    db      0xcc, 0x12, 0x42 ; 36.7&lt;br /&gt;
c_xdiv equ $-1&lt;br /&gt;
    db      0x09, 0x00, 0x40 ; 2.0006&lt;br /&gt;
c_xmult equ $-2&lt;br /&gt;
    dw      0x3f2a&lt;br /&gt;
c_camy equ $-2&lt;br /&gt;
    dw      0x3f1c ; 0.61&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Two of the constants turned out to be the same value (c_glowamount and c_colorscale), many consist only of the exponent (a single db), and two of the constants required as many as 3 bytes to get enough precision (c_camz and c_xdiv). The ordering of the constants was carefully chosen so that when the exponent of one constant serves as part of the mantissa of the next constant, the value is still at least roughly correct.&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Techniques&amp;diff=1787</id>
		<title>Techniques</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Techniques&amp;diff=1787"/>
				<updated>2025-10-05T15:46:45Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''[[General Coding Tricks]]'''&lt;br /&gt;
&lt;br /&gt;
'''[[Floating-point Opcodes]]'''&lt;br /&gt;
&lt;br /&gt;
'''[[Output|Textmode Output]]'''&lt;br /&gt;
&lt;br /&gt;
'''[[Input|Handling Input]]'''&lt;br /&gt;
&lt;br /&gt;
'''[[Prototyping DOS effects with ShaderToy]]'''&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1204</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1204"/>
				<updated>2022-12-19T09:46:40Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Add raytraced tunnel to the examples.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
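&lt;br /&gt;
The cosine approximations in the last bullet work because -11 and 8 are both close to a quarter period plus a whole number of periods. A quick Python check (a sketch for verification only; the helper name &amp;lt;code&amp;gt;max_error&amp;lt;/code&amp;gt; is made up) measures how far each shifted sine strays from the true cosine:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def max_error(shift, samples=1000):
    """Largest absolute gap between sin(w + shift) and cos(w) over one period."""
    worst = 0.0
    for k in range(samples):
        w = 2 * math.pi * k / samples
        worst = max(worst, abs(math.sin(w + shift) - math.cos(w)))
    return worst


error_minus_11 = max_error(-11)
error_plus_8 = max_error(8)
print(error_minus_11)  # well below 0.01
print(error_plus_8)    # roughly 0.15
```
&lt;br /&gt;
Whether the larger error of the shorter form is visible depends on the effect, so it is worth trying the shorter one first.&lt;br /&gt;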
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with spaces, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
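'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the lexer reads these letters as part of the number. For the same reason, an &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; directly after a literal &amp;lt;code&amp;gt;0&amp;lt;/code&amp;gt; is read as the start of a hexadecimal literal. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;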
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
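&lt;br /&gt;
Savings like these are easy to miscount by eye. A small Python sketch (an illustration only; the helper name &amp;lt;code&amp;gt;saving&amp;lt;/code&amp;gt; is made up) compares the lengths of the variants quoted in this section:&lt;br /&gt;
&lt;br /&gt;
```python
def saving(before, after):
    """Number of characters saved by replacing one snippet with another."""
    return len(before) - len(after)


scn_plain = "SCN=function(x)poke(16320,x)end"
scn_loaded = "SCN=load'poke(16320,...)'"
tic_plain = "function TIC()for i=0,32639 do poke4(i,i)end end"
tic_loaded = "TIC=load'for i=0,32639 do poke4(i,i)end'"

print(saving(scn_plain, scn_loaded))  # 6
print(saving(tic_plain, tic_loaded))  # 8
```
&lt;br /&gt;
The same check works for any pair of candidate snippets before committing to one during a battle.&lt;br /&gt;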
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it (toward zero). To add dithering, add a small value, between 0 and 1, to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4+((x&amp;amp;2)-y//2%2)%4/16&amp;lt;/code&amp;gt; || 33   || [[File:Block4 dithering exp.png]] ||  4x4 with dithering matrix defined by an expression&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;('0415627326158473'):sub(p,p)/8&amp;lt;/code&amp;gt; || 44   || [[File:Block4 dithering.png]] ||  4x4 with data-defined dithering matrix, &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; defined as &amp;lt;code&amp;gt;p=y%4*4+x%4+1&amp;lt;/code&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
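&lt;br /&gt;
To see what the 2x2 block expression actually produces, the following Python sketch (an illustration, not TIC-80 code; the helper names are made up, and &amp;lt;code&amp;gt;int&amp;lt;/code&amp;gt; stands in for the truncation done by &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt;) prints the threshold matrix and applies it to one fractional color. Python's &amp;lt;code&amp;gt;%&amp;lt;/code&amp;gt; follows the same floored convention as Lua's for the negative intermediate values.&lt;br /&gt;
&lt;br /&gt;
```python
def block_dither(x, y):
    """Threshold from the 2x2 block dithering expression (x*2-y%2)%4/4."""
    return (x * 2 - y % 2) % 4 / 4


def quantize(color, x, y):
    """Add the threshold, then drop the fraction as poke4 is described to do."""
    return int(color + block_dither(x, y))


matrix = [[block_dither(x, y) for x in range(2)] for y in range(2)]
for row in matrix:
    print(row)
# [0.0, 0.5]
# [0.75, 0.25]

block = [quantize(3.6, x, y) for y in range(2) for x in range(2)]
print(block)  # [3, 4, 4, 3]
```
&lt;br /&gt;
The four thresholds are evenly spaced, so a color of 3.6 becomes a mix of colors 3 and 4 within each 2x2 block.&lt;br /&gt;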
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each color from 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered/motion-blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raytracer ===&lt;br /&gt;
&lt;br /&gt;
The raytraced geometry is a tunnel (cylinder). Note that the texture creates both positive and negative values, but applying the distance based fog to both values makes them darker, due to the way the palette is set.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- set palette so scaling color based&lt;br /&gt;
  -- on distance works&lt;br /&gt;
  j=i%48 &lt;br /&gt;
  poke(16320+j,s(j/15)*255)&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- rotate camera&lt;br /&gt;
  a=s(t/23)*3&lt;br /&gt;
  u,w=u*s(a+8)+w*s(a),w*s(a+8)-u*s(a)&lt;br /&gt;
  -- find where ray hits tunnel wall  &lt;br /&gt;
  d=1/(u*u+v*v)^.5&lt;br /&gt;
  z=d*w+t/9 -- move camera with time  &lt;br /&gt;
  q=math.atan2(u,v)&lt;br /&gt;
  -- tunnel texture (&amp;quot;plasma&amp;quot;)&lt;br /&gt;
  c=s(q*5+s(z)*2)*s(z*2+s(q*3)*3)&lt;br /&gt;
  poke4(i,c*.8^d*9) -- distance based fog&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1203</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1203"/>
				<updated>2022-12-19T09:12:16Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Dithering */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names can already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
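&lt;br /&gt;
The accuracy difference between the two cosine approximations can be checked outside TIC-80 with a few lines of plain Lua. This is only a sketch: &amp;lt;code&amp;gt;maxerr&amp;lt;/code&amp;gt; is a hypothetical helper name, and the error is sampled in steps of 0.01 over one period rather than computed exactly:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
-- maxerr is a hypothetical helper: largest absolute error of&lt;br /&gt;
-- s(w+offset) as a stand-in for math.cos(w), sampled over one period&lt;br /&gt;
function maxerr(offset)&lt;br /&gt;
 local worst=0&lt;br /&gt;
 for k=0,628 do&lt;br /&gt;
  local w=k/100&lt;br /&gt;
  worst=math.max(worst,math.abs(s(w+offset)-math.cos(w)))&lt;br /&gt;
 end&lt;br /&gt;
 return worst&lt;br /&gt;
end&lt;br /&gt;
print(maxerr(-11)) -- roughly 0.004&lt;br /&gt;
print(maxerr(8))   -- roughly 0.15&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;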
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, turning the whole program into a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the Lua lexer treats such a letter as part of the numeral. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and the load trick can bring a program of roughly 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
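&lt;br /&gt;
Fetching several arguments through the ellipsis can be tried in plain Lua, without any TIC-80 functions. A minimal sketch, where &amp;lt;code&amp;gt;add&amp;lt;/code&amp;gt; is a hypothetical name chosen only for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- add is a hypothetical name; the loaded chunk has no named&lt;br /&gt;
-- parameters, so its arguments are read from the ellipsis&lt;br /&gt;
add=load'local x,y=... return x+y'&lt;br /&gt;
print(add(2,3)) -- 5&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;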
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4+((x&amp;amp;2)-y//2%2)%4/16&amp;lt;/code&amp;gt; || 33   || [[File:Block4 dithering exp.png]] ||  4x4 with dithering matrix defined by an expression&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;('0415627326158473'):sub(p,p)/8&amp;lt;/code&amp;gt; || 44   || [[File:Block4 dithering.png]] ||  4x4 with data-defined dithering matrix, &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; defined as &amp;lt;code&amp;gt;p=y%4*4+x%4+1&amp;lt;/code&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
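&lt;br /&gt;
To see why adding the dither value works, the truncation can be emulated in plain Lua. This is a minimal sketch that assumes non-negative colors, so that &amp;lt;code&amp;gt;math.floor&amp;lt;/code&amp;gt; matches rounding toward zero; &amp;lt;code&amp;gt;dithered&amp;lt;/code&amp;gt; is a hypothetical helper name. For the color value 3.25, one pixel of each 2x2 block ends up as 4 and the other three as 3, so the block average equals the original value:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- dithered is a hypothetical helper emulating poke4 truncation&lt;br /&gt;
-- (assumption: math.floor equals rounding toward zero for c at or above 0)&lt;br /&gt;
function dithered(c,x,y)&lt;br /&gt;
 return math.floor(c+(x*2-y%2)%4/4)&lt;br /&gt;
end&lt;br /&gt;
sum=0&lt;br /&gt;
for y=0,1 do&lt;br /&gt;
 for x=0,1 do&lt;br /&gt;
  sum=sum+dithered(3.25,x,y)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
print(sum/4) -- 3.25, the average color over one 2x2 block&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;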
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each color from 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered/motion-blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1202</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1202"/>
				<updated>2022-12-19T09:06:48Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names can already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, turning the whole program into a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the Lua lexer treats such a letter as part of the numeral. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and the load trick can bring a program of roughly 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4+((x&amp;amp;2)-y//2%2)%4/16&amp;lt;/code&amp;gt; || 33   || [[File:Block4 dithering exp.png]] ||  4x4 with dithering matrix defined by an expression&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;('0415627326158473'):sub(p,p)/8&amp;lt;/code&amp;gt; || 44   || [[File:Block4 dithering.png]] ||  4x4 with data-defined dithering matrix, &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; defined as &amp;lt;code&amp;gt;p=y%4*4+x%4+1&amp;lt;/code&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; lowers each of the colors 1 to 15 by one, while color 0 stays as it is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered/motion-blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, these effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1201</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1201"/>
				<updated>2022-12-19T09:06:11Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Add 4x4 dithering with expression defined dithering matrix.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
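&lt;br /&gt;
The accuracy difference between the two cosine approximations can be checked numerically. The following is a minimal sketch in plain Lua, meant to be run outside TIC-80; the names &amp;lt;code&amp;gt;g&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;h&amp;lt;/code&amp;gt; are only illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
g=0 -- worst absolute error of s(w-11)&lt;br /&gt;
h=0 -- worst absolute error of s(w+8)&lt;br /&gt;
for k=0,6283 do&lt;br /&gt;
 w=k/1000 -- sample roughly one full period&lt;br /&gt;
 g=math.max(g,math.abs(s(w-11)-math.cos(w)))&lt;br /&gt;
 h=math.max(h,math.abs(s(w+8)-math.cos(w)))&lt;br /&gt;
end&lt;br /&gt;
print(g,h)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The worst-case error should come out at roughly 0.004 for &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and roughly 0.15 for &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt;, because 11 differs from 7π/2 by about 0.004, while 8 differs from 5π/2 by about 0.15.&lt;br /&gt;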
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
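&lt;br /&gt;
The failure can be reproduced with the standard Lua &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; function, which returns &amp;lt;code&amp;gt;nil&amp;lt;/code&amp;gt; plus an error message instead of a function when the code does not compile. A minimal sketch in plain Lua, with &amp;lt;code&amp;gt;g&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;h&amp;lt;/code&amp;gt; as illustrative names:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
g=load'a=0b=0' -- 0b is read as a malformed number, so g is nil&lt;br /&gt;
h=load'a=0 b=0' -- a space separates the number from the name&lt;br /&gt;
print(g,h) -- nil, followed by a function value&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;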
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; function takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring code of about 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
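&lt;br /&gt;
Fetching two arguments could look like the following sketch, written in plain Lua so that it can be tried outside TIC-80. The name &amp;lt;code&amp;gt;f&amp;lt;/code&amp;gt; is hypothetical, and note that &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; end up as global variables here:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
f=load'x,y=... return x+y' -- the ellipsis holds the call arguments&lt;br /&gt;
print(f(2,3)) -- prints 5&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;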
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4+((x&amp;amp;2)-y//2%2)%4/16&amp;lt;/code&amp;gt; || 33   || [[File:Block4 dithering exp.png|thumb]] ||  4x4 with dithering matrix defined by an expression&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;('0415627326158473'):sub(p,p)/8&amp;lt;/code&amp;gt; || 44   || [[File:Block4 dithering.png]] ||  4x4 with data-defined dithering matrix, &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; defined as &amp;lt;code&amp;gt;p=y%4*4+x%4+1&amp;lt;/code&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
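&lt;br /&gt;
The data-defined 4x4 matrix from the table can be inspected in the same way. This is a minimal sketch in plain Lua, runnable outside TIC-80; the names &amp;lt;code&amp;gt;m&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;T&amp;lt;/code&amp;gt; are only illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
m='0415627326158473'&lt;br /&gt;
T={} -- dither offsets in row-major order&lt;br /&gt;
for y=0,3 do&lt;br /&gt;
 for x=0,3 do&lt;br /&gt;
  p=y%4*4+x%4+1 -- 1-based index into the string&lt;br /&gt;
  T[p]=m:sub(p,p)/8 -- the digit is coerced to a number&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
print(T[1],T[2],T[3],T[4]) -- first row of the matrix&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first row should come out as 0, 0.5, 0.125 and 0.625, which are the offsets added to the color for &amp;lt;code&amp;gt;x%4&amp;lt;/code&amp;gt; from 0 to 3 on rows where &amp;lt;code&amp;gt;y%4&amp;lt;/code&amp;gt; is 0.&lt;br /&gt;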
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; lowers each of the colors 1 to 15 by one, while color 0 stays as it is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
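&lt;br /&gt;
The color mapping itself can be sketched in plain Lua, outside TIC-80. Here &amp;lt;code&amp;gt;r&amp;lt;/code&amp;gt; is a hypothetical helper that only models the rounding toward zero described above; it is not part of the TIC-80 API:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- r models rounding toward zero (assumption, not a TIC-80 function)&lt;br /&gt;
function r(v) if v&amp;lt;0 then return math.ceil(v) end return math.floor(v) end&lt;br /&gt;
for c=0,15 do print(c,r(c-.9)) end -- old color, new color&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Under this model, color 0 maps to 0 and every other color maps to the next lower one, so a drawn pixel fades to color 0 within 15 frames.&lt;br /&gt;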
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered/motion-blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
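&lt;br /&gt;
How many pixels one frame touches can be estimated with a short sketch in plain Lua, run outside TIC-80. The start value .3 stands in for &amp;lt;code&amp;gt;t%2&amp;lt;/code&amp;gt; and the counter &amp;lt;code&amp;gt;n&amp;lt;/code&amp;gt; is only illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
n=0 -- number of loop iterations in one frame&lt;br /&gt;
for i=.3,32639,1.9 do n=n+1 end -- same loop bounds and step as above&lt;br /&gt;
print(n,n/32640) -- iterations, fraction of the screen&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The fraction should come out at about 0.53. Because the step is larger than 1, no two iterations share the same integer part, so each one lands on a different pixel, assuming the fractional index is truncated to a pixel address. Since the start value changes with time, a different subset of pixels is hit on different frames.&lt;br /&gt;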
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, these effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:Block4_dithering_exp.png&amp;diff=1200</id>
		<title>File:Block4 dithering exp.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:Block4_dithering_exp.png&amp;diff=1200"/>
				<updated>2022-12-19T09:05:37Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Illustrating 4x4 block dithering with expression defined dithering matrix&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1199</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1199"/>
				<updated>2022-12-19T08:52:53Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; function takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring code of about 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;('0415627326158473'):sub(p,p)/8&amp;lt;/code&amp;gt; || 44   || [[File:Block4 dithering.png]] ||  4x4 with data-defined dithering matrix, &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; defined as &amp;lt;code&amp;gt;p=y%4*4+x%4+1&amp;lt;/code&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
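&lt;br /&gt;
The 4x4 entry in the table packs its threshold matrix into a string of digits. As a minimal sketch (plain Lua, not TIC-80 specific; &amp;lt;code&amp;gt;m&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;r&amp;lt;/code&amp;gt; are names made up for this illustration), the following prints the offsets that the expression yields for one 4x4 block:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- m holds the 16 thresholds, one digit per cell&lt;br /&gt;
m='0415627326158473'&lt;br /&gt;
for y=0,3 do&lt;br /&gt;
 r={}&lt;br /&gt;
 for x=0,3 do&lt;br /&gt;
  p=y%4*4+x%4+1 -- +1 because Lua strings are 1-based&lt;br /&gt;
  r[x+1]=m:sub(p,p)/8 -- the digit string is coerced to a number&lt;br /&gt;
 end&lt;br /&gt;
 print(table.concat(r,' '))&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
All offsets are multiples of 1/8, so neighbouring cells cross the rounding threshold at different color values.&lt;br /&gt;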
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; lowers each of the colors 1 to 15 by one, while color 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
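&lt;br /&gt;
The arithmetic behind the fade can be checked in plain Lua. This is only a sketch: &amp;lt;code&amp;gt;trunc&amp;lt;/code&amp;gt; is a hypothetical helper that stands in for the rounding toward zero described above, not a TIC-80 function.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- trunc is a hypothetical stand-in for the rounding&lt;br /&gt;
-- toward zero that poke4 is described as doing&lt;br /&gt;
function trunc(v)&lt;br /&gt;
 return (math.modf(v)) -- integral part of v&lt;br /&gt;
end&lt;br /&gt;
for c=0,15 do&lt;br /&gt;
 print(c,trunc(c-.9)) -- old color, new color&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For color 0 the value -0.9 rounds up to zero, which is why black pixels stay black instead of wrapping around.&lt;br /&gt;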
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only roughly half of the pixels on each frame, which gives a dithered, motion-blurred look and makes the animation smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have been left uncrunched to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past Byte Battles: https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1198</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1198"/>
				<updated>2022-12-19T08:48:42Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Add 4x4 dithering&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
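&lt;br /&gt;
The cosine trick works because both offsets are close to a quarter period plus a whole number of turns: 4π-11 ≈ 1.566 and 8-2π ≈ 1.717, while π/2 ≈ 1.571. A minimal sketch in plain Lua that measures the largest error of each variant over one period (&amp;lt;code&amp;gt;g&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;h&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;k&amp;lt;/code&amp;gt; are names made up for this illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
g,h=0,0 -- largest absolute errors seen so far&lt;br /&gt;
for k=0,628 do&lt;br /&gt;
 w=k/100 -- sample angles from 0 to 6.28&lt;br /&gt;
 g=math.max(g,math.abs(s(w-11)-math.cos(w)))&lt;br /&gt;
 h=math.max(h,math.abs(s(w+8)-math.cos(w)))&lt;br /&gt;
end&lt;br /&gt;
print(g,h) -- error of s(w-11), error of s(w+8)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;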
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with spaces, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems''', because they are read as part of the number. &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring code of roughly 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
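&lt;br /&gt;
Both the ellipsis and the double escaping can be tried outside TIC-80 in a plain Lua interpreter. In this minimal sketch, &amp;lt;code&amp;gt;f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;g&amp;lt;/code&amp;gt; are names made up for the illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- arguments of a loaded chunk arrive through the ellipsis&lt;br /&gt;
f=load'x,y=... return x+y'&lt;br /&gt;
print(f(2,3))&lt;br /&gt;
-- the doubled backslash survives the outer string, so the&lt;br /&gt;
-- inner string literal still contains the escape \n&lt;br /&gt;
g=load'return &amp;quot;foo\\nbar&amp;quot;'&lt;br /&gt;
print(#g()) -- length of the string with a real newline&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;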
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;('0415627326158473'):sub(p,p)/8&amp;lt;/code&amp;gt; || 44   || [[File:Block4 dithering.png]] ||  4x4 with data-defined dithering matrix, &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; defined as &amp;lt;code&amp;gt;p=y%4*4+x%4+1&amp;lt;/code&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
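&lt;br /&gt;
To see which offsets the 2x2 expression actually produces, it can be evaluated for the four positions of one block in plain Lua. This is only an illustrative sketch; &amp;lt;code&amp;gt;r&amp;lt;/code&amp;gt; is a name made up for it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
for y=0,1 do&lt;br /&gt;
 r={}&lt;br /&gt;
 for x=0,1 do&lt;br /&gt;
  -- the modulo of a negative number is positive in Lua&lt;br /&gt;
  r[x+1]=(x*2-y%2)%4/4&lt;br /&gt;
 end&lt;br /&gt;
 print(table.concat(r,' '))&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The four positions get the offsets 0 and 0.5 on the even row and 0.75 and 0.25 on the odd row, so each pixel of a block rounds up at a different color value.&lt;br /&gt;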
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
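&lt;br /&gt;
The &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; inversion can be illustrated in plain Lua, assuming that &amp;lt;code&amp;gt;poke&amp;lt;/code&amp;gt; keeps only the low 8 bits of the value it is given. That behaviour is an assumption of this sketch, emulated here by the hypothetical helper &amp;lt;code&amp;gt;byte&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- byte is a hypothetical stand-in for how poke is assumed&lt;br /&gt;
-- to store a value: only the low 8 bits are kept&lt;br /&gt;
function byte(v)&lt;br /&gt;
 return v%256&lt;br /&gt;
end&lt;br /&gt;
for j=0,47,3 do&lt;br /&gt;
 -- first channel of each color: gray ramp and its inverse&lt;br /&gt;
 print(j//3,byte(j*5),byte(-1-j*5))&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Under that assumption, &amp;lt;code&amp;gt;-1-v&amp;lt;/code&amp;gt; is stored as 255 minus &amp;lt;code&amp;gt;v&amp;lt;/code&amp;gt; for any &amp;lt;code&amp;gt;v&amp;lt;/code&amp;gt; from 0 to 255, which costs fewer characters than writing &amp;lt;code&amp;gt;255-&amp;lt;/code&amp;gt;.&lt;br /&gt;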
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; lowers each of the colors 1 to 15 by one, while color 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only roughly half of the pixels on each frame, which gives a dithered, motion-blurred look and makes the animation smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have been left uncrunched to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past Byte Battles: https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:Block4_dithering.png&amp;diff=1197</id>
		<title>File:Block4 dithering.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:Block4_dithering.png&amp;diff=1197"/>
				<updated>2022-12-19T08:47:04Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Illustration of the 4x4 dithering with data-driven dithering matrix&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1196</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1196"/>
				<updated>2022-12-18T17:29:01Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Clarify the use of ellipsis in loaded functions&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
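&lt;br /&gt;
Savings like the ones listed above are easy to double-check by letting Lua count the characters with the length operator. A minimal sketch in plain Lua (&amp;lt;code&amp;gt;u&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;v&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;w&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;z&amp;lt;/code&amp;gt; are names made up for this illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
u='e=elli'..'e()e()e()' -- alias plus three calls&lt;br /&gt;
v='elli()elli()elli()' -- three calls without alias&lt;br /&gt;
print(#v-#u) -- characters saved by aliasing elli&lt;br /&gt;
w='r=rectb'..'r()r()' -- alias plus two calls&lt;br /&gt;
z='rectb()rectb()' -- two calls without alias&lt;br /&gt;
print(#z-#w) -- characters saved by aliasing rectb&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;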
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with spaces, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems''', because they are read as part of the number. &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
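&lt;br /&gt;
Whether a one-liner still compiles can be tested with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt;, which returns nil and an error message instead of a function when compilation fails. The sketch below only prints what the interpreter reports. The result for the second string may differ between Lua versions, so treat the claim that it is valid as applying to the Lua embedded in TIC-80:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- the letter b directly after 0 is read as part of the number&lt;br /&gt;
print(load'a=0b=0')&lt;br /&gt;
-- the letter y cannot be part of a number&lt;br /&gt;
print(load'x=0y=0')&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;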
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring code of roughly 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
Any arguments passed to a function defined with &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can be fetched with the ellipsis (&amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;). For example, &amp;lt;code&amp;gt;SCN=function(x)poke(16320,x)end&amp;lt;/code&amp;gt; can be implemented as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'poke(16320,...)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This saves 6 characters. Multiple arguments can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
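&lt;br /&gt;
To see why such offsets work, the following plain Lua 5.3 sketch averages the rounded color over one 2x2 block. &amp;lt;code&amp;gt;block_average&amp;lt;/code&amp;gt; is a hypothetical helper, and &amp;lt;code&amp;gt;//1&amp;lt;/code&amp;gt; stands in for the rounding done by &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; (the two agree for non-negative values):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- hypothetical helper: mean of the rounded color over a 2x2 block&lt;br /&gt;
function block_average(c)&lt;br /&gt;
 local sum=0&lt;br /&gt;
 for y=0,1 do&lt;br /&gt;
  for x=0,1 do&lt;br /&gt;
   -- the offsets are 0, 0.5, 0.75 and 0.25 for the four pixels&lt;br /&gt;
   sum=sum+(c+(x*2-y%2)%4/4)//1&lt;br /&gt;
  end&lt;br /&gt;
 end&lt;br /&gt;
 return sum/4&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For colors such as 2.25, 2.5 and 2.75 the block average equals the color itself, so the eye perceives the intermediate shade even though every pixel holds a whole color index.&lt;br /&gt;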
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 36      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
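&lt;br /&gt;
The address arithmetic behind these tweaks can be sketched in plain Lua 5.3. Both helpers are hypothetical names, and the sketch assumes that the 48 palette bytes hold three bytes per color in red, green, blue order (consistent with the blue-channel example above) and that &amp;lt;code&amp;gt;poke&amp;lt;/code&amp;gt; keeps only the low 8 bits of a value:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- hypothetical helpers, not part of the TIC-80 API&lt;br /&gt;
-- palette byte j (0..47) belongs to color j//3, channel j%3 (0=R,1=G,2=B)&lt;br /&gt;
function channel_of(j) return j//3,j%3 end&lt;br /&gt;
-- assumption: only the low 8 bits are stored, so -1-v ends up as 255-v&lt;br /&gt;
function inverted(v) return (-1-v)%256 end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Under these assumptions, &amp;lt;code&amp;gt;channel_of(47-j)&amp;lt;/code&amp;gt; shows why &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt; both reverses the color order and swaps red with blue: byte &amp;lt;code&amp;gt;47-j&amp;lt;/code&amp;gt; is channel &amp;lt;code&amp;gt;2-j%3&amp;lt;/code&amp;gt; of color &amp;lt;code&amp;gt;15-j//3&amp;lt;/code&amp;gt;.&lt;br /&gt;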
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
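&lt;br /&gt;
The color mapping can be checked in plain Lua 5.3. In this sketch, &amp;lt;code&amp;gt;fade&amp;lt;/code&amp;gt; is a hypothetical helper, and &amp;lt;code&amp;gt;math.modf&amp;lt;/code&amp;gt; stands in for the rounding toward zero that &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; is described as performing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- hypothetical helper: integral part of c-.9, rounded toward zero&lt;br /&gt;
function fade(c) return (math.modf(c-.9)) end&lt;br /&gt;
-- fade(15) is 14, fade(1) is 0 and fade(0) stays 0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Subtracting a full 1 instead would turn color 0 into -1, so it would not be left alone; the value .9 avoids this because -0.9 rounds toward zero.&lt;br /&gt;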
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered/motion-blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
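&lt;br /&gt;
How many pixels such a loop touches can be estimated in plain Lua 5.3. &amp;lt;code&amp;gt;coverage&amp;lt;/code&amp;gt; is a hypothetical helper, and the sketch assumes that a fractional address is truncated to a whole pixel index:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- hypothetical helper: fraction of the 32640 pixels visited in one frame&lt;br /&gt;
function coverage(start)&lt;br /&gt;
 local seen,count={},0&lt;br /&gt;
 for i=start,32639,1.9 do&lt;br /&gt;
  local p=i//1 -- assumed truncation of the fractional address&lt;br /&gt;
  if not seen[p] then seen[p]=true count=count+1 end&lt;br /&gt;
 end&lt;br /&gt;
 return count/32640&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a step of 1.9 the loop visits 10 of every 19 pixels, so &amp;lt;code&amp;gt;coverage(0)&amp;lt;/code&amp;gt; comes out at roughly 0.53. Because the starting offset &amp;lt;code&amp;gt;t%2&amp;lt;/code&amp;gt; changes over time, different pixels are refreshed in different frames.&lt;br /&gt;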
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1195</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1195"/>
				<updated>2022-12-18T17:15:31Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: add i^2.5%1 dithering&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
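&lt;br /&gt;
The accuracy of the two cosine approximations can be measured directly. A small sketch in plain Lua 5.3, where &amp;lt;code&amp;gt;max_err&amp;lt;/code&amp;gt; is a hypothetical helper that samples one period in steps of 0.01:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
-- hypothetical helper: largest deviation of s(w+shift) from math.cos(w)&lt;br /&gt;
function max_err(shift)&lt;br /&gt;
 local worst=0&lt;br /&gt;
 for k=0,628 do&lt;br /&gt;
  local w=k/100&lt;br /&gt;
  worst=math.max(worst,math.abs(s(w+shift)-math.cos(w)))&lt;br /&gt;
 end&lt;br /&gt;
 return worst&lt;br /&gt;
end&lt;br /&gt;
e11=max_err(-11) -- error of s(w-11), below 0.005&lt;br /&gt;
e8=max_err(8)    -- error of s(w+8), between 0.1 and 0.15&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The difference comes from the phase error: 11 is within about 0.004 of 3.5π, whereas 8 is about 0.146 away from 2.5π.&lt;br /&gt;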
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with spaces, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only single-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
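&lt;br /&gt;
A quick way to check whether a one-lined snippet still compiles is to hand it to the standard &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; function, which returns nil and an error message on a syntax error. A minimal sketch, assuming a plain Lua 5.3 interpreter; the variable names are arbitrary:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- compiles: y cannot continue the numeral 0&lt;br /&gt;
ok=load'x=0y=0'&lt;br /&gt;
-- does not compile: 0b is read as a malformed number&lt;br /&gt;
bad,msg=load'a=0b=0'&lt;br /&gt;
-- ok is a function; bad is nil and msg holds the error text&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;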
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring code of about 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function with parameters: &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the arguments passed to the function. For example, the following saves 3 characters compared with an ordinary function definition:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color before it is rounded. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many characters you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i^2.5%1&amp;lt;/code&amp;gt;           || 7      || [[File:Power dithering.png]]        || &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || also &amp;quot;random&amp;quot; dithering&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 36      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered/motion-blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:Power_dithering.png&amp;diff=1194</id>
		<title>File:Power dithering.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:Power_dithering.png&amp;diff=1194"/>
				<updated>2022-12-18T17:14:23Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Illustrates the i^2.5%1 dithering&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1193</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1193"/>
				<updated>2022-12-18T17:12:37Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Add modulo dithering&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
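&lt;br /&gt;
The character savings quoted in this list can be verified with the length operator &amp;lt;code&amp;gt;#&amp;lt;/code&amp;gt;. A sketch in plain Lua 5.3; &amp;lt;code&amp;gt;saved&amp;lt;/code&amp;gt; is a hypothetical helper, and alias or initialization statements are counted without a separator, on the assumption that they can be placed where none is needed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- hypothetical helper: characters saved by replacing long with short&lt;br /&gt;
function saved(long,short) return #long-#short end&lt;br /&gt;
g=saved('elli()elli()elli()','e=elli'..'e()e()e()')  -- 3&lt;br /&gt;
h=saved('rectb()rectb()','r=rectb'..'r()r()')        -- 1&lt;br /&gt;
k=saved('t=time()*.6','t=0'..'t=t+.01')              -- 1&lt;br /&gt;
m=saved('math.sqrt(x*x+y*y)','(x*x+y*y)^.5')         -- 6&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;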
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with spaces, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only single-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring code of about 275 characters down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function with parameters: &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the arguments passed to the function. For example, the following saves 3 characters compared with an ordinary function definition:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color before it is rounded. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many characters you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i%.7&amp;lt;/code&amp;gt;              || 4      || [[File:Modulo dithering.png]]       ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; lowers each of the colors 1 to 15 by one, while color 0 stays as it is, because -0.9 rounds to 0. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered, motion-blurred look and a smoother frame rate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have been left uncrunched to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past Byte Battles: https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=File:Modulo_dithering.png&amp;diff=1192</id>
		<title>File:Modulo dithering.png</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=File:Modulo_dithering.png&amp;diff=1192"/>
				<updated>2022-12-18T17:10:12Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Illustrates the i%.7 dithering&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1186</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1186"/>
				<updated>2022-08-11T05:39:42Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Basic optimizations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.01&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;t=time()*.6&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt; (11 ≈ 7π/2 and 8 ≈ 5π/2), so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
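&lt;br /&gt;
The accuracy of the two cosine stand-ins can be checked with a few lines of plain Lua. This is only a sketch: the helper name &amp;lt;code&amp;gt;maxerr&amp;lt;/code&amp;gt; and the sampling step are made up for illustration, and &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; is the &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; alias used throughout this article.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
-- largest deviation of s(w+shift) from math.cos(w), sampled over one period&lt;br /&gt;
function maxerr(shift)&lt;br /&gt;
 local worst=0&lt;br /&gt;
 for k=0,628 do&lt;br /&gt;
  local w=k/100&lt;br /&gt;
  worst=math.max(worst,math.abs(s(w+shift)-math.cos(w)))&lt;br /&gt;
 end&lt;br /&gt;
 return worst&lt;br /&gt;
end&lt;br /&gt;
print(maxerr(-11)) -- error of s(w-11)&lt;br /&gt;
print(maxerr(8))   -- error of s(w+8)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first error should come out well below 0.01 and the second around 0.15, since 8 is further from 5π/2 than 11 is from 7π/2.&lt;br /&gt;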
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the lexer tries to read the letter as part of the number. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this makes eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function that takes parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the parameters. For example, the following saves 3 bytes compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
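&lt;br /&gt;
Both points can be tried out in plain Lua (5.2 or newer, where &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; accepts a string). This is a minimal sketch; the names &amp;lt;code&amp;gt;f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;g&amp;lt;/code&amp;gt; are placeholders for illustration, not TIC-80 callbacks:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- ... holds the arguments passed to the loaded chunk&lt;br /&gt;
f=load'x,y=...return x+y'&lt;br /&gt;
print(f(2,3))&lt;br /&gt;
-- the backslash is doubled, so the loaded code sees &amp;quot;foo\nbar&amp;quot;&lt;br /&gt;
g=load'return &amp;quot;foo\\nbar&amp;quot;'&lt;br /&gt;
print(g())&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; become globals here, which is usually acceptable in a size-limited entry.&lt;br /&gt;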
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small position-dependent value between 0 and 1 to the color before writing it. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
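&lt;br /&gt;
To see why this works, the rounding can be modelled in plain Lua. The sketch below assumes that the rounding behaves like &amp;lt;code&amp;gt;//1&amp;lt;/code&amp;gt; for non-negative colors; &amp;lt;code&amp;gt;avg&amp;lt;/code&amp;gt; is a hypothetical helper that averages the written color over one 2x2 block:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- average the truncated color over one 2x2 block&lt;br /&gt;
function avg(c)&lt;br /&gt;
 local sum=0&lt;br /&gt;
 for y=0,1 do&lt;br /&gt;
  for x=0,1 do&lt;br /&gt;
   -- dither offsets are 0, .5, .75 and .25&lt;br /&gt;
   sum=sum+(c+(x*2-y%2)%4/4)//1&lt;br /&gt;
  end&lt;br /&gt;
 end&lt;br /&gt;
 return sum/4&lt;br /&gt;
end&lt;br /&gt;
print(avg(2.25),avg(2.5),avg(2.75))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Each call returns its argument unchanged, so on average the block shows the fractional color that a single pixel cannot.&lt;br /&gt;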
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; lowers each of the colors 1 to 15 by one, while color 0 stays as it is, because -0.9 rounds to 0. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
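&lt;br /&gt;
The color mapping can be tabulated in plain Lua. This is a model of the rounding described above rather than the TIC-80 implementation: &amp;lt;code&amp;gt;trunc&amp;lt;/code&amp;gt; is a hypothetical helper that rounds toward zero.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- round toward zero: floor for non-negative values, ceil for negative ones&lt;br /&gt;
function trunc(v)&lt;br /&gt;
 return v&amp;gt;=0 and math.floor(v) or math.ceil(v)&lt;br /&gt;
end&lt;br /&gt;
-- old color on the left, color after one blur step on the right&lt;br /&gt;
for c=0,15 do&lt;br /&gt;
 print(c,trunc(c-.9))&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Colors 1 to 15 come out as 0 to 14 and 0 stays 0, so repeated application fades every pixel toward color 0.&lt;br /&gt;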
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered, motion-blurred look and a smoother frame rate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have been left uncrunched to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past Byte Battles: https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1167</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1167"/>
				<updated>2022-06-15T06:43:10Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Motion blur */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt; (11 ≈ 7π/2 and 8 ≈ 5π/2), so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the lexer tries to read the letter as part of the number. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this makes eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function that takes parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the parameters. For example, the following saves 3 bytes compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small position-dependent value between 0 and 1 to the color before writing it. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers towards zero. This can be abused for a motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered/motion blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have not been crunched to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origo (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1166</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1166"/>
				<updated>2022-06-15T06:41:13Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Byte Battles */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Byte Battles ==&lt;br /&gt;
&lt;br /&gt;
Byte Battles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
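The accuracy claim in the last bullet can be checked in plain Lua outside TIC-80. The following is a minimal sketch; the helper name &amp;lt;code&amp;gt;maxerr&amp;lt;/code&amp;gt; and the sampling step of 0.01 are illustrative assumptions:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- maxerr is a hypothetical helper: largest deviation of&lt;br /&gt;
-- sin(w+k) from cos(w), sampled over roughly one period&lt;br /&gt;
function maxerr(k)&lt;br /&gt;
 local e=0&lt;br /&gt;
 for n=0,628 do&lt;br /&gt;
  local w=n/100&lt;br /&gt;
  e=math.max(e,math.abs(math.sin(w+k)-math.cos(w)))&lt;br /&gt;
 end&lt;br /&gt;
 return e&lt;br /&gt;
end&lt;br /&gt;
print(maxerr(-11),maxerr(8))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With these samples, the error of &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; stays below 0.005, whereas &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; is off by up to about 0.15.&lt;br /&gt;
&lt;br /&gt;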
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
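One way to find out whether a one-lined snippet still parses is to hand it to &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt;, which returns nil when compilation fails. A minimal sketch, assuming a Lua 5.3 interpreter such as the one TIC-80 is built on; the helper name &amp;lt;code&amp;gt;compiles&amp;lt;/code&amp;gt; is hypothetical:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- compiles is a hypothetical helper: true if the snippet parses&lt;br /&gt;
function compiles(src)&lt;br /&gt;
 return load(src)~=nil&lt;br /&gt;
end&lt;br /&gt;
print(compiles'x=0y=0') -- letter outside a-f after the number&lt;br /&gt;
print(compiles'a=0b=0') -- 0b is read as a malformed number&lt;br /&gt;
print(compiles'a=0 b=0') -- a space separates number and name&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other Lua versions may be stricter about letters touching a number, so the first case is worth re-checking if the interpreter changes.&lt;br /&gt;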
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function that takes parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the arguments. For example, the following saves 3 characters compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
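The behaviour described in this section can be tried in plain Lua without any TIC-80 functions. The following is a rough sketch; the names &amp;lt;code&amp;gt;add&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;long&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;short&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;f&amp;lt;/code&amp;gt; are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- '...' inside the loaded chunk holds the call arguments&lt;br /&gt;
add=load'x,y=...return x+y'&lt;br /&gt;
print(add(2,3))&lt;br /&gt;
-- compare source lengths; [[ ]] quotes the snippets verbatim&lt;br /&gt;
long=[[function SCN(r)poke(16320,r)end]]&lt;br /&gt;
short=[[SCN=load'r=...poke(16320,r)']]&lt;br /&gt;
print(#long-#short)&lt;br /&gt;
-- a backslash must be doubled to survive the outer string&lt;br /&gt;
f=load'return #&amp;quot;a\\nb&amp;quot;'&lt;br /&gt;
print(f())&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The last call returns 3 because the inner string ends up containing a single newline character between the two letters.&lt;br /&gt;
&lt;br /&gt;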
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many characters you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
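To see why the 2x2 block expression works, it helps to print the value it adds at each position of one block. A small sketch in plain Lua; the function name &amp;lt;code&amp;gt;d&amp;lt;/code&amp;gt; is only an illustrative choice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- value added to the color at pixel (x,y)&lt;br /&gt;
function d(x,y)&lt;br /&gt;
 return (x*2-y%2)%4/4&lt;br /&gt;
end&lt;br /&gt;
for y=0,1 do&lt;br /&gt;
 print(d(0,y),d(1,y))&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The four positions of a block receive 0 and 0.5 on the even row and 0.75 and 0.25 on the odd row, so every quarter step appears exactly once, and the pattern repeats every 2 pixels in both directions.&lt;br /&gt;
&lt;br /&gt;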
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
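The tricks above rely on the palette being stored as 16 colors of 3 bytes each (red, green, blue) starting at 16320. The following sketch, with the hypothetical helper &amp;lt;code&amp;gt;slot&amp;lt;/code&amp;gt;, shows which color and channel a palette byte belongs to, and why writing to &amp;lt;code&amp;gt;16367-j&amp;lt;/code&amp;gt; both reverses the palette and swaps red and blue:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- color index and channel (0=R,1=G,2=B) of palette byte j&lt;br /&gt;
function slot(j)&lt;br /&gt;
 return j//3,j%3&lt;br /&gt;
end&lt;br /&gt;
-- writing to 16367-j targets byte 47-j instead of byte j&lt;br /&gt;
for j=0,47 do&lt;br /&gt;
 local c,ch=slot(j)&lt;br /&gt;
 local fc,fch=slot(47-j)&lt;br /&gt;
 assert(fc==15-c and fch==2-ch)&lt;br /&gt;
end&lt;br /&gt;
print(slot(47))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;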
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers towards zero. This can be abused for a motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
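The fading rule can be tabulated in plain Lua. This is only a sketch: &amp;lt;code&amp;gt;fade&amp;lt;/code&amp;gt; is a hypothetical stand-in for &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt;, and it assumes, as stated above, that the value is rounded toward zero:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- fade is a hypothetical stand-in for the poke4 call:&lt;br /&gt;
-- math.modf drops the fraction, i.e. rounds toward zero&lt;br /&gt;
function fade(c)&lt;br /&gt;
 local whole=math.modf(c-.9)&lt;br /&gt;
 return math.tointeger(whole)&lt;br /&gt;
end&lt;br /&gt;
for c=0,15 do&lt;br /&gt;
 print(c,fade(c))&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;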
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered/motion blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
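How many pixels such a loop touches per frame can be estimated by counting its iterations in plain Lua. A rough sketch; &amp;lt;code&amp;gt;touched&amp;lt;/code&amp;gt; is a hypothetical helper and its argument plays the role of &amp;lt;code&amp;gt;t%2&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- number of iterations of the loop above for a given start s;&lt;br /&gt;
-- the step is larger than 1, so each one hits a different pixel&lt;br /&gt;
function touched(s)&lt;br /&gt;
 local n=0&lt;br /&gt;
 for i=s,32639,1.9 do n=n+1 end&lt;br /&gt;
 return n&lt;br /&gt;
end&lt;br /&gt;
print(touched(0)/32640)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The printed fraction is slightly above one half. Because the start value drifts with time, different pixels are hit on different frames.&lt;br /&gt;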
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have not been crunched to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origo (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
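The distance estimate inside the loop can be pulled out into a function to inspect the map on its own. A minimal sketch in plain Lua; the name &amp;lt;code&amp;gt;map&amp;lt;/code&amp;gt; is an illustrative choice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- distance estimate used in the loop, as a standalone function&lt;br /&gt;
function map(x,y,z)&lt;br /&gt;
 local X,Y,Z=x%6-3,y%6-3,z%6-3 -- fold space into one 6x6x6 cell&lt;br /&gt;
 return (X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
end&lt;br /&gt;
print(map(3,3,3)) -- cell center&lt;br /&gt;
print(map(5,3,3)) -- 2 units from the center&lt;br /&gt;
print(map(9,3,3)) -- center of the next cell&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expression equals half of the distance to a sphere of radius 2 around each cell center, so it is -1 at a center and 0 on the surface.&lt;br /&gt;
&lt;br /&gt;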
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=TIC-80&amp;diff=1164</id>
		<title>TIC-80</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=TIC-80&amp;diff=1164"/>
				<updated>2022-06-03T20:08:23Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: Add link to pakettic&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Setting up ===&lt;br /&gt;
As the TIC-80 fantasy computer is an all-in-one creation and execution platform, setting up TIC-80 is very easy:&lt;br /&gt;
&lt;br /&gt;
Just go to the https://github.com/nesbox/TIC-80/releases page and download the package for your platform of choice (Windows, OSX, Linux and even Raspberry Pi).&lt;br /&gt;
&lt;br /&gt;
Or if you are just curious you can just start doodling online at http://tic80.com/&lt;br /&gt;
&lt;br /&gt;
=== Getting started ===&lt;br /&gt;
Most TIC-80 programs are coded using the Lua scripting language. However, it is possible to select a different scripting language, such as JavaScript, at the cost of a few bytes/characters, like so (respectively for JavaScript, MoonScript, Wren, Fennel, Squirrel):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;js&amp;quot;&amp;gt;//script: js&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;-- script: moon&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;js&amp;quot;&amp;gt;// script: wren&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;fennel&amp;quot;&amp;gt;;; script: fennel&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;js&amp;quot;&amp;gt;// script: squirrel&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main function used for updating the screen (and called 60 times a second) is the TIC() function, so this function is also a requirement for doing anything with graphics. Additionally, you can set up a sub-function SCN() that is called once per scanline, at the cost of more bytes/characters. &lt;br /&gt;
&lt;br /&gt;
Most animated effects also need some kind of timer, so you are likely to use the built-in time() function or keep track of the time (t) yourself. A minimal setup would look something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()t=time()&lt;br /&gt;
-- your effect code&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://github.com/nesbox/TIC-80/wiki/tic See here] to learn how TIC() is called in the different languages supported by TIC-80.&lt;br /&gt;
&lt;br /&gt;
A full overview of the TIC-80 memory map and the most commonly used functions is available in this handy TIC-80 cheatsheet, as well as on the TIC-80 wiki page.&lt;br /&gt;
&lt;br /&gt;
https://zenithsal.com/assets/documents/tic-80_cheatsheet.pdf&lt;br /&gt;
&lt;br /&gt;
=== Video display ===&lt;br /&gt;
The TIC-80 has a 240x136 pixel display with 16 colors, which can be accessed via a wide range of graphics functions or by writing directly to VRAM at memory address 0x0000 using the &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; instruction, which changes just 4 bits. Addresses have to be multiplied by 2 when using poke4: for example, the byte at 0x1000 is accessed as 0x2000 (low nibble) and 0x2001 (high nibble).&lt;br /&gt;
&lt;br /&gt;
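The address arithmetic can be sketched in plain Lua. The helper names &amp;lt;code&amp;gt;nib&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;byteaddr&amp;lt;/code&amp;gt; are made up for this illustration, and the sketch assumes the screen is stored row by row with one nibble per pixel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- nibble address of pixel (x,y) on the 240x136 screen&lt;br /&gt;
function nib(x,y)&lt;br /&gt;
 return y*240+x&lt;br /&gt;
end&lt;br /&gt;
-- byte address that contains a given nibble address&lt;br /&gt;
function byteaddr(n)&lt;br /&gt;
 return n//2&lt;br /&gt;
end&lt;br /&gt;
print(nib(0,1),byteaddr(nib(0,1)))&lt;br /&gt;
print(byteaddr(0x2000)==0x1000)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;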
==== Draw functions ====&lt;br /&gt;
There are a couple of built-in drawing functions you can use:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
cls(color=0)&lt;br /&gt;
pix(x,y[,color]) [-&amp;gt; color]&lt;br /&gt;
circ(x,y,r,color) -- filled circle&lt;br /&gt;
circb(x,y,r,color) -- border circle&lt;br /&gt;
rect(x,y,w,h,color) -- filled rect&lt;br /&gt;
rectb(x,y,w,h,color) -- border rect&lt;br /&gt;
line(x0,y0,x1,y1,color)&lt;br /&gt;
tri(x1,y1,x2,y2,x3,y3,color)&lt;br /&gt;
textri(x1,y1,x2,y2,x3,y3,u1,v1,u2,v2,u3,v3,use_map=false,colorkey=-1)&lt;br /&gt;
print(text,x=0,y=0,color=15,fixed=false,scale=1,smallfont=false) -&amp;gt; width&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Getting something on screen ====&lt;br /&gt;
Here is a bit of code to get you started:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC() &lt;br /&gt;
t=time()/99&lt;br /&gt;
for y=0,135 do for x=0,239 do&lt;br /&gt;
pix(x,y,(x&amp;gt;&amp;gt;3~y&amp;gt;&amp;gt;3)+t)&lt;br /&gt;
end;end;end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which will display an animated XOR pattern.&lt;br /&gt;
&lt;br /&gt;
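The color expression can be tried on its own in plain Lua to see where the blocks come from. A small sketch; the function name &amp;lt;code&amp;gt;xorcol&amp;lt;/code&amp;gt; is hypothetical:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- color value of pixel (x,y) at time offset t, as in the loop above&lt;br /&gt;
function xorcol(x,y,t)&lt;br /&gt;
 return (x&amp;gt;&amp;gt;3~y&amp;gt;&amp;gt;3)+t&lt;br /&gt;
end&lt;br /&gt;
print(xorcol(0,0,0),xorcol(8,0,0),xorcol(8,8,0),xorcol(16,8,0))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Shifting right by 3 divides the coordinate by 8, so the value only changes every 8 pixels, which gives the pattern its 8x8 blocks.&lt;br /&gt;
&lt;br /&gt;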
==== Color Palette ====&lt;br /&gt;
The best way to start is to use the default sweetie16 palette (https://lospec.com/palette-list/sweetie-16) as this palette&lt;br /&gt;
offers a nice selection of 16 colors arranged in such a way that they are easily accessible. From version 0.9b onward, you can initialise the new default sweetie16 palette at startup by adding a 0x11 chunk to your TIC-80 cartridge. &lt;br /&gt;
&lt;br /&gt;
Normally a chunk would contain 4 bytes of header + data, but as this chunk has no data, it is possible to omit the extra 3 bytes of chunk-header if you place it at the end of your TIC cartridge. The new TIC-Packer linked below has the option to do this for you.&lt;br /&gt;
&lt;br /&gt;
==== Setting your own color palette ====&lt;br /&gt;
Alternatively, you can set up your own palette by writing to the palette area located at 0x3fc0, like so:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
for i=0,47 do poke (0x3fc0+i,i*5)end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
This produces a nice grayscale palette of 16 shades to work with.&lt;br /&gt;
&lt;br /&gt;
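To see which values the loop stores for each color, the bytes can be computed in plain Lua. A minimal sketch, assuming the palette holds 3 consecutive bytes (red, green, blue) per color; &amp;lt;code&amp;gt;gray&amp;lt;/code&amp;gt; is a hypothetical helper:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- RGB bytes that the loop above writes for color c (0..15)&lt;br /&gt;
function gray(c)&lt;br /&gt;
 local j=c*3&lt;br /&gt;
 return j*5,(j+1)*5,(j+2)*5&lt;br /&gt;
end&lt;br /&gt;
print(gray(0))&lt;br /&gt;
print(gray(15))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The three channels of a color differ by 5, so the shades are very slightly tinted, and the brightest color stops at 235 rather than 255.&lt;br /&gt;
&lt;br /&gt;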
==== Color index shuffling ====&lt;br /&gt;
If you don't want to use the sweetie16 palette, you can revert to the pre-0.8 db16 palette by simply not including a 0x11 chunk in your cartridge. Although the arrangement of color indices is not as ideal as in sweetie16, you can shuffle your color indices a bit to get 'somewhat workable' colors.&lt;br /&gt;
&lt;br /&gt;
A couple of examples for this are:&lt;br /&gt;
* (color)&amp;amp;10 - Some grey/blue shade&lt;br /&gt;
* ((color)&amp;amp;6)-3 - A nice shade of dark-cyan-white color&lt;br /&gt;
* (color)^2 - A shade of brown/yellowish colors&lt;br /&gt;
&lt;br /&gt;
But feel free to experiment yourself as well and let us know on discord if you find something cool.&lt;br /&gt;
&lt;br /&gt;
=== Sound ===&lt;br /&gt;
The TIC-80 has sound registers and waveforms (32 samples of 4 bits each), which are located in memory starting at address 0FF9C:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
0FF9C SOUND REGS 72 18 byte x 4 ch&lt;br /&gt;
0FFE4 WAVEFORMS 256 16 wave/ 32x4b each&lt;br /&gt;
100E4 SFX 4224 64 sounds&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Make some noise ====&lt;br /&gt;
The easiest way to get 'some' sound going is to bitbang the sound-registers and hope for the best, for example:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=function()for i=0,71 do poke(65436+i,(time()/7.2)%64)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A more &amp;quot;proper&amp;quot; way involves something like this: define the waveform yourself (e.g. a sawtooth), repeatedly (because for some reason once is not enough), then write the low part of the frequency to one byte, and the high nibble combined with the volume to another:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=function()&lt;br /&gt;
for i=0,31 do poke4(2*65438+i,i/2) end -- set up waveform&lt;br /&gt;
t=time()/10 &lt;br /&gt;
-- write frequencies&lt;br /&gt;
poke(65436+0,t%256) &lt;br /&gt;
poke(65437+0,(t/65536)%16+240)&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
But as you can see this costs considerably more bytes to set up.&lt;br /&gt;
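&lt;br /&gt;
The two register writes can be read as packing a frequency and a volume into two bytes. Below is a minimal sketch in plain Lua 5.3, assuming a 12-bit frequency and a 4-bit volume laid out as described above (low frequency byte first, then the volume in the upper nibble next to the top four frequency bits). The name &amp;lt;code&amp;gt;packreg&amp;lt;/code&amp;gt; is made up for this illustration:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- hypothetical helper, not part of the TIC-80 API&lt;br /&gt;
function packreg(freq,vol)&lt;br /&gt;
 local lo=freq%256            -- low 8 bits of the frequency&lt;br /&gt;
 local hi=freq//256%16+vol*16 -- top 4 frequency bits, volume in the upper nibble&lt;br /&gt;
 return lo,hi&lt;br /&gt;
end&lt;br /&gt;
print(packreg(440,15)) -- 184 and 241&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
In TIC-80 the two values would then go to the first channel with &amp;lt;code&amp;gt;poke(65436,lo)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke(65437,hi)&amp;lt;/code&amp;gt;.&lt;br /&gt;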
&lt;br /&gt;
=== Final Optimisations ===&lt;br /&gt;
When you are happy with your intro and want to get it ready for release, it is time to squeeze out those last bytes.&lt;br /&gt;
As a goal-post, you should always aim to have your uncompressed effect around the target size, and work from there.&lt;br /&gt;
&lt;br /&gt;
Final optimisation can be done by stringing as much code as possible together on single lines and removing any extra spaces and blank lines.&lt;br /&gt;
A rule of thumb for this is that if the first or last character of a variable or function name isn't a valid hex digit (i.e. A-F), you can omit whitespace (so that x=0 y=0 z=0 can become x=0y=0z=0).&lt;br /&gt;
&lt;br /&gt;
=== Release ===&lt;br /&gt;
For releasing an intro at a demoscene event, a raw TIC cartridge file without any additional graphics/sound/metadata is needed.&lt;br /&gt;
&lt;br /&gt;
Creating a TIC cartridge file adds a 4 byte header + 1 extra byte for a 0x11 sweetie16 chunk.&lt;br /&gt;
&lt;br /&gt;
Luckily there are various packers that help you convert your (Lua) script to an empty TIC cartridge with a single ZLIB compressed code block and optional 0x11 (sweetie16) palette chunk. See Additional Resources for links to these packers.&lt;br /&gt;
 &lt;br /&gt;
==== Exporting Video as Animated GIF ====&lt;br /&gt;
The TIC-80 environment has a neat feature that lets you export your intro directly as an animated GIF file, to be converted to video later, by pressing the F9 key to start and stop recording. However, there is a default recording limit capped to a fixed number of frames or seconds. You can change this in the TIC-80 config to a bigger number to match your recording size.&lt;br /&gt;
&lt;br /&gt;
If your intro is taking up too many resources and starts chugging a bit on your machine, it can be wise to make a version that steps through time linearly by adding a number to your t variable yourself instead of using the time() function.&lt;br /&gt;
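&lt;br /&gt;
A minimal sketch of such a fixed-step version, assuming 60 frames per second and an effect that only depends on a global &amp;lt;code&amp;gt;t&amp;lt;/code&amp;gt; measured in seconds (both are assumptions for this illustration):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
t=0&lt;br /&gt;
function TIC()&lt;br /&gt;
 -- ... draw the effect here, using t instead of time() ...&lt;br /&gt;
 t=t+1/60 -- advance by one frame, however long the drawing took&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;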
&lt;br /&gt;
==== Online version: Metadata and Thumbnail image ====&lt;br /&gt;
When uploading the intro to the TIC-80 website for a playable online version, you will need to build a new TIC file with some added metadata and a thumbnail image (you can take this screenshot using the F7 key during demo playback) and use this as your online version. The screenshot can also be imported from a 240×136 PNG (other sizes will throw an error) from inside the TIC-80 console using &amp;lt;code&amp;gt;import screen file[.png]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The metadata is added at the top of your intro as follows:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- title: My intro&lt;br /&gt;
-- author: scener&lt;br /&gt;
-- desc: my first sizecoded TIC-80 intro&lt;br /&gt;
-- script: lua (or moon/wren/js/fennel)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Update: As of version 0.9b the TIC80.COM website now also allows you to upload a separate TIC file with the metadata and keep the uploaded binary TIC file as code only.&lt;br /&gt;
&lt;br /&gt;
=== Additional Resources ===&lt;br /&gt;
Sizecoding on the TIC-80 is still in its infancy, but luckily there is already plenty of information to get you started!&lt;br /&gt;
&lt;br /&gt;
* TIC-80 Wiki page https://github.com/nesbox/TIC-80/wiki&lt;br /&gt;
* TIC-80 One page cheat sheet (PDF) https://zenithsal.com/assets/documents/tic-80_cheatsheet.pdf&lt;br /&gt;
* TIC-80 Intros and demos on Pouet (Press F1 for code): https://www.pouet.net/prodlist.php?platform%5B%5D=TIC-80&lt;br /&gt;
* TIC-80 TIC Cartridge File Format (from TIC-80 Wiki) https://github.com/nesbox/TIC-80/wiki/tic-File-Format&lt;br /&gt;
* TIC-80 Packer https://bitbucket.org/WaterEnVuur/tic80-packer/src/master/&lt;br /&gt;
* Pactic, a fork of TIC-80 Packer https://github.com/phlubby/pactic&lt;br /&gt;
* TIC-Tool https://github.com/exoticorn/tic-tool&lt;br /&gt;
* pakettic, https://github.com/vsariola/pakettic&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1083</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1083"/>
				<updated>2022-02-20T09:57:56Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Dithering */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Bytebattles ==&lt;br /&gt;
&lt;br /&gt;
Bytebattles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
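&lt;br /&gt;
The cosine approximations work because both -11 and 8 happen to lie close to π/2 plus a multiple of 2π. A quick check in plain Lua, outside TIC-80, could look like this (the sample angles and the variable names &amp;lt;code&amp;gt;p&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;q&amp;lt;/code&amp;gt; are arbitrary choices for this sketch):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
p=0 q=0 -- largest error seen for each variant&lt;br /&gt;
for k=0,628 do&lt;br /&gt;
 w=k/100 -- sample angles from 0 to about 2*pi&lt;br /&gt;
 p=math.max(p,math.abs(s(w-11)-math.cos(w)))&lt;br /&gt;
 q=math.max(q,math.abs(s(w+8)-math.cos(w)))&lt;br /&gt;
end&lt;br /&gt;
print(p,q) -- p stays below 0.005, q below 0.15&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;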
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example: &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to only use one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to minimize a function with parameters: &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; returns the parameters. For example, the following saves 3 characters compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
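&lt;br /&gt;
Both points can be tried outside TIC-80 in a plain Lua 5.3 interpreter. A small sketch, in which the names &amp;lt;code&amp;gt;add&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;msg&amp;lt;/code&amp;gt; are made up for this illustration:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- ... holds the arguments passed to the loaded chunk&lt;br /&gt;
add=load'x,y=...return x+y'&lt;br /&gt;
print(add(2,3)) -- 5&lt;br /&gt;
-- in the outer string \\n stands for a backslash followed by n,&lt;br /&gt;
-- which the inner string literal then reads as a newline&lt;br /&gt;
msg=load'return &amp;quot;foo\\nbar&amp;quot;'&lt;br /&gt;
print(msg()) -- foo and bar on separate lines&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;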
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it (toward zero). To add dithering, add a small value, between 0 and 1, to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*25/96%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
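&lt;br /&gt;
To see why this works, note that within each 2x2 block the expression takes the four values 0, 0.25, 0.5 and 0.75, so a fractional color is bumped up to the next index in a matching share of the pixels. A plain Lua sketch, emulating the truncation with &amp;lt;code&amp;gt;//1&amp;lt;/code&amp;gt; (which matches rounding toward zero for non-negative values); the color 3.25 is an arbitrary example:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
v=3.25 -- fractional color to display&lt;br /&gt;
n=0&lt;br /&gt;
for y=0,1 do&lt;br /&gt;
 for x=0,1 do&lt;br /&gt;
  k=(x*2-y%2)%4/4 -- dither offset of this pixel&lt;br /&gt;
  n=n+(v+k)//1    -- color index after truncation&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
print(n/4) -- 3.25: the block averages out to the original value&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;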
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
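&lt;br /&gt;
The backwards-running variant works because the palette is stored as 16 consecutive red, green, blue triplets: writing byte &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; to address &amp;lt;code&amp;gt;16367-j&amp;lt;/code&amp;gt; mirrors the whole 48-byte area, which reverses the color order and swaps red with blue at the same time. A plain Lua sketch of the index arithmetic (the variable &amp;lt;code&amp;gt;m&amp;lt;/code&amp;gt; is only used for this illustration):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
for j=0,47,5 do&lt;br /&gt;
 m=47-j -- byte offset that poke(16367-j,...) ends up writing&lt;br /&gt;
 -- color index and channel (0=red, 1=green, 2=blue), before and after&lt;br /&gt;
 print(j//3,j%3,m//3,m%3)&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
Each printed row shows color n and channel k turning into color 15-n and channel 2-k.&lt;br /&gt;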
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers towards zero. This can be abused for a motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps colors 1 to 15 to the next lower value, but value 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
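&lt;br /&gt;
The effect of the rounding can be checked in plain Lua. The sketch below uses &amp;lt;code&amp;gt;math.modf&amp;lt;/code&amp;gt;, whose first result is the integral part rounded toward zero, as a stand-in for what &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; is described as doing (an assumption made for this illustration):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
for v=0,15 do&lt;br /&gt;
 n=math.modf(v-.9) -- integral part, rounded toward zero&lt;br /&gt;
 print(v,n) -- 0 stays at 0, every other color drops by one&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;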
&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered/motion blur look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
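&lt;br /&gt;
The fractional step does the work here: stepping by 1.9 visits roughly every second index, &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; presumably truncates the fractional index to a whole pixel, and the changing start value &amp;lt;code&amp;gt;t%2&amp;lt;/code&amp;gt; shifts which pixels are hit on each frame. A plain Lua sketch that counts the pixels touched in one frame (the start value 0 is an arbitrary choice):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
n=0&lt;br /&gt;
for i=0,32639,1.9 do n=n+1 end&lt;br /&gt;
print(n/32640) -- a little over half of the pixels&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;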
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
The effects have not been crunched, to keep them readable.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1082</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1082"/>
				<updated>2022-02-20T09:50:57Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Dithering */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Bytebattles ==&lt;br /&gt;
&lt;br /&gt;
Bytebattles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example: &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code. It is advisable to only use one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to minimize a function with parameters: &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; returns the parameters. For example, the following saves 3 characters compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it (toward zero). To add dithering, add a small value, between 0 and 1, to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y. &amp;lt;code&amp;gt;i*33/64%1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;i*97/192&amp;lt;/code&amp;gt; if desperate.&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is, so everything drawn fades toward color 0 over successive frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered, motion-blurred look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1081</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1081"/>
				<updated>2022-02-19T15:09:01Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Dithering */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Bytebattles ==&lt;br /&gt;
&lt;br /&gt;
Bytebattles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, with the cost of one more character.&lt;br /&gt;
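&lt;br /&gt;
As a rough check of the cosine approximations in the last bullet, the following plain Lua sketch (no TIC-80 API needed; the helper name &amp;lt;code&amp;gt;maxerr&amp;lt;/code&amp;gt; and the sampled range 0 to 7 are only illustrative choices) measures the largest deviation of each shifted sine from &amp;lt;code&amp;gt;math.cos&amp;lt;/code&amp;gt;. It should report an error of roughly 0.004 for &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and roughly 0.15 for &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
s=math.sin&lt;br /&gt;
-- maxerr is a hypothetical helper: largest |s(w+shift)-cos(w)| over sampled w&lt;br /&gt;
function maxerr(shift)&lt;br /&gt;
 local e=0&lt;br /&gt;
 for k=0,700 do&lt;br /&gt;
  local w=k/100 -- sample w from 0 to 7 in steps of 0.01&lt;br /&gt;
  e=math.max(e,math.abs(s(w+shift)-math.cos(w)))&lt;br /&gt;
 end&lt;br /&gt;
 return e&lt;br /&gt;
end&lt;br /&gt;
e11=maxerr(-11) -- error of s(w-11)&lt;br /&gt;
e8=maxerr(8) -- error of s(w+8)&lt;br /&gt;
print(e11,e8)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;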
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the &amp;lt;code&amp;gt;b&amp;lt;/code&amp;gt; is read as part of the number. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
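&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function with parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the arguments passed to the function. The following example saves 3 characters compared with &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;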
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
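&lt;br /&gt;
Both points can be tried outside TIC-80 in plain Lua 5.3 or later. In this minimal sketch the names &amp;lt;code&amp;gt;f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;g&amp;lt;/code&amp;gt; are made up for illustration; in a real entry the loaded function would be assigned to a callback such as &amp;lt;code&amp;gt;TIC&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;SCN&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- f and g are hypothetical names, not part of the TIC-80 API&lt;br /&gt;
f=load'x,y=...return x+y' -- ... holds the arguments of the call&lt;br /&gt;
print(f(2,3)) -- prints 5&lt;br /&gt;
-- \\ in the outer string becomes \ in the loaded code, so \n is an escape there&lt;br /&gt;
g=load'return &amp;quot;a\\nb&amp;quot;'&lt;br /&gt;
print(#g()) -- prints 3: a, newline, b&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;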
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*i%1&amp;lt;/code&amp;gt;          || 8      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  chess knight's-move pattern; use &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x and y&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
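&lt;br /&gt;
To see why the offset produces intermediate shades, here is a small plain Lua sketch that runs without TIC-80. It assumes a made-up fractional color &amp;lt;code&amp;gt;c=2.5&amp;lt;/code&amp;gt; and uses &amp;lt;code&amp;gt;math.floor&amp;lt;/code&amp;gt; as a stand-in for the rounding done by &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; (the two agree for non-negative values). Within one 2x2 block, half of the pixels end up as color 3 and the other half as color 2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
c=2.5 -- hypothetical fractional color, halfway between 2 and 3&lt;br /&gt;
n=0 -- number of pixels in the block that become color 3&lt;br /&gt;
for y=0,1 do&lt;br /&gt;
 for x=0,1 do&lt;br /&gt;
  d=(x*2-y%2)%4/4 -- offsets 0, .5, .75, .25 in loop order&lt;br /&gt;
  if math.floor(c+d)==3 then n=n+1 end&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
print(n) -- prints 2&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;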
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers toward zero. This can be abused for motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is, so everything drawn fades toward color 0 over successive frames:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
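&lt;br /&gt;
The color mapping behind this fade can be tabulated in plain Lua without TIC-80. In this sketch &amp;lt;code&amp;gt;trunc&amp;lt;/code&amp;gt; is a hypothetical helper that mimics the rounding toward zero described above; it is not a TIC-80 function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- trunc is a hypothetical stand-in for the rounding done by poke4&lt;br /&gt;
function trunc(v)&lt;br /&gt;
 if v&amp;lt;0 then return math.ceil(v) end&lt;br /&gt;
 return math.floor(v)&lt;br /&gt;
end&lt;br /&gt;
-- print each color next to the color it becomes after one frame&lt;br /&gt;
for c=0,15 do print(c,trunc(c-.9)) end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;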
&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels on each frame, which gives a dithered, motion-blurred look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1080</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1080"/>
				<updated>2022-02-19T14:50:23Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Dithering */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Bytebattles ==&lt;br /&gt;
&lt;br /&gt;
Bytebattles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character-long names may already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, with the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the &amp;lt;code&amp;gt;b&amp;lt;/code&amp;gt; is read as part of the number. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
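&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function with parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the arguments passed to the function. The following example saves 3 characters compared with &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;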
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating-point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it toward zero. To add dithering, add a small value between 0 and 1 to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*99%1&amp;lt;/code&amp;gt;         || 9      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering, use *9 if desperate. &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  chess knight's-move pattern; use &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x and y&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers towards zero. This can be abused for a motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered, motion-blurred look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1079</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1079"/>
				<updated>2022-02-19T14:37:44Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Load-function */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Bytebattles ==&lt;br /&gt;
&lt;br /&gt;
Bytebattles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
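&lt;br /&gt;
The accuracy of the two cosine approximations in the list above can be checked outside TIC-80. The following is a minimal sketch in plain Lua 5.3; the helper name &amp;lt;code&amp;gt;max_error&amp;lt;/code&amp;gt; and the sampling step are assumptions made for this illustration, not part of any API:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- worst-case difference between s(w+offset) and math.cos(w)&lt;br /&gt;
-- max_error is a hypothetical helper, sampled over roughly one period&lt;br /&gt;
s=math.sin&lt;br /&gt;
function max_error(offset)&lt;br /&gt;
 local worst=0&lt;br /&gt;
 for k=0,628 do&lt;br /&gt;
  local w=k/100&lt;br /&gt;
  worst=math.max(worst,math.abs(s(w+offset)-math.cos(w)))&lt;br /&gt;
 end&lt;br /&gt;
 return worst&lt;br /&gt;
end&lt;br /&gt;
print(max_error(-11)) -- 11 is close to 3.5*pi, so the error is small&lt;br /&gt;
print(max_error(8))   -- 8 is further from 2.5*pi, so the error is larger&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since 11 differs from 3.5*pi by about 0.004 and 8 differs from 2.5*pi by about 0.15, the first error should come out roughly thirty times smaller than the second.&lt;br /&gt;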
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the letter is read as part of the number. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function with parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the parameters. For example, the following saves 3 bytes compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Multiple parameters can be fetched with &amp;lt;code&amp;gt;x,y=...&amp;lt;/code&amp;gt;.&lt;br /&gt;
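&lt;br /&gt;
As an illustration outside TIC-80, the following plain Lua 5.3 sketch fetches two parameters this way. The function name &amp;lt;code&amp;gt;f&amp;lt;/code&amp;gt; is made up for the example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- f is a hypothetical name; x and y become globals when f is called&lt;br /&gt;
f=load'x,y=... return x+y'&lt;br /&gt;
print(f(2,3)) -- prints 5&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;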
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
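&lt;br /&gt;
The difference can be seen in a small plain Lua 5.3 sketch; the variable names &amp;lt;code&amp;gt;ok&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;bad&amp;lt;/code&amp;gt; are illustrative. With a single backslash, the outer string literal already turns the escape into a real line break, so the loaded code contains an unfinished string and &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; returns nil:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- double-escaped: the loaded code still contains the escape sequence&lt;br /&gt;
ok=load'return &amp;quot;foo\\nbar&amp;quot;'&lt;br /&gt;
-- single-escaped: the loaded code has a raw line break inside the string&lt;br /&gt;
bad=load'return &amp;quot;foo\nbar&amp;quot;'&lt;br /&gt;
print(ok()) -- foo and bar on separate lines&lt;br /&gt;
print(bad)  -- nil, because the code failed to compile&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;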
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it (toward zero). To add dithering, add a small value, between 0 and 1, to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*99%1&amp;lt;/code&amp;gt;         || 9      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering, use *9 if desperate &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  chess horse, &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y                               &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers towards zero. This can be abused for a motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered, motion-blurred look and makes the update smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11)&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
The map is a bunch of repeated spheres here.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray not normalized=&amp;gt;reduce scale&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	<entry>
		<id>http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1078</id>
		<title>Byte Battle</title>
		<link rel="alternate" type="text/html" href="http://www.sizecoding.org/index.php?title=Byte_Battle&amp;diff=1078"/>
				<updated>2022-02-19T09:10:48Z</updated>
		
		<summary type="html">&lt;p&gt;Pestis: /* Basic optimizations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Bytebattles ==&lt;br /&gt;
&lt;br /&gt;
Bytebattles are a form of live coding, similar to Shader Showdowns, where two contestants compete in writing a visual effect in 25 minutes. The coding environment is the [[Fantasy Consoles|TIC-80 fantasy console]]. However, unlike Shader Showdowns, there is an additional limit: the final code should be 256 characters or less. This requires the contestants to use efficient code (e.g. single letter variables) and to minimize the code (e.g. remove the whitespace), all within the time limit. Unlike in normal TIC-80 sizecoding, there is no compression, so every character counts.&lt;br /&gt;
&lt;br /&gt;
== General notation in this article ==&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Symbol || Meaning&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt; || Pixel index&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s&amp;lt;/code&amp;gt; || Alias for math.sin&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; || Pixel x-coordinate&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; || Pixel y-coordinate&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Basic optimizations ==&lt;br /&gt;
&lt;br /&gt;
* Functions that are called three or more times should be aliased. For example, &amp;lt;code&amp;gt;e=elli&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;e()e()e()&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;elli()elli()elli()&amp;lt;/code&amp;gt;. Functions with 5-character names already benefit from aliasing with two calls: &amp;lt;code&amp;gt;r=rectb&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;r()r()&amp;lt;/code&amp;gt; is 1 character shorter than &amp;lt;code&amp;gt;rectb()rectb()&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;t=0&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;t=t+.1&amp;lt;/code&amp;gt; is 3 characters shorter than &amp;lt;code&amp;gt;t=time()/399&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;for i=0,32639 do x=i%240y=i/240 end&amp;lt;/code&amp;gt; is 2-3 characters shorter than &amp;lt;code&amp;gt;for y=0,135 do for x=0,239 do end end&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;(x*x+y*y)^.5&amp;lt;/code&amp;gt; is 6 characters shorter than &amp;lt;code&amp;gt;math.sqrt(x*x+y*y)&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;s(w+8)&amp;lt;/code&amp;gt; both approximate &amp;lt;code&amp;gt;math.cos(w)&amp;lt;/code&amp;gt;, so only &amp;lt;code&amp;gt;math.sin&amp;lt;/code&amp;gt; needs to be aliased. &amp;lt;code&amp;gt;s(w-11)&amp;lt;/code&amp;gt; is far more accurate, at the cost of one more character.&lt;br /&gt;
&lt;br /&gt;
== One-lining ==&lt;br /&gt;
&lt;br /&gt;
Most whitespace can be removed from Lua code. For example, &amp;lt;code&amp;gt;x=0y=0&amp;lt;/code&amp;gt; is valid. All newlines can be removed or replaced with a space, making the whole code a single line:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()for i=0,32639 do poke4(i,i)end end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: Letters &amp;lt;code&amp;gt;a-f&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;A-F&amp;lt;/code&amp;gt; after a number cause problems.''' &amp;lt;code&amp;gt;a=0b=0&amp;lt;/code&amp;gt; is not valid code, because the letter is read as part of the number. It is advisable to use only one-letter variables in the ranges &amp;lt;code&amp;gt;g-z&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;G-Z&amp;lt;/code&amp;gt; from the start; this will make eventual one-lining easier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Load-function == &lt;br /&gt;
&lt;br /&gt;
Function &amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; takes a string of code and returns a function with no named arguments, with the code as its body. It's particularly useful for shortening the TIC function after one-lining:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
TIC=load'for i=0,32639 do poke4(i,i)end'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As a rule of thumb, one-lining and using the load trick can bring a ~ 275 character code down to 256.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;load&amp;lt;/code&amp;gt; can even be used to shorten a function with parameters: inside the loaded code, &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt; evaluates to the parameters. For example, the following saves 3 bytes compared to &amp;lt;code&amp;gt;function SCN(r)poke(16320,r)end&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
SCN=load'r=...poke(16320,r)'&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Warning: The backslash causes problems when using the load trick.''' In particular, if you have a string with escaped characters in the original code e.g. &amp;lt;code&amp;gt;print(&amp;quot;foo\nbar&amp;quot;)&amp;lt;/code&amp;gt;, then this needs to be double-escaped: &amp;lt;code&amp;gt;load'print(&amp;quot;foo\\nbar&amp;quot;)'&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Dithering ==&lt;br /&gt;
&lt;br /&gt;
If you have a floating point color value, the TIC-80 &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round it (toward zero). To add dithering, add a small value, between 0 and 1, to the color. The best technique depends on whether you have &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;y&amp;lt;/code&amp;gt; available or only &amp;lt;code&amp;gt;i&amp;lt;/code&amp;gt;, and on how many bytes you can spare:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                     || Length || Result                              || Notes                                                                                                                     &lt;br /&gt;
|-&lt;br /&gt;
|                                ||        || [[File:No dithering.png]]           || No dithering                              &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;s(i)*99%1&amp;lt;/code&amp;gt;         || 9      || [[File:Random dithering.png]]       || &amp;quot;random&amp;quot; dithering, use *9 if desperate &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;i*481/960%1&amp;lt;/code&amp;gt;       || 11     || [[File:Chess dithering.png]]        ||  chess horse, &amp;lt;code&amp;gt;(x/2+y/4)%1&amp;lt;/code&amp;gt; if you have x&amp;amp;y                               &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(x*2-y%2)%4/4&amp;lt;/code&amp;gt;     || 13     || [[File:Block dithering.png]]        ||  2x2 block dithering                       &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;(i*2-i//80%2)%4/4&amp;lt;/code&amp;gt; || 17     || [[File:Block dithering from i.png]] ||  2x2 block dithering (almost), from i only &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
A quick example demonstrating the 2x2 block dithering:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for i=0,2399 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i//240&lt;br /&gt;
  poke4(i,x/30+(x*2-y%2)%4/4)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Palettes ==&lt;br /&gt;
&lt;br /&gt;
The following palettes assume that &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; goes from 0 to 47. Usually there's no need to make a new loop for this: just reuse another loop with &amp;lt;code&amp;gt;j=i%48&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Expression                                       || Length  || Result                               || Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j*5)&amp;lt;/code&amp;gt;                   || 17      || [[File:Gray palette.png]]            ||&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j*5)&amp;lt;/code&amp;gt;               || 21      || [[File:Blue-green-cyan palette.png]] || Good for objects &amp;amp; background&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,j%3*j/.4)&amp;lt;/code&amp;gt;              || 22      || [[File:Blue palette.png]]            || Use &amp;lt;code&amp;gt;(j+1)%3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;(j+2)%3&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for different colors&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*255)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow palette.png]]         || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j)^2*j*6)&amp;lt;/code&amp;gt;            || 24      || [[File:Rainbow faded palette.png]]   || Change the phase of the palette with s(j+p)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15)*255)&amp;lt;/code&amp;gt;           || 25      || [[File:Blue-brown palette.png]]      || &amp;lt;code&amp;gt;s(j/15)^2&amp;lt;/code&amp;gt; is less bright&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j-j%-3)^2*255)&amp;lt;/code&amp;gt;       || 29      || [[File:Green-beige palette.png]]     || &amp;lt;code&amp;gt;j%3*2&amp;lt;/code&amp;gt; for a more blue/beige variant, &amp;lt;code&amp;gt;-j%3*4&amp;lt;/code&amp;gt; for beige/blue variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(4+j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright beige palette.png]]    || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a pink variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,255/(1+2^(5-j%3-j/5)))&amp;lt;/code&amp;gt; || 35      || [[File:Bright blue palette.png]]     || &amp;lt;code&amp;gt;2*j%3&amp;lt;/code&amp;gt; for a green variant&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;poke(16320+j,s(j/15+s(j%3*3))^2*255)&amp;lt;/code&amp;gt;|| 37      || [[File:Green-purple palette.png]]    || Cyclic, based on [https://iquilezles.org/www/articles/palettes/palettes.htm]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The last one is an entire family of palettes. You can replace &amp;lt;code&amp;gt;s(j%3*3)&amp;lt;/code&amp;gt; with any function that depends on &amp;lt;code&amp;gt;j%3&amp;lt;/code&amp;gt;; this ensures the palette remains cyclic. Some ideas for tweaking the palettes:&lt;br /&gt;
* Invert the colors by adding a &amp;lt;code&amp;gt;-1-&amp;lt;/code&amp;gt; in the expression&lt;br /&gt;
* Flip the blue/red channels &amp;amp; have the entire palette running backwards by using &amp;lt;code&amp;gt;poke(16367-j,...)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Abuse the default Sweetie 16 palette, by only setting some of the RGB channels, while keeping others as they are. For example, setting all the blue channels to zero: &amp;lt;code&amp;gt;poke(16322+j*3,0)&amp;lt;/code&amp;gt;. Here &amp;lt;code&amp;gt;j&amp;lt;/code&amp;gt; is between 0 and 15.&lt;br /&gt;
&lt;br /&gt;
Code for testing palettes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 cls()&lt;br /&gt;
 for j=0,47 do poke(16320+j,s(j/15)*255)end&lt;br /&gt;
 for c=0,15 do rect(c*5,0,5,5,c)end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Motion blur == &lt;br /&gt;
&lt;br /&gt;
In the TIC-80 API, the &amp;lt;code&amp;gt;pix&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;poke4&amp;lt;/code&amp;gt; functions round numbers towards zero. This can be abused for a motion blur: &amp;lt;code&amp;gt;poke4(i,peek4(i)-.9)&amp;lt;/code&amp;gt; maps each of the colors 1 to 15 to the next lower value, while 0 stays as it is. Like so:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/9&lt;br /&gt;
 circ(t%240,t%136,9,15)&lt;br /&gt;
 for i=0,32639 do poke4(i,peek4(i)-.9)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
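&lt;br /&gt;
To see why the trail fades and then stops at color 0, the mapping can be tabulated outside TIC-80. The snippet below is a plain Lua sketch that assumes the value is truncated towards zero as described above; &amp;lt;code&amp;gt;fade&amp;lt;/code&amp;gt; is a made-up helper name, not part of the API:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- fade(c): color stored by poke4(i,c-.9), assuming truncation towards zero&lt;br /&gt;
function fade(c)&lt;br /&gt;
 local v=c-.9&lt;br /&gt;
 -- math.ceil truncates negative values, math.floor positive ones&lt;br /&gt;
 return v&amp;lt;0 and math.ceil(v) or math.floor(v)&lt;br /&gt;
end&lt;br /&gt;
-- print each color next to the color it fades to&lt;br /&gt;
for c=0,15 do print(c,fade(c))end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Under that assumption every color from 1 to 15 drops by one per frame, so anything drawn fades to color 0 within 15 frames and then stays there.&lt;br /&gt;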
&lt;br /&gt;
&lt;br /&gt;
== Updating only some pixels ==&lt;br /&gt;
&lt;br /&gt;
Pixel-based effects, especially raycasting and raymarching, can become excessively slow. A simple trick is to update only about half of the pixels each frame, which gives a dithered, motion-blurred look and makes the animation smoother:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=t%2,32639,1.9 do poke4(i,i/4e3+t)end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Examples of effects ==&lt;br /&gt;
&lt;br /&gt;
To keep them readable, the effects have not been crunched.&lt;br /&gt;
&lt;br /&gt;
=== Plasma ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/499&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240&lt;br /&gt;
  y=i/240&lt;br /&gt;
  v=s(x/50+t)+s(y/22+t)+s(x/32)&lt;br /&gt;
  poke4(i,v*2%8)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Rotozoomer ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/999 &lt;br /&gt;
 a=s(t-11) -- close to cos(t), since 11 is about 3.5*pi&lt;br /&gt;
 b=s(t)&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-120&lt;br /&gt;
  y=i/240-68&lt;br /&gt;
  u=a*x-b*y&lt;br /&gt;
  v=b*x+a*y&lt;br /&gt;
  poke4(i,(u//1~v//1)//16)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Tunnel ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 t=time()/199&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  x=i%240-s(t/7)*99-120&lt;br /&gt;
  y=i/240-s(t/9)*49-68&lt;br /&gt;
  u=math.atan2(y,x)*6/6.29&lt;br /&gt;
  v=99/(x*x+y*y)^.5+t&lt;br /&gt;
  poke4(i,u//1~v//1)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
s=math.sin&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Raymarcher ===&lt;br /&gt;
&lt;br /&gt;
Here, the map consists of spheres repeated on a regular grid.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
function TIC()&lt;br /&gt;
 for i=0,32639 do&lt;br /&gt;
  -- ray (u,v,w), not normalized!&lt;br /&gt;
  u=i%240/120-1&lt;br /&gt;
  v=i/32639-.5&lt;br /&gt;
  w=1&lt;br /&gt;
  -- camera origin (x,y,z)&lt;br /&gt;
  x=3&lt;br /&gt;
  y=0&lt;br /&gt;
  z=time()/999 -- camera moves with time&lt;br /&gt;
  j=0&lt;br /&gt;
  repeat&lt;br /&gt;
   X=x%6-3 -- domain repetition&lt;br /&gt;
   Y=y%6-3&lt;br /&gt;
   Z=z%6-3&lt;br /&gt;
   -- ray is not normalized, so the distance is halved&lt;br /&gt;
   m=(X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
   x=x+m*u&lt;br /&gt;
   y=y+m*v&lt;br /&gt;
   z=z+m*w&lt;br /&gt;
   j=j+1&lt;br /&gt;
  until j&amp;gt;15 or m&amp;lt;.1&lt;br /&gt;
  poke4(i,j)&lt;br /&gt;
 end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
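&lt;br /&gt;
The distance estimate inside the loop can also be read as a separate function. The following is a sketch of the same formula pulled out into a hypothetical helper &amp;lt;code&amp;gt;map&amp;lt;/code&amp;gt; (the name does not appear in the effect above), which can be tried in plain Lua:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;lua&amp;quot;&amp;gt;&lt;br /&gt;
-- map(x,y,z): half the distance to the nearest sphere surface.&lt;br /&gt;
-- The spheres have radius 2 and repeat every 6 units on each axis.&lt;br /&gt;
function map(x,y,z)&lt;br /&gt;
 local X=x%6-3 -- domain repetition folds space into one cell&lt;br /&gt;
 local Y=y%6-3&lt;br /&gt;
 local Z=z%6-3&lt;br /&gt;
 return (X*X+Y*Y+Z*Z)^.5/2-1&lt;br /&gt;
end&lt;br /&gt;
print(map(3,3,3)) -- center of a sphere: negative, inside&lt;br /&gt;
print(map(0,0,0)) -- corner of a cell: positive, outside&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Keeping the formula inline, as the effect does, avoids the function call and is the more likely choice when every byte counts; the helper is only meant to make the geometry easier to see.&lt;br /&gt;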
&lt;br /&gt;
== Additional Resources ==&lt;br /&gt;
&lt;br /&gt;
* Code from past bytebattles https://livecode.demozoo.org/&lt;/div&gt;</summary>
		<author><name>Pestis</name></author>	</entry>

	</feed>