Memories (http://www.sizecoding.org/index.php?title=Memories&diff=571), revision of 2020-06-04T18:56:56Z<p>HellMood: /* External Links */</p>
<hr />
<div>==Update==<br />
The final FreeDos version is complete; you can [https://www.youtube.com/watch?v=wlW84fEHngM watch it here]. It includes the Amiga Ball as an extra effect but lacks the music, since a normal modern PC booted into FreeDos cannot play MIDI that easily. The file size is still 256 bytes.<br />
==Introduction==<br />
<br />
Hello! My name is "HellMood" and this is not a usual wiki-style article. It will be written from my point of view, in the first person. This article is all about a tiny MS DOS program called "Memories". This program has a size of 256 bytes and won the "PC 256 byte" competition of the [https://en.wikipedia.org/wiki/Demoscene demoscene] event [https://2020.revision-party.net/ "Revision"] in 2020, as well as the public choice award. You can watch a video of the output of this program here (https://www.youtube.com/watch?v=Imquk_3oFf4) and a video including live reactions of the online audience and moderators here (https://www.twitch.tv/videos/589179638?t=7h02m40s). You can download the release and comment on it here (https://www.pouet.net/prod.php?which=85227). This wiki article will give you an in-depth analysis of the program and point out historical references as well as development steps. In case you did not notice, you are now on the "sizecoding wiki". This wiki will help you not only to understand the inner workings of "Memories", but also enable you to produce something like this yourself. Feel free to look around! It is recommended that you start with the [http://www.sizecoding.org/wiki/Getting_Started basics of this wiki] first if you're new to sizecoding and/or x86 assembler. While the principles might be easy to grasp, details of the implementation might not.<br />
<br />
==Disclaimer==<br />
This article discusses the submitted DosBox version (256 bytes). The available archive contains versions for FreeDos and Windows XP DOS which, at the time of this writing, don't work on all computers. These alternative versions were included as proof of concept, to show that this program does not only run on the emulator "DosBox". For the "PC 256 bytes" competition of the event "Revision" in 2020, it was allowed to specify either "FreeDos" or "DosBox" as platform, the latter with a specific configuration. It is in fact possible to modify the DosBox version to work on FreeDos, MS DOS, WinXP and Win98, as the released alternative versions showcase, but this is not the focus of this article. Anyway, safe versions for all platforms and all computers are in the works. Regarding possible further optimizations, I will discuss/explain only the submitted version, although I already found several spots in the code that can be optimized. As the history of tiny effects shows, almost no code is perfect, and most of it can be brought down further in size. To avoid confusion between different versions, only the submitted version will be explained.<br />
<br />
==History of tiny effects==<br />
[[File:Pouet categories.png|thumb|Size categories on http://www.pouet.net]]<br />
Sizecoders think in size categories. For MS DOS, these categories are 256b, 128b, 64b and 32b. These are the standards of one of the biggest demoscene archives, http://www.pouet.net. There is no 16b category, although many tiny effects can be implemented in 16 bytes. Almost all of the effects of "Memories" have been coded and optimized by me before, and mostly the implementations were attempts to reduce an already existing effect in size, or to do something similar in less space. Reducing the size in this case means reducing it to the next lower 2^N category, or an even lower one. For example I might have seen an effect done in 33 to 64 bytes, and then reduced the size to 32 bytes or less. Almost every time I moved an effect to a lower category, I submitted the resulting tiny program to a demoscene party which allowed remote entries in the 256b category and/or published the result to http://www.pouet.net. In this section I will introduce the effects and mention the origins and authors.<br />
===Array of chessboards===<br />
[[File:Kasparov.gif|frame|kasparov, 16 bytes]]<br />
The origin of this effect is my own "Kasparov 16b" from 2018 (https://www.pouet.net/prod.php?which=75912). Apparently I "had this piece lying around, stuck at 17 bytes and not that beautiful, until i found a trick ;)" (quote from the release notes). There was a similar effect in 32 bytes before: "ew" by "headcrash" from 2003 (https://www.pouet.net/prod.php?which=17567). In this case, my design goal was to show "real" chessboards of 8x8 fields and recognizable dark and light squares, as well as the correct orientation of the single chessboards, meaning that the bottom right corner (h1) has to be a [https://en.wikibooks.org/wiki/Chess/Arranging_The_Board white square]. For "Memories", the effect had to be reimplemented with another approach of writing to the screen to work with the framework; the scrolling direction was also changed to be different from the "scrolling tilted plane" effect.<br />
<br />
===Zooming circles===<br />
The zooming circles were supposed to be entered to a demoparty as a 32 bytes intro, but I never actually did it. There is no real predecessor in 64 bytes, because in the 64b category much more complex effects are possible. The zooming circles were the result of desperately trying to reach the 32b category for a circular "tunnel" effect, for which my own record is still 52 bytes ("Neontube", 2016, https://www.pouet.net/prod.php?which=66808), which in turn was the optimization of the all-time-classic 64 byte effect "constant evolution" from ryg/Farbrausch (2003, https://www.pouet.net/prod.php?which=8697). In the zooming circles routine, the distance and the angle are removed/ignored, so that the 32b category could be reached.<br />
<br />
===Scrolling tilted plane===<br />
[[File:Floorcast variation.png|thumb|Floorcast, 32 bytes version, variation]]<br />
The scrolling tilted plane is one of my own releases, "floorcast 32b" from 2018. Floorcasts have their own history in sizecoding: the effect was brought down stepwise from 256 bytes to 32 bytes. The versions differ in the number of planes; some show two planes, others only one. For the "floorcast 32b" release I specifically decided against the "XOR" texture, whereas in "Memories" I used it again, but disguised with another final touch of "AND".<br />
* "rain_storm" version 2008 - 256 bytes - https://www.pouet.net/prod.php?which=50479<br />
* "org_100h" version 2008 - 128 bytes - https://www.pouet.net/prod.php?which=63271<br />
* "Baudsurfer" version 2013 - 86 bytes - https://www.pouet.net/prod.php?which=61070<br />
* "Baudsurfer" version 2014 - 64 bytes - https://www.pouet.net/prod.php?which=63305<br />
* "HellMood" version 2018 - 32 bytes - https://www.pouet.net/prod.php?which=77774<br />
<br />
===Parallax checkerboards===<br />
[[File:Projektbeschreibung.gif|frame|Projektbeschreibung, 32 bytes]]<br />
The parallax checkerboards have been released by me as the 32 bytes effect [https://www.pouet.net/prod.php?which=76788 "Projektbeschreibung"] in 2018. It was the direct attempt to bring [https://www.pouet.net/prod.php?which=28543 "Follow the light"] from "Digimind" (2006) or my own [https://www.pouet.net/prod.php?which=63293 "Lucy"] (2014) down to 32 bytes. Very helpful inspirations came from [https://www.pouet.net/prod.php?which=32281 "Rrrola"s "Paralaxa"] (32 bytes, 2007) and [https://www.pouet.net/prod.php?which=61075 "Sensenstahl"s "Byteropolis"] (32 bytes, 2013). In fact, Rrrola's rendering approach was already close to the final solution; the code was modified to correct the arrangement of the planes, change the shape from triangles to checkerboards and improve on the colors. In "Memories", the coloring of "Digimind"s version was used. Also, the effect has been modified to reduce shearing as much as possible.<br />
<br />
===Sierpinski rotozoomer===<br />
[[File:Colpinski.png|thumb|colpinski, 16 bytes]]<br />
[[File:Rotastic.png|thumb|left|rotastic, 32 bytes]]<br />
This consists of two effects, the rotozoomer and the sierpinski effect as a texture. The sierpinski effect is based on my own "Colpinski 16b" from 2013 (https://www.pouet.net/prod.php?which=62079) where "frag" of "fsqrt" helped pushing the limits. There is no real predecessor to that, because it is known that this effect can be done directly by combining X and Y, instead of an Iterated Function System or Cellular Automaton approach. The rotozoomer was released as 32b intro "rotastic" by me in 2017 (https://www.pouet.net/prod.php?which=70112), it is based on the ideas of "ryg" of "Farbrausch" (51 bytes, 2002, https://www.pouet.net/prod.php?which=5015) and "Gargaj" of "Conspiracy" (49 bytes, 2002, https://www.pouet.net/prod.php?which=6088).<br />
<br />
===Raycast bent tunnel===<br />
[[File:Intoanew.gif|frame|Into a new era, 64 bytes version]]<br />
The raycast bent tunnel is a tweaked version of my own 64 bytes "Into a new era" release (2018, https://www.pouet.net/prod.php?which=78044). The custom colors were exchanged for colors of the standard palette, the geometry was tweaked a bit and changed accordingly, so that no depth effect was used for the calculation of texture values. The standalone version of this effect has a size of 50 bytes. The two main inspirations for the 64 bytes version were two 128 byte intros, "Spongy" by "TBC" (2009, https://www.pouet.net/prod.php?which=53871) and "Wolf128" from "Baudsurfer" (2014, https://www.pouet.net/prod.php?which=63518), while the particular algorithm was developed independently.<br />
<br />
===Ocean night to day===<br />
[[File:Ocean.png|thumb|Ocean, 64b version]]<br />
The ocean effect is based on my own 64 bytes release "Ocean" from 2016 (https://www.pouet.net/prod.php?which=66857). The custom color generation and music generation have been stripped from the original; neither was compatible with the main framework of "Memories" without consuming a LOT of extra space. The special effect of "dawn" happens as a byproduct of the general framework approach. This will be explained in the next chapter.<br />
<br />
===Fading effect===<br />
The transition between two effects is an effect itself, which has no real predecessor. It is rather an idea that evolved over the years, and might have been done by many others in similar ways. In a nutshell, while calculating a frame, each pixel position is randomized and the time - which determines the effect to use - is offset by this randomization value which is downscaled before. This allows the usage of the standard VGA palette (picture, source) rather than creating custom colors for real crossfading and thus saves space.<br />
<br />
==Tiny megademo framework==<br />
The tiny megademo framework is based on ideas of [https://www.pouet.net/prod.php?which=68020 "Quatro"], done by "Sensenstahl" and me. For putting many tiny effects together into a "megademo", they all have to use the same approach and they have to be mostly free of assumptions (contents of memory and registers). They also have to use the same timing values and be tuned to a shared timing in general. Preparing the single effects to be includable in a framework like this took quite some time and required a lot of extra space initially. It has to be noted that, sadly, some of my most impressive effects (judged by audience reactions and views on social media) could not be included at all, because the approaches were not combinable without a HUGE overhead. Once all effects were streamlined, I could think about "factoring out" common calculations which occur repeatedly, resulting in a few saved bytes. The framework does the following:<br />
* Setting up screen mode to 320 x 200 pixels in 256 colors<br />
* Initializing a pointer to the screen<br />
* Installing a callback for timing and music<br />
* Mainloop<br />
** Calculation of X and Y from the current screen location<br />
*** with the [http://www.sizecoding.org/wiki/General_Coding_Tricks#Obtaining_X_and_Y_without_DIV_.28The_Rrrola_Trick.29 "Rrrola Trick"]<br />
** P = Pseudorandom value from screen location<br />
** Offset current time T by a scaled value P'<br />
** Selection of effect number N according to T'<br />
** Execution of effect N (X,Y) for current pixel<br />
*** On entrance, XY is in register DX (DL,DH)<br />
*** On entrance, the current time is in register BP<br />
*** On exit, the pixel color is expected in AL<br />
** Advance to next pixel<br />
** Triple diagonal interlacing for smoothing<br />
** Repeat until frame is complete<br />
* Set timer to a tempo of ~35 FPS<br />
* Check keyboard for ESC<br />
** Quit on ESC, otherwise continue<br />
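The X/Y step of this list can be tried out away from DOS. The following is a small Python model of <code>mul di</code> with the constant 0xCCCD (Python is only used here for experimenting, it is not part of the intro). The names <code>rrrola_xy</code> and <code>signed8</code> are made up for this sketch, and it assumes that ES is set to 0xa000-10, so that pixel p of the visible screen sits at DI = p + 160:<br />
<syntaxhighlight lang="python">
def rrrola_xy(di):
    """Model of 'mov ax,0xcccd / mul di': returns (DH, DL) = (row, scaled column)."""
    dx = ((di & 0xFFFF) * 0xCCCD) >> 16   # DX = high word of the 32 bit product
    return dx >> 8, dx & 0xFF

def signed8(v):
    """Interpret a byte as a signed value, like IMUL and MOVSX do."""
    return v - 256 if v >= 128 else v

# Assumption: pixel p is addressed with DI = p + 160 because ES = 0xa000-10.
for x in (0, 160):
    row, col = rrrola_xy(100 * 320 + x + 160)
    print(x, row, signed8(col))
</syntaxhighlight><br />
In this model the column comes out as a signed value that is zero in the middle of the screen, while the row still has to be centered by the effects themselves.<br />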
=== Code of framework===<br />
<syntaxhighlight lang="nasm"><br />
org 100h<br />
s:<br />
mov al,0x13 ; set AL to mode 320*200 in 256 colors<br />
int 0x10 ; call BIOS to set mode<br />
xchg bp,ax ; set timing value to 0x13 <br />
push 0xa000-10 ; write the screen address to register ES<br />
pop es ; works in conjunction with Rrrola trick<br />
mov ax,0x251c ; parameter for changing timer interrupt<br />
mov dl,timer ; address of timer routine, assume DH=1<br />
int 0x21 ; install timer routine<br />
top:<br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
call bx ; call the effect<br />
stosb ; write the return value and advance<br />
inc di ; triple interlace trick for after<br />
inc di ; effect and smoothing the animation<br />
jnz top ; repeat until the frame is complete<br />
mov al,tempo ; set AL to divider for timer<br />
out 40h,al ; set timing (dual pass)<br />
in al,0x60 ; read keyboard<br />
dec al ; quit on ESC<br />
jnz top ; otherwise repeat loop<br />
sounds: db 0xc3, 11, 0x93; 0xc3 is MIDI/RET; fx2-s is used as volume<br />
table: db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
<br />
</syntaxhighlight><br />
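Two details of the main loop can be checked with a few lines of Python: the value of <code>tempo</code> and the claim that stepping the screen pointer by three still covers the whole frame. This is only a sketch. It assumes that timer channel 0 takes the low byte and then the high byte of its divider, so that two consecutive writes of the same value (the "dual pass" in the comment) form a 16 bit divider:<br />
<syntaxhighlight lang="python">
PIT_HZ = 1193182          # input clock of the PC timer chip, as in the source
TARGET_FPS = 35

# NASM evaluates '1193182/256/targetFPS' with integer division.
tempo = PIT_HZ // 256 // TARGET_FPS

# Assumption: low byte and high byte of the divider are both 'tempo'.
divider = tempo * 256 + tempo
rate = PIT_HZ / divider

# 'stosb / inc di / inc di' advances DI by 3; count the steps until DI is 0 again.
di, steps = 0, 0
while True:
    di = (di + 3) & 0xFFFF
    steps += 1
    if di == 0:
        break

print(tempo, round(rate, 2), steps)
</syntaxhighlight><br />
Since 3 and 65536 share no common factor, every offset is written exactly once before DI returns to zero, in three interleaved passes.<br />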
<br />
==Explanation of effects==<br />
===Array of chessboards===<br />
[[File:Array of chessboards.png|thumb|array of chessboards]]<br />
The simplest effect and a good one to start with. After offsetting the row by the time, the classic XOR pattern is applied. To achieve the impression of a grid of chessboards, all bits but two are set in the color. The real trick lies in shifting to a "good" palette spot. The portion of the frame which is perceived as black isn't actually black; it's the dark part of the [https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_palette standard VGA palette]. Shifting that way also makes the dark and light squares look like they are from an ancient chessboard.<br />
<syntaxhighlight lang="nasm"> <br />
xchg dx,ax ; get XY into AX<br />
sub ax,bp ; subtract time from row<br />
xor al,ah ; XOR pattern (x xor y)<br />
or al,0xDB ; pattern for array of boards<br />
add al,13h ; shift to good palette spot<br />
</syntaxhighlight><br />
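To see which palette entries this routine can produce at all, the five instructions can be modelled in Python. This is a sketch for experimenting, and the function name <code>chessboards</code> is made up:<br />
<syntaxhighlight lang="python">
def chessboards(dx, t):
    """Model of the routine: DX holds XY on entry, the result is the color in AL."""
    ax = (dx - t) & 0xFFFF            # xchg dx,ax / sub ax,bp
    al = (ax & 0xFF) ^ (ax >> 8)      # xor al,ah
    al |= 0xDB                        # only bits 2 and 5 keep their value
    return (al + 0x13) & 0xFF         # add al,13h wraps around in 8 bits

colors = sorted({chessboards(dx, 0) for dx in range(0x10000)})
print(colors)
</syntaxhighlight><br />
Only four palette entries are ever produced, two for the squares of the boards and two for the space in between.<br />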
<br />
===Zooming circles===<br />
[[File:Zooming circles.png|thumb|zooming circles]]<br />
The distance D of a point (X,Y) to the center (0,0) is sqrt(X²+Y²). The framework already makes sure that DL contains a centered X coordinate; DH=Y still has to be centered, though. Performing a square root on x86 requires quite a bit of code, but it can actually be omitted. With a clever selection of colors, the impression of zooming circles is quite decent even without the square root.<br />
<syntaxhighlight lang="nasm"> <br />
mov al,dh ; get Y in AL<br />
sub al,100 ; align Y vertically<br />
imul al ; AX = Y²<br />
xchg dx,ax ; Y²/256 in DH, X in AL<br />
imul al ; AX = X²<br />
add dh,ah ; DH = (X² + Y²)/256<br />
mov al,dh ; AL = (X² + Y²)/256<br />
add ax,bp ; offset color by time<br />
and al,8+16 ; select special rings<br />
</syntaxhighlight><br />
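A Python model of the routine shows what is left of the distance formula. It is a sketch with made-up names, where <code>x</code> is the byte the framework delivers in DL and <code>y</code> is the row in DH:<br />
<syntaxhighlight lang="python">
def signed8(v):
    """Interpret the low byte of v as a signed value, like IMUL does."""
    v &= 0xFF
    return v - 256 if v >= 128 else v

def circles(x, y, t):
    """Color of one pixel: high bytes of Y*Y and X*X, offset by time, masked."""
    yc = signed8(y - 100)                            # sub al,100
    d = ((yc * yc) >> 8) + ((signed8(x) ** 2) >> 8)  # add dh,ah
    return (d + t) & 0xFF & 24                       # add ax,bp / and al,8+16

print([circles(x, 100, 0) for x in range(0, 128, 16)])
</syntaxhighlight><br />
Note that the two high bytes are added after truncation, so the value is only an approximation of (X²+Y²)/256, which is good enough for rings.<br />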
<br />
===Scrolling tilted plane===<br />
[[File:Tiltplane.png|thumb|scrolling tilted plane]]<br />
This effect is achieved as follows. First, a large constant is divided by the row number Y, to simulate distance. Then the resulting value is used twice: a) scaled by a centered X value and b) offset by the current time. These results are then combined with the XOR pattern, of which a special pattern is selected.<br />
<syntaxhighlight lang="nasm"> <br />
mov ax,0x1329 ; initialize with constant<br />
add dh,al ; preventing divide overflow<br />
div dh ; reverse divide AL = C/Y'<br />
xchg dx,ax ; DL = C/Y', AL = X<br />
imul dl ; AH = CX/Y'<br />
sub dx,bp ; DL = C/Y'-T <br />
xor ah,dl ; AH = (CX/Y') ^ (C/Y'-T)<br />
mov al,ah ; move to AL<br />
and al,4+8+16 ; select special pattern<br />
</syntaxhighlight><br />
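The same steps as a Python sketch, for rows that are actually on screen. The names are made up; <code>x</code> is the byte from DL and <code>y</code> the row from DH:<br />
<syntaxhighlight lang="python">
def signed8(v):
    """Interpret the low byte of v as a signed value, like IMUL does."""
    v &= 0xFF
    return v - 256 if v >= 128 else v

def tilted_plane(x, y, t):
    """Color of one pixel of the tilted plane for screen rows 0..199."""
    yp = (y + 0x29) & 0xFF                    # add dh,al (AL is 0x29)
    q = 0x1329 // yp                          # div dh: AL = C/Y'
    prod = (signed8(x) * signed8(q)) & 0xFFFF # imul dl
    return ((prod >> 8) ^ ((q - t) & 0xFF)) & 28

print([tilted_plane(0, y, 0) for y in range(0, 200, 40)])
</syntaxhighlight><br />
Adding 0x29 to the row keeps the divisor at 41 or more, so the quotient of the division always fits into AL.<br />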
<br />
===Parallax checkerboards===<br />
[[File:Paralax.png|thumb|parallax checker boards]]<br />
This is kind of raycasting with dynamic geometry. Each object plane is split horizontally because of the signed 16 bit multiplication and vertically because of an implicit logic column number operation. Additionally, implicit shearing is applied, so that the edges of the resulting grid (4 "solid" patches, 4 "transparent" patches, alternating) connect. If the ray hits one of the solid patches, the color becomes the iteration number (+palette offset into grayscale), if not, the plane is shifted by the screenpointer and the process is repeated until the maximum iteration number is reached.<br />
<syntaxhighlight lang="nasm"><br />
mov cx,bp ; set initial point to time<br />
mov bx,-16 ; limit to 16 iterations<br />
fx3L:<br />
add cx,di ; offset point by screenpointer<br />
mov ax,819 ; magic, related to Rrrola constant<br />
imul cx ; get X',Y' in DX<br />
ror dx,1 ; set carry flag on "hit"<br />
inc bx ; increment iteration count<br />
ja fx3L ; loop until "hit" or "iter=max"<br />
lea ax,[bx+31] ; map value to standard gray scale<br />
</syntaxhighlight><br />
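The loop is short enough to be modelled step by step in Python. This is a sketch with made-up names; <code>di</code> is the screen pointer and <code>t</code> the time from BP:<br />
<syntaxhighlight lang="python">
def signed16(v):
    """Interpret the low word of v as a signed value, like IMUL does."""
    v &= 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

def parallax(di, t):
    """Color of one pixel: 16 for a hit on the first plane, 31 for no hit at all."""
    cx = t & 0xFFFF
    bx = -16
    while True:
        cx = (cx + di) & 0xFFFF               # add cx,di
        dx = ((819 * signed16(cx)) >> 16) & 0xFFFF  # imul cx: high word in DX
        hit = dx & 1                          # ror dx,1 moves bit 0 into carry
        bx += 1                               # inc bx sets the zero flag at 0
        if hit or bx == 0:                    # 'ja' loops only if neither is set
            break
    return (bx + 31) & 0xFF                   # lea ax,[bx+31]

print([parallax(di, 0) for di in range(0, 1000, 81)])
</syntaxhighlight><br />
The result always lies between 16 and 31, which is the grayscale part of the standard palette that the text refers to.<br />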
<br />
===Sierpinski rotozoomer===<br />
[[File:Sierpinski rotozoomer.png|thumb|sierpinski rotozoomer]]<br />
[[File:1 div cos(atan(x)).png|thumb|1/cos(atan(x)), made with www.google.com]]<br />
Rotozooming normally requires trigonometric functions, or good estimations of them. Let's look at the [https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions usual 2D rotation equation] and interpret it in a special way to get rid of the trigonometry in the calculation:<br />
<syntaxhighlight lang="nasm"><br />
x' = x*cos(a) - y*sin(a)<br />
y' = x*sin(a) + y*cos(a)<br />
</syntaxhighlight><br />
If we add zooming to it, it looks like this:<br />
<syntaxhighlight lang="nasm"><br />
x' = z * (x*cos(a) - y*sin(a))<br />
y' = z * (x*sin(a) + y*cos(a))<br />
</syntaxhighlight><br />
Now let's assume we don't specify z ourselves, and factor out cos(a):<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(a) * (x - y*tan(a))<br />
y' = cos(a) * (x*tan(a) + y)<br />
</syntaxhighlight><br />
Finally, let's substitute tan(a) with our time variable, since the tangent function rises to infinity the closer we get to +-90°:<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(atan(T)) * (x - y*T)<br />
y' = cos(atan(T)) * (x*T + y)<br />
</syntaxhighlight><br />
As long as we don't care about specifying the zoom factor and don't directly control the angle, we can now rotate between -90° and +90° without using trigonometric functions. As a byproduct, the zoom factor is now tied to the time T. The function for the zoom factor is shown in the image; it results in zooming from infinitely small to one (original size) and back to infinitely small. Quite a few bytes have been spent to beautify the effect: correcting the timing offset, speeding up the animation, enlarging the sierpinski pixels and getting decent colors, but I thought it was worth the space.<br />
<syntaxhighlight lang="nasm"><br />
lea cx,[bp-2048]; center time to pass zero<br />
sal cx,3 ; speed up by factor 8!<br />
movzx ax,dh ; get X into AL<br />
movsx dx,dl ; get Y into DL<br />
mov bx,ax ; save X in BX<br />
imul bx,cx ; BX = X*T<br />
add bh,dl ; BH = X*T/256+Y<br />
imul dx,cx ; DX = Y*T<br />
sub al,dh ; AL = X-Y*T/256<br />
and al,bh ; AL = (X-Y*T/256)&(X*T/256+Y)<br />
and al,252 ; thicker sierpinski<br />
salc ; set pixel value to black<br />
jnz fx4q ; leave black if not sierpinski<br />
mov al,0x2A ; otherwise: a nice orange<br />
fx4q:<br />
</syntaxhighlight><br />
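The derivation above can be checked numerically. The following Python sketch (function names made up, floating point instead of the fixed point math of the intro) compares the trigonometry-free form with an ordinary rotation by the angle atan(T), using the identity cos(atan(T)) = 1/sqrt(1+T²):<br />
<syntaxhighlight lang="python">
import math

def rotate_by_time(x, y, t):
    """(x - y*T, x*T + y), scaled by cos(atan(T)) written without trigonometry."""
    zoom = 1.0 / math.sqrt(1.0 + t * t)
    return zoom * (x - y * t), zoom * (x * t + y)

def rotate_by_angle(x, y, a):
    """The usual 2D rotation by angle a, for comparison."""
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

t = 0.75
print(rotate_by_time(3.0, 4.0, t))
print(rotate_by_angle(3.0, 4.0, math.atan(t)))
</syntaxhighlight><br />
Both lines print the same point up to rounding. The assembler routine simply leaves out the scaling factor, which is where the zoom comes from.<br />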
<br />
===Raycast bent tunnel===<br />
[[File:Raycastbenttunnel.png|thumb|raycast bent tunnel]]<br />
This is a variation of the [https://www.pouet.net/prod.php?which=78044 raycast tunnel "into a new era"] (see also above). A [https://www.reddit.com/r/tinycode/comments/de7dj9/moving_cubes_with_pathtracing_lighting_and/ detailed explanation] has been posted to reddit for the [https://www.pouet.net/prod.php?which=83204 similar production "Essence"]. The custom colors have been removed, the bend direction is different, and the geometry is a bit more closed, to increase the performance on slower computers and DosBox.<br />
<syntaxhighlight lang="nasm"><br />
mov cl,-9 ; start with depth 9 (moves backwards)<br />
fx5L: <br />
push dx ; save DX, destroyed inside the loop<br />
mov al,dh ; Get Y into AL<br />
sub al,100 ; Centering Y has to be done "manually".<br />
imul cl ; Multiply AL=Y by the current distance, to get a projection(1)<br />
xchg ax,dx ; Get X into AL, while saving the result in DX (DH)<br />
add al,cl ; add distance to projection, (bend to the right)<br />
imul cl ; Multiply AL=X by the current distance, to get a projection(2)<br />
mov al,dh ; Get projection(1) in AL<br />
xor al,ah ; combine with projection(2)<br />
add al,4 ; center the walls around 0<br />
test al,-8 ; check if the wall is hit<br />
pop dx ; restore DX<br />
loopz fx5L ; repeat until "hit" or "iter=max"<br />
sub cx,bp ; offset depth by time<br />
xor al,cl ; XOR pattern for texture <br />
aam 6 ; irregular pattern with MOD 6<br />
add al,20 ; offset into grayscale palette<br />
</syntaxhighlight><br />
<br />
===Ocean night to day===<br />
[[File:Oceannight.png|thumb|left|oceannight]]<br />
[[File:Oceanday.png|thumb|right|oceanday]]<br />
The ocean effect is a wonderful example of a [https://en.wikipedia.org/wiki/Bob_Ross#Style "happy accident"]. If you load a value into the FPU as integer, and store it as floating point value, and interpret it as integer again, you get a cool pattern. Combine that with a reverse divide and you get a neat wavy effect. This works in tight conjunction with the register DX, which has a sign flip at the position we want, so we can easily separate sky and sea. The final touch is the coloring though. By design of the global framework, the value of AL is determined on function entrance: it contains the effect address. With a bit of code shuffling, the color of the sky came "for free", without using instructions, as well as the "fading to day" color, which is also an effect address. And that is no happy accident. In versions other than the originally submitted DosBox version, the sky color can vary for that reason.<br />
<syntaxhighlight lang="nasm"><br />
sub dh,120 ; check if pixel is in the sky<br />
js fx6q ; quit if that's the case<br />
mov [bx+si],dx ; move XY to a memory location<br />
fild word [bx+si] ; read memory location as integer<br />
fidivr dword [bx+si]; reverse divide by constant<br />
fstp dword [bx+si-1]; store result as floating point<br />
mov ax,[bx+si] ; get the result into AX<br />
add ax,bp ; modify color by time<br />
and al,128 ; threshold into two bands<br />
dec ax ; beautify colors to blue/black<br />
</syntaxhighlight><br />
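The "store as float, read back as integer" part can be tried in Python with the <code>struct</code> module. This is a simplified sketch, not the exact routine: the function name and the parameter <code>numerator</code> are made up, because the real dividend is the dword found at the memory location, and its upper half is not set by the routine itself:<br />
<syntaxhighlight lang="python">
import struct

def ocean_word(value, numerator):
    """Divide, store the result as float32, read a word one byte into it."""
    raw = struct.pack('<f', numerator / value)   # what 'fstp dword [bx+si-1]' writes
    return struct.unpack('<H', raw[1:3])[0]      # what 'mov ax,[bx+si]' reads

# Hypothetical numerator, chosen only to show the pattern.
print([hex(ocean_word(v, 4096)) for v in range(1, 9)])
</syntaxhighlight><br />
Because the store is shifted by one byte, the word that is read back mixes the upper mantissa bits with the lowest exponent bit of the float, which gives the irregular bands.<br />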
<br />
===Fading effect===<br />
For convenience, the fading effect from the global framework, isolated. Basically it is generating a pseudorandom value from the screenpointer, then offsetting the time by a scaled value of it, then calling the destined effect.<br />
<syntaxhighlight lang="nasm"><br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
</syntaxhighlight><br />
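How the per-pixel randomization turns a hard cut into a fade can be seen in a Python model of these instructions. It is a sketch with a made-up function name; <code>di</code> is the screen pointer and <code>t</code> the time from BP:<br />
<syntaxhighlight lang="python">
def effect_index(di, t):
    """Model of the selection code: returns the table index used for this pixel."""
    ax = ((di & 0xFFFF) * 0xCCCD) & 0xFFFF    # low word of 'mul di'
    p = ((ax & 0xFF) + (ax >> 8)) & 0xFF      # add al,ah / xor ah,ah
    return (((p + t) & 0xFFFF) >> 9) & 15     # add ax,bp / shr ax,9 / and al,15

for t in (256, 384, 512):
    switched = sum(effect_index(di, t) == 1 for di in range(0x10000))
    print(t, switched / 0x10000)
</syntaxhighlight><br />
Since the pseudorandom value is at most 255, no pixel shows the next effect at T=256 and all of them do at T=512; in between, the share of switched pixels grows with time.<br />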
<br />
==MIDI music part==<br />
This code section increments the timing value and produces sound. By choosing channel 3 we can reuse the "change channel instrument" instruction again as "RET". By shifting around the code of the effects, a reasonable value for the volume could be created, which saves another byte. Be aware that this code sequence only works if the MIDI device is already in UART mode, otherwise three more bytes have to be spent. Quite a few people from the audience and sceners told me afterwards that the melody sounds a bit like [https://en.wikipedia.org/wiki/Incantations_(album) Incantations by Mike Oldfield], but it is rather created by a simple approach. Starting with [https://www.pouet.net/prod.php?which=65604 "Hypnoteye" in 2015], I experimented with procedural MIDI and eventually released a [https://www.pouet.net/prod.php?which=66313 small MIDI framework (64 bytes)]. The basic principle is to jump through tonal space with fixed steps, and map high values back to lower ones (by modulo). With simple combinations of stepwidth and modvalue, interesting effects can be produced. For example stepwidth=3 for reduced minor chords which are never resolved, likewise stepwidth=4, or stepwidth=6 for tritonal effects. With a well chosen modvalue, these patterns can create progressions. I have not done an appropriate theoretical analysis yet; I rather explored the tonal space and noted when something sounded interesting.<br />
<syntaxhighlight lang="nasm"><br />
sounds: db 0xc3, 11, 0x93, fx2-s<br />
...<br />
inc bp ; increment timing value<br />
test bp, 7 ; play a note every 8th step<br />
jnz nomuse ; quit if in between<br />
mov dx,0x330 ; port number for MIDI<br />
mov si,sounds ; address for sound data<br />
outsb ; change instrument of channel 3<br />
outsb ; to vibraphone<br />
outsb ; play a note on channel 3<br />
imul ax,bp,-19*32*4 ; the magic melody constant<br />
shr ax,10 ; scale down and implicit "and 63"<br />
add al,22 ; pitch base is 22<br />
out dx,al ; play THIS note on channel 3<br />
outsb ; play it with THIS volume<br />
</syntaxhighlight><br />
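The "fixed steps plus modulo" principle can be read directly from the three arithmetic instructions. The following Python sketch (function name made up) lists the pitches that are sent, starting at a timing value of zero for simplicity:<br />
<syntaxhighlight lang="python">
def melody(count):
    """Pitches sent to the MIDI port, one for every 8th timer tick."""
    notes = []
    for bp in range(0, count * 8, 8):
        ax = (bp * (-19 * 32 * 4)) & 0xFFFF   # imul ax,bp,-19*32*4 (16 bit result)
        notes.append((ax >> 10) + 22)         # shr ax,10 / add al,22
    return notes

print(melody(8))
</syntaxhighlight><br />
In this model, every note lies 19 semitones below the previous one, wrapped around modulo 64 and placed on top of the pitch base 22.<br />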
==Further tweaks==<br />
To be done ;)<br />
==Differences between dos versions==<br />
To be done ;)<br />
==External Links==<br />
*[https://en.wikipedia.org/wiki/Demoscene Demoscene]<br />
*[https://www.youtube.com/watch?v=Imquk_3oFf4 Youtube Capture]<br />
*[https://www.twitch.tv/videos/589179638?t=7h02m40s Twitch Live Reactions]<br />
*[https://www.pouet.net/prod.php?which=85227 Production Page on Pouet]<br />
*[https://www.reddit.com/r/programming/comments/g1qhf8/memories_256_byte_msdos_intro/ Reddit/Programming Thread I]<br />
*[https://www.reddit.com/r/programming/comments/g4ewh7/how_its_made_memories_a_256_byte_dos_intro_with_8/ Reddit/Programming Thread II]<br />
*[https://github.com/cesarmiquel/memories-256b-msdos-intro Analysis by "cesarmiquel"]<br />
*[https://hackaday.com/2020/04/21/a-jaw-dropping-demo-in-only-256-bytes/ Hackaday Article]<br />
*[https://github.com/simpassi/cocoamemories Simpassis Port to MAC (TIGR)] [https://www.youtube.com/watch?v=_EgG_ZaBB7o Youtube Capture]<br />
<br />
==Original release code (complete)==<br />
<syntaxhighlight lang="nasm"> <br />
; "memories" by HellMood/DESiRE<br />
; the tiny megademo, 256 byte msdos intro<br />
; shown in April 2020 @ REVISION<br />
;<br />
; (= WILL BE COMMENTED IN DETAIL LATER =)<br />
;<br />
; create : nasm.exe memories.asm -fbin -o memories.com<br />
; CHOOSE YOUR TARGET PLATFORM (compo version is dosbox)<br />
; be sure to use the dosbox.conf from this archive!<br />
; only ONE of the defines should be active!<br />
%define dosbox ; size : 256 bytes<br />
;%define freedos ; size : 230 bytes<br />
;%define winxpdos ; size : 263 bytes<br />
<br />
; DON'T TOUCH THESE UNLESS YOU KNOW WHAT YOU'RE DOING<br />
%ifdef winxpdos<br />
%define music<br />
%define switch_uart<br />
%define safe_dx<br />
%define safe_segment<br />
%endif<br />
%ifdef freedos<br />
%define safe_dx<br />
%endif<br />
%ifdef dosbox<br />
%define music<br />
;%define safe_dx ; sometimes needed<br />
%endif<br />
<br />
; GLOBAL PARAMETERS, TUNE WITH CARE!<br />
%define volume 127 ; not used on dosbox (optimization)<br />
%define instrument 11<br />
%define scale_mod -19*32*4; <br />
%define time_mask 7<br />
%define targetFPS 35<br />
%define tempo 1193182/256/targetFPS <br />
%define sierp_color 0x2A<br />
%define tunnel_base_color 20<br />
%define tunnel_pattern 6<br />
%define tilt_plate_pattern 4+8+16<br />
%define circles_pattern 8+16<br />
<br />
org 100h<br />
s:<br />
%ifdef freedos<br />
mov fs,ax<br />
mov [fs:0x46c],ax<br />
%endif<br />
mov al,0x13<br />
int 0x10 <br />
xchg bp,ax<br />
push 0xa000-10<br />
pop es<br />
%ifndef freedos<br />
mov ax,0x251c<br />
%ifdef safe_dx <br />
mov dx,timer <br />
%else ; assume DH=1, mostly true on DosBox<br />
mov dl,timer<br />
%endif<br />
int 0x21<br />
%endif<br />
top:<br />
%ifdef freedos<br />
mov bp,[fs:0x46c]<br />
%endif <br />
mov ax,0xcccd<br />
mul di<br />
add al,ah<br />
xor ah,ah<br />
add ax,bp<br />
shr ax,9<br />
and al,15<br />
xchg bx,ax<br />
mov bh,1<br />
mov bl,[byte bx+table]<br />
call bx<br />
stosb<br />
inc di<br />
inc di<br />
jnz top<br />
mov al,tempo<br />
out 40h,al<br />
in al,0x60<br />
dec al<br />
jnz top<br />
sounds:<br />
db 0xc3 ; is MIDI/RET<br />
%ifdef music<br />
db instrument,0x93<br />
%ifdef switch_uart<br />
db volume ; without switch, volume is in table<br />
db 0x3f <br />
%endif<br />
%endif<br />
table: ; first index is volume, change order with care! <br />
db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
stop:<br />
pop ax<br />
ret<br />
timer:<br />
%ifndef freedos<br />
%ifdef safe_segment<br />
push cs<br />
pop ds<br />
%endif<br />
inc bp<br />
%ifdef music <br />
test bp, time_mask<br />
jnz nomuse<br />
mov dx,0x330<br />
mov si,sounds<br />
outsb<br />
outsb<br />
outsb<br />
imul ax,bp,scale_mod<br />
shr ax,10<br />
add al,22<br />
out dx,al<br />
outsb<br />
%ifdef switch_uart<br />
inc dx<br />
outsb<br />
%endif<br />
%endif<br />
nomuse:<br />
iret<br />
%endif <br />
fx0: ; tilted plane, scrolling<br />
mov ax,0x1329<br />
add dh,al<br />
div dh<br />
xchg dx,ax<br />
imul dl<br />
sub dx,bp<br />
xor ah,dl<br />
mov al,ah<br />
and al,tilt_plate_pattern<br />
ret<br />
fx2: ; board of chessboards<br />
xchg dx,ax<br />
sub ax,bp<br />
xor al,ah<br />
or al,0xDB<br />
add al,13h<br />
ret<br />
fx1: ; circles, zooming<br />
mov al,dh<br />
sub al,100<br />
imul al<br />
xchg dx,ax<br />
imul al<br />
add dh,ah<br />
mov al,dh<br />
add ax,bp<br />
and al,circles_pattern<br />
ret<br />
fx3: ; parallax checkerboards<br />
mov cx,bp<br />
mov bx,-16<br />
fx3L:<br />
add cx,di<br />
mov ax,819<br />
imul cx <br />
ror dx,1 <br />
inc bx <br />
ja fx3L<br />
lea ax,[bx+31] <br />
ret<br />
fx4: ; sierpinski rotozoomer <br />
lea cx,[bp-2048]<br />
sal cx,3<br />
movzx ax,dh<br />
movsx dx,dl<br />
mov bx,ax<br />
imul bx,cx<br />
add bh,dl<br />
imul dx,cx<br />
sub al,dh<br />
and al,bh<br />
and al,0b11111100<br />
salc ; VERY slow on dosbox, but ok<br />
jnz fx4q<br />
mov al,sierp_color<br />
fx4q:<br />
ret<br />
fx5: ; raycast bent tunnel<br />
mov cl,-9<br />
fx5L: <br />
push dx<br />
mov al,dh<br />
sub al,100<br />
imul cl<br />
xchg ax,dx <br />
add al,cl<br />
imul cl<br />
mov al,dh<br />
xor al,ah<br />
add al,4<br />
test al,-8<br />
pop dx<br />
loopz fx5L<br />
sub cx,bp<br />
xor al,cl<br />
aam tunnel_pattern; VERY slow on dosbox, but ok<br />
add al,tunnel_base_color<br />
ret<br />
fx6: ; ocean night / to day sky<br />
sub dh,120<br />
js fx6q<br />
mov [bx+si],dx<br />
fild word [bx+si]<br />
fidivr dword [bx+si]<br />
fstp dword [bx+si-1]<br />
mov ax,[bx+si]<br />
add ax,bp<br />
and al,128<br />
dec ax<br />
fx6q:<br />
ret<br />
</syntaxhighlight><br />
<br />
==Bonus - NFO/ASCII==<br />
<br />
<syntaxhighlight lang="nasm"> <br />
art : hammerfist<br />
∂#MW%e _d$Ng,<br />
'B, ∂b _jM@$QZb,cQ" )@<br />
,edRB$b,l@ Wk,yGR$KM&$b, ,dP" Wl ]bsd%UR8BG6&$@DSyG#ZKM&$b,<br />
,dP "T%L 'MGF "*∂R_ Tg "*4Zk,#I YP W" 7P "*∂R<br />
4M gd@ ^ ∂@ d@b dQ$#@Z@R3L_ "*GMj 'W ,gd$ d@b 9Q$#%b<br />
W#, `M Wb `* _4P `Qk *#N8L `H5 @b 'QR7YK `* _4F" Qk<br />
`6@L dML '@ ,BK 'M ∂B *b, '#L<br />
^QBb,_ _,4&M∞∂@=,_ _dGL _gQKM GL @k 'Mg,_ _dG,<br />
"*BN5W$2#MNP" "*G3WRM8&B5P"`Y@QNW3Z5P" ∂#$W8BRM3XZN87 "*GW38M%EBDW5P"`<br />
<br />
<br />
p r e s e n t s<br />
<br />
4<br />
d@,<br />
_& `Wl<br />
_,aP "#baedM$#@@K JP*"?ML<br />
,ad@$#P" ,d@NEWVB" X,aQPYb,_<br />
V@Mm,_ ,d@MW#BW' EMP" '¶R ,ngBP^fML<br />
¶M@N@y Y#BNW#M" J9" `MQ9" "MgRBq ,QBMg,<br />
VN#P` ,d@@ `WM@^ 7f ¶F` 7kY" ^G _.eQNE1.<br />
]B _G@MWN$, `P ' 4b QP ¶w@F*^ ^Qb<br />
]O@NRM#W@MNB, ; ^` j JP^ Yl<br />
J#NRNWM@#BcT"^ ,A _J _q@ `X<br />
'¶WM#B@WdY`,7 _G#YN#PM _,gG" M,<br />
*BN#WP" dK ,Q@NRMB"]9 , _,M@Q* #A<br />
"U^ V@h, iNBW#NT J' J9 s@QN" _; 'D,<br />
¶RMBv&NMQR@9 .W .K' "9` ,6BA _JL ]l<br />
Y#NE@W#NRP #[ `¶8 _d@MW#B_jW#W BN<br />
"GQ@MR#W QL_ *B _,p#NBW#NQMG@WY 3Q<br />
"Y@F ,XW@M%im,_Yb_ _,g5@#MW@QMNE@E@NRMB ,WM<br />
` _,gP*"#REM#GB@N#MQbnd@N#M@MW#R8QSB^'WQERM@ ;4NB,<br />
,GYKL ¶E#B8R8QSB@M@#BM#W@MNB"`_ , "^` N ,dW@Ql<br />
_Q`'W`*t '¶@GS#MBQ#E@W#NQBW[ 'LvQ_ ,K _dNABGM#N<br />
,F ' `^WAB@QGE9*"9^*@L jP7FY, ¶h,_.jWM#BR#GBM,<br />
J; , _ ' ' "LL YxE#B8R8QSBNW@W;<br />
AP _,Ag6^ _ J ¶A `"Q#M@MW#R8E#P<br />
j@ `"XQW[ 'LvK,_ 'L_,/ @t Y#NE@WNR"<br />
:M/ 9^*@L jP7F" _PYKL _,A; ¶RSNQ"<br />
dKL ' ` ' "L "`'W`*t `"XQb `W^<br />
Q`8t 'L_,/ , _ ' 9^Q<br />
,W _PYKL _,Ag6^ ' W, _ ,#N&<br />
!N _ J "`'W`*t `"XQW[ _ J N!_JG9^RwQ' *t<br />
`W, 'LvK,_ ' _gGB8@L _ 'LvK,_ ,WgB' V 7L<br />
_.,gm&@B&wBZF" j@'` "WL _gML jZd7Yb lN" dBWl<br />
,g&QB*"^` `"*G@g, .gR&k,_ ,N" '@QF ¶k;gMF *QvQ jQ, ,@N@B#,<br />
.eQF*` `Yb@" "*6Qg,gF , 7 XMN" 'MNB, ^¶QWSER@N;<br />
,gP" qy, W' ^Q' &L ,g@W' `QMEL `"WBNWP<br />
g7 ¶9 ,X M? 9" _q8MSK ¶EMt *@K<br />
Vh _,m#L _AH le ,GBDNE9^A, *@F NMg<br />
¶L,qQ@ND _.m@Bl We ,gM@B8#Q' ¶h_ lWE,<br />
W9NHW@` JWM#B@] @e 4WR@NGF^ 'QL dRWl<br />
VMd* "@BE@PM 'N *UP" VW, JRSB;<br />
,@F j `¶WK W, ¶t XNt _A@E#N<br />
_JP ,6& "GLdM XD, _.g8NMA@k,_ _,gG#NMGR;<br />
"Z .JRER 'VMi _jNB#W&_ _,j@E@W#Nl ¶MBGMNQGNQMG@QBW9<br />
¶h ,G@NRMBl `" ,d#R@M$F ¶Mg,_.gp&@@NEWVBWBMG *QMN8R8SBN$E@WF<br />
Vb dW#R8QSRb, *YM@EQ,_ 'MENBW#NQMG#B@R@MW#l "BM@QNENRQG'<br />
*WGS#MBMNEYL `^"*8M@Q@NRM#W@BWSNW@QBF"` `^*@QBF^ [HFT]<br />
^M@MW#Q9 ^Wt `^¶RQ@W8NQGP*`<br />
¶Q#@P Vk lA `"^`<br />
Y" `MA J#,<br />
*R@, ,MQl<br />
Y#Wk, GWM8L<br />
W8RQSt,_ AQ@MR#,<br />
`@M@#SB@Mbm.,_ QNBW#NW<br />
¶QB8R8SBN$WNRM@#GNtwg@NMQR@B'<br />
*MBQ#8R8QS@NE@WNBW#NQMG@NR;<br />
`WGS#MBQ#R8QSB@NE@W#NQBW9<br />
*OMW@QMNE@E@NRMW@QMB@*<br />
`^"YQW@Q#SB#NE@EGP<br />
`^"*8R@GBQF`<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Memories&diff=570Memories2020-06-04T10:38:57Z<p>HellMood: /* External Links */</p>
<hr />
<div>==Update==<br />
The final FreeDos version is complete; you can [https://www.youtube.com/watch?v=wlW84fEHngM watch it here]. It includes the Amiga Ball as an extra effect but lacks the music, since a normal modern PC booted into FreeDos cannot play MIDI that easily. The file size is still 256 bytes.<br />
==Introduction==<br />
<br />
Hello! My name is "HellMood" and this is not a usual wiki-style article. It is written from my point of view, in the first person. This article is all about a tiny MS-DOS program called "Memories". The program has a size of 256 bytes and won the "PC 256 byte" competition of the [https://en.wikipedia.org/wiki/Demoscene demoscene] event [https://2020.revision-party.net/ "Revision"] in 2020, as well as the public choice award. You can watch a video of the output of this program here (https://www.youtube.com/watch?v=Imquk_3oFf4) and a video including live reactions of the online audience and moderators here (https://www.twitch.tv/videos/589179638?t=7h02m40s). You can download the release and comment here (https://www.pouet.net/prod.php?which=85227). This wiki article gives you an in-depth analysis of the program and points out historical references as well as development steps. In case you did not notice, you are now on the "sizecoding wiki". It will help you not only to understand the inner workings of "Memories", but also enable you to produce something like this yourself. Feel free to look around! If you are new to sizecoding and/or x86 assembler, it is recommended that you start with the [http://www.sizecoding.org/wiki/Getting_Started basics of this wiki] first. While the principles might be easy to grasp, the details of the implementation might not be.<br />
<br />
==Disclaimer==<br />
This article discusses the submitted DosBox version (256 bytes). The available archive contains versions for FreeDos and Windows XP DOS which, at the time of this writing, don't work on all computers. These alternative versions were included as a proof of concept, to show that this program does not only run on the emulator "DosBox". For the "PC 256 bytes" competition of the event "Revision" in 2020, it was allowed to specify either "FreeDos" or "DosBox" as the platform, the latter with a specific configuration. It is in fact possible to modify the DosBox version to work on FreeDos, MS DOS, WinXP and Win98, as the released alternative versions showcase, but this is not the focus of this article. Safe versions for all platforms and all computers are in the works. Although I have already found several spots in the code that can be optimized further, I will explain only the submitted version, to avoid confusion between different versions. As the history of tiny effects shows, almost no code is perfect, and most of it can be brought down further in size.<br />
<br />
==History of tiny effects==<br />
[[File:Pouet categories.png|thumb|Size categories on http://www.pouet.net]]<br />
Sizecoders think in size categories. For MS DOS, these categories are 256b, 128b, 64b and 32b. These are the standards of one of the biggest demoscene archives, http://www.pouet.net. There is no 16b category, although many tiny effects can be implemented in 16 bytes. Almost all of the effects of "Memories" have been coded and optimized by me before, and most of these implementations were attempts to reduce the size of an already existing effect, or to do something similar in less space. Reducing the size in this case means bringing it down to the next lower 2^N category (or one of those below). For example, I might have seen an effect done in 33 to 64 bytes and then reduced it to 32 bytes or less. Almost every time I moved an effect to a lower category, I submitted the resulting tiny program to a demoscene party which allowed remote entries in the 256b category and/or published the result on http://www.pouet.net . In this section I will introduce the effects and mention their origins and authors.<br />
===Array of chessboards===<br />
[[File:Kasparov.gif|frame|kasparov, 16 bytes]]<br />
The origin of this effect is my own "Kasparov 16b" from 2018 (https://www.pouet.net/prod.php?which=75912). Apparently I "had this piece lying around, stuck at 17 bytes and not that beautiful, until i found a trick ;)" (quote from the release notes). There was a similar effect in 32 bytes before: "ew" by "headcrash" from 2003 (https://www.pouet.net/prod.php?which=17567). In this case, my design goal was to show "real" chessboards of 8x8 fields with recognizable dark and light squares, as well as the correct orientation of the single chessboards, meaning that the bottom right corner (h1) has to be a [https://en.wikibooks.org/wiki/Chess/Arranging_The_Board white square]. For "Memories", the effect had to be reimplemented with a different way of writing to the screen to work with the framework. The scrolling direction was also changed to differ from the "scrolling tilted plane" effect.<br />
<br />
===Zooming circles===<br />
The zooming circles were supposed to be entered into a demoparty as a 32-byte intro, but I never actually did it. There is no real predecessor in 64 bytes, because much more complex effects are possible in the 64b category. The zooming circles were the result of desperately trying to reach the 32b category for a circular "tunnel" effect, for which my own record is still 52 bytes ("Neontube", 2016, https://www.pouet.net/prod.php?which=66808), which in turn was an optimization of the all-time classic 64-byte effect "constant evolution" from ryg/Farbrausch (2003) (https://www.pouet.net/prod.php?which=8697). In the zooming circles routine, the distance and the angle are removed/ignored, so that the 32b category could be reached.<br />
<br />
===Scrolling tilted plane===<br />
[[File:Floorcast variation.png|thumb|Floorcast, 32 bytes version, variation]]<br />
The scrolling tilted plane is one of my own releases, "floorcast 32b" from 2018. Floorcasts have their own history in sizecoding: the effect was brought down stepwise from 256 bytes to 32 bytes. The versions differ in the number of planes; some show two planes, others only one. For the "floorcast 32b" release I specifically decided against the "XOR" texture, whereas in "Memories" I used it again, but disguised with a final touch of "AND".<br />
* "rain_storm" version 2008 - 256 bytes - https://www.pouet.net/prod.php?which=50479<br />
* "org_100h" version 2008 - 128 bytes - https://www.pouet.net/prod.php?which=63271<br />
* "Baudsurfer" version 2013 - 86 bytes - https://www.pouet.net/prod.php?which=61070<br />
* "Baudsurfer" version 2014 - 64 bytes - https://www.pouet.net/prod.php?which=63305<br />
* "HellMood" version 2018 - 32 bytes - https://www.pouet.net/prod.php?which=77774<br />
<br />
===Parallax checkerboards===<br />
[[File:Projektbeschreibung.gif|frame|Projektbeschreibung, 32 bytes]]<br />
The parallax checkerboards were released by me as the 32-byte effect [https://www.pouet.net/prod.php?which=76788 "Projektbeschreibung"] in 2018. It was a direct attempt to bring [https://www.pouet.net/prod.php?which=28543 "Follow the light"] from "Digimind" (2006) or my own [https://www.pouet.net/prod.php?which=63293 "Lucy"] (2014) down to 32 bytes. Very helpful inspirations came from [https://www.pouet.net/prod.php?which=32281 "Rrrolas" "Paralaxa"] (32 bytes, 2007) and [https://www.pouet.net/prod.php?which=61075 "Sensenstahl"s "Byteropolis"] (32 bytes, 2013). In fact, Rrrola's rendering approach was already close to the final solution; the code was modified to correct the arrangement of the planes, change the shape from triangles to checkerboards and improve the colors. In "Memories", the coloring of "Digimind"s version was used. Also, the effect has been modified to reduce shearing as much as possible.<br />
<br />
===Sierpinski rotozoomer===<br />
[[File:Colpinski.png|thumb|colpinski, 16 bytes]]<br />
[[File:Rotastic.png|thumb|left|rotastic, 32 bytes]]<br />
This consists of two effects, the rotozoomer and the sierpinski effect as a texture. The sierpinski effect is based on my own "Colpinski 16b" from 2013 (https://www.pouet.net/prod.php?which=62079) where "frag" of "fsqrt" helped push the limits. There is no real predecessor to that, because it is known that this effect can be done directly by combining X and Y, instead of an Iterated Function System or Cellular Automaton approach. The rotozoomer was released by me as the 32b intro "rotastic" in 2017 (https://www.pouet.net/prod.php?which=70112). It is based on the ideas of "ryg" of "Farbrausch" (51 bytes, 2002, https://www.pouet.net/prod.php?which=5015) and "Gargaj" of "Conspiracy" (49 bytes, 2002, https://www.pouet.net/prod.php?which=6088).<br />
<br />
===Raycast bent tunnel===<br />
[[File:Intoanew.gif|frame|Into a new era, 64 bytes version]]<br />
The raycast bent tunnel is a tweaked version of my own 64 bytes "Into a new era" release (2018, https://www.pouet.net/prod.php?which=78044). The custom colors were exchanged for colors of the standard palette, the geometry was tweaked a bit and changed accordingly, so that no depth effect was used for the calculation of texture values. The standalone version of this effect has a size of 50 bytes. The two main inspirations for the 64 bytes version were two 128 byte intros, "Spongy" by "TBC" (2009, https://www.pouet.net/prod.php?which=53871) and "Wolf128" from "Baudsurfer" (2014, https://www.pouet.net/prod.php?which=63518), while the particular algorithm was developed independently.<br />
<br />
===Ocean night to day===<br />
[[File:Ocean.png|thumb|Ocean, 64b version]]<br />
The ocean effect is based on my own 64 bytes release "Ocean" from 2016 (https://www.pouet.net/prod.php?which=66857). The custom color generation and the music generation have been stripped from the original; neither was compatible with the main framework of "Memories" without consuming a LOT of extra space. The special effect of "dawn" happens as a byproduct of the general framework approach. This will be explained in the next chapter.<br />
<br />
===Fading effect===<br />
The transition between two effects is an effect itself, which has no real predecessor. It is rather an idea that evolved over the years, and might have been done by many others in similar ways. In a nutshell, while calculating a frame, a pseudorandom value is derived from each pixel position, scaled down, and used to offset the time, which determines the effect to use. This allows the usage of the standard VGA palette (picture, source) rather than creating custom colors for real crossfading and thus saves space.<br />
<br />
==Tiny megademo framework==<br />
The tiny megademo framework is based on ideas of [https://www.pouet.net/prod.php?which=68020 "Quatro"], done by "Sensenstahl" and me. For putting many tiny effects together into a "megademo", they all have to use the same approach and they have to be mostly free of assumptions (about the contents of memory and registers). They also have to use the same timing values and be tuned to a shared timing in general. Preparing the single effects to be includable in a framework like this took quite some time and initially required a lot of extra space. It has to be noted that, sadly, some of my most impressive effects (judged by audience reactions and views on social media) could not be included at all, because the approaches were not combinable without a HUGE overhead. Once all effects were streamlined, I could think about "factoring out" common calculations which occur repeatedly, resulting in a few saved bytes. The framework does the following:<br />
* Setting up screen mode to 320 x 200 pixels in 256 colors<br />
* Initializing a pointer to the screen<br />
* Installing a callback for timing and music<br />
* Mainloop<br />
** Calculation of X and Y from the current screen location<br />
*** with the [http://www.sizecoding.org/wiki/General_Coding_Tricks#Obtaining_X_and_Y_without_DIV_.28The_Rrrola_Trick.29 "Rrrola Trick"]<br />
** P = Pseudorandom value from screen location<br />
** Offset current time T by a scaled value P'<br />
** Selection of effect number N according to T'<br />
** Execution of effect N (X,Y) for current pixel<br />
*** On entrance, XY is in register DX (DL,DH)<br />
*** On entrance, the current time is in register BP<br />
*** On exit, the pixel color is expected in AL<br />
** Advance to next pixel<br />
** Triple diagonal interlacing for smoothing<br />
** Repeat until frame is complete<br />
* Set timer to a tempo of ~35 FPS<br />
* Check keyboard for ESC<br />
** Quit on ESC, otherwise continue<br />
=== Code of framework===<br />
<syntaxhighlight lang="nasm"><br />
org 100h<br />
s:<br />
mov al,0x13 ; set AL to mode 320*200 in 256 colors<br />
int 0x10 ; call BIOS to set mode<br />
xchg bp,ax ; set timing value to 0x13 <br />
push 0xa000-10 ; write the screen address to register ES<br />
pop es ; works in conjunction with Rrrola trick<br />
mov ax,0x251c ; parameter for changing timer interrupt<br />
mov dl,timer ; address of timer routine, assume DH=1<br />
int 0x21 ; install timer routine<br />
top:<br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
call bx ; call the effect<br />
stosb ; write the return value and advance<br />
inc di ; triple interlace trick for after<br />
inc di ; effect and smoothing the animation<br />
jnz top ; repeat until the frame is complete<br />
mov al,tempo ; set AL to divider for timer<br />
out 40h,al ; set timing (dual pass)<br />
in al,0x60 ; read keyboard<br />
dec al ; quit on ESC<br />
jnz top ; otherwise repeat loop<br />
sounds: db 0xc3, 11, 0x93; 0xc3 is MIDI/RET; fx2-s is used as volume<br />
table: db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
<br />
</syntaxhighlight><br />
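The coordinate step of the main loop can be cross-checked with a small Python model. This is only a sketch of the arithmetic behind <code>mov ax,0xcccd</code> / <code>mul di</code>; the function name <code>rrrola_xy</code> is made up for this illustration, and the interlacing and timer parts of the framework are not modelled.<br />

```python
def rrrola_xy(di):
    """Model of 'mov ax,0xcccd / mul di': returns (X, Y) = (DL, DH)."""
    product = (di & 0xFFFF) * 0xCCCD   # 32-bit result lands in DX:AX
    dx = (product >> 16) & 0xFFFF      # the upper half is register DX
    return dx & 0xFF, dx >> 8          # DL = X (0..255 per row), DH = Y


# 0xCCCD / 65536 is roughly 0.8 = 256 / 320, so advancing DI by one
# screen row (320 bytes) advances DX by 256: Y grows by one, X wraps.
sample = [rrrola_xy(di) for di in (0, 160, 320, 640)]
```

Because ES is set to 0xa000-10, DI runs 160 bytes ahead of the visible pixel offset. In this model that puts X=0 in the middle column of the screen, which appears to be what the effects rely on when they treat DL as a centered, signed X.<br />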
<br />
==Explanation of effects==<br />
===Array of chessboards===<br />
[[File:Array of chessboards.png|thumb|array of chessboards]]<br />
The simplest effect and a good one to start with. After offsetting the row by the time, the classic XOR pattern is applied. To achieve the impression of a grid of chessboards, all bits but two are set in the color. The real trick lies in shifting to a "good" palette spot. The portion of the frame which is perceived as black isn't actually black; it's the dark part of the [https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_palette standard VGA palette]. Shifting that way also makes the dark and light squares look like they are from an ancient chessboard.<br />
<syntaxhighlight lang="nasm"> <br />
xchg dx,ax ; get XY into AX<br />
sub ax,bp ; subtract time from row<br />
xor al,ah ; XOR pattern (x xor y)<br />
or al,0xDB ; pattern for array of boards<br />
add al,13h ; shift to good palette spot<br />
</syntaxhighlight><br />
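As a rough cross-check, the five instructions above can be modelled in Python. This is a sketch under the framework's conventions (X in DL, Y in DH, time in BP); the function name <code>chessboards</code> is hypothetical and only the register arithmetic is reproduced.<br />

```python
def chessboards(x, y, t):
    """x = DL, y = DH, t = BP; returns the pixel color in AL."""
    ax = (((y & 0xFF) << 8) | (x & 0xFF)) - t   # xchg dx,ax / sub ax,bp
    ax &= 0xFFFF                                # registers are 16 bits wide
    al = (ax & 0xFF) ^ (ax >> 8)                # xor al,ah
    al |= 0xDB                                  # only bits 2 and 5 stay variable
    return (al + 0x13) & 0xFF                   # add al,13h with 8-bit wraparound


# Only four palette indices can ever come out of this routine.
colors = {chessboards(x, 0, 0) for x in range(256)}
```

In this model the two free bits do different jobs: bit 2 flips every 4 units, which gives 8 squares per 32 units, and bit 5 flips every 32 units, which changes the brightness from one board to the next.<br />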
<br />
===Zooming circles===<br />
[[File:Zooming circles.png|thumb|zooming circles]]<br />
The distance D of a point (X,Y) to the center (0,0) is sqrt(X²+Y²). The framework already makes sure that DL contains a centered X coordinate; DH=Y has to be centered though. Performing a square root on x86 requires quite a bit of code, but it can actually be omitted. With a clever selection of colors, the impression of zooming circles is quite decent even without the square root.<br />
<syntaxhighlight lang="nasm"> <br />
mov al,dh ; get Y in AL<br />
sub al,100 ; align Y vertically<br />
imul al ; AX = Y² (full 16 bit product)<br />
xchg dx,ax ; Y²/256 in DH, X in AL<br />
imul al ; AX = X² (full 16 bit product)<br />
add dh,ah ; DH = (X² + Y²)/256<br />
mov al,dh ; AL = (X² + Y²)/256<br />
add ax,bp ; offset color by time<br />
and al,8+16 ; select special rings<br />
</syntaxhighlight><br />
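The routine can be modelled in Python to see which values the rings take. This is a sketch, assuming X arrives in DL as a signed, centered byte and Y in DH as a plain row number; the names <code>s8</code> and <code>zooming_circles</code> are made up for this illustration.<br />

```python
def s8(value):
    """Interpret the low 8 bits as a signed byte, the way IMUL does."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def zooming_circles(x, y, t):
    """x = DL, y = DH, t = BP; returns the pixel color in AL."""
    y_sq = (s8(y - 100) ** 2) & 0xFFFF         # sub al,100 / imul al
    x_sq = (s8(x) ** 2) & 0xFFFF               # xchg dx,ax / imul al
    dh = ((y_sq >> 8) + (x_sq >> 8)) & 0xFF    # add dh,ah
    ax = ((x_sq >> 8) << 8 | dh) + t           # mov al,dh / add ax,bp
    return ax & 0xFF & (8 + 16)                # and al,8+16
```

Note that the model adds the two high bytes separately, exactly like the assembly does, so the result is an approximation of (X² + Y²)/256 rather than the exact value.<br />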
<br />
===Scrolling tilted plane===<br />
[[File:Tiltplane.png|thumb|scrolling tilted plane]]<br />
This effect is achieved as follows. First, a large constant is divided by the row number Y to simulate distance. The resulting value is then used twice: a) scaled by a centered X value and b) offset by the current time. These two results are combined with XOR, and a special pattern is selected from the outcome with AND.<br />
<syntaxhighlight lang="nasm"> <br />
mov ax,0x1329 ; initialize with constant<br />
add dh,al ; preventing divide overflow<br />
div dh ; reverse divide AL = C/Y'<br />
xchg dx,ax ; DL = C/Y', AL = X<br />
imul dl ; AH = CX/Y'<br />
sub dx,bp ; DL = C/Y'-T <br />
xor ah,dl ; AH = (CX/Y') ^ (C/Y'-T)<br />
mov al,ah ; move to AL<br />
and al,4+8+16 ; select special pattern<br />
</syntaxhighlight><br />
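A Python model of the register arithmetic makes the two uses of the depth value easier to see. This is a sketch only: the names <code>s8</code> and <code>tilted_plane</code> are hypothetical, and the model assumes 0 <= y <= 214 so that the 8 bit divisor never wraps to zero.<br />

```python
def s8(value):
    """Interpret the low 8 bits as a signed byte, the way IMUL does."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def tilted_plane(x, y, t):
    """x = DL, y = DH, t = BP; returns the pixel color in AL."""
    divisor = (y + 0x29) & 0xFF                 # add dh,al (AL holds 0x29)
    depth = 0x1329 // divisor                   # div dh: quotient goes to AL
    remainder = 0x1329 % divisor                # ... and the remainder to AH
    scaled = (s8(x) * s8(depth)) & 0xFFFF       # imul dl: AX = X * depth
    dx = ((remainder << 8) | depth) - t         # xchg dx,ax / sub dx,bp
    ah = (scaled >> 8) ^ (dx & 0xFF)            # xor ah,dl
    return ah & (4 + 8 + 16)                    # mov al,ah / and al,4+8+16
```

The offset 0x29 keeps the divisor at 41 or more, so the quotient stays at 119 or less and always fits into AL, which is the overflow protection mentioned in the listing.<br />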
<br />
===Parallax checkerboards===<br />
[[File:Paralax.png|thumb|parallax checker boards]]<br />
This is a kind of raycasting with dynamic geometry. Each object plane is split horizontally because of the signed 16 bit multiplication and vertically because of an implicit logical operation on the column number. Additionally, implicit shearing is applied, so that the edges of the resulting grid (4 "solid" patches, 4 "transparent" patches, alternating) connect. If the ray hits one of the solid patches, the color becomes the iteration number (plus a palette offset into the grayscale range); if not, the plane is shifted by the screen pointer and the process is repeated until the maximum iteration number is reached.<br />
<syntaxhighlight lang="nasm"><br />
mov cx,bp ; set initial point to time<br />
mov bx,-16 ; limit to 16 iterations<br />
fx3L:<br />
add cx,di ; offset point by screenpointer<br />
mov ax,819 ; magic, related to Rrrola constant<br />
imul cx ; get X',Y' in DX<br />
ror dx,1 ; set carry flag on "hit"<br />
inc bx ; increment iteration count<br />
ja fx3L ; loop until "hit" or "iter=max"<br />
lea ax,[bx+31] ; map value to standard gray scale<br />
</syntaxhighlight><br />
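The loop logic, in particular the double exit condition of <code>ja</code>, can be modelled in Python. This is a sketch of the arithmetic only; the names <code>s16</code> and <code>parallax</code> are made up for this illustration.<br />

```python
def s16(value):
    """Interpret the low 16 bits as a signed word, the way IMUL does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def parallax(di, t):
    """di = screen pointer DI, t = BP; returns the pixel color in AL."""
    cx = t & 0xFFFF                              # mov cx,bp
    bx = -16                                     # mov bx,-16
    while True:
        cx = (cx + di) & 0xFFFF                  # add cx,di
        dx = ((819 * s16(cx)) >> 16) & 0xFFFF    # imul cx: high word in DX
        hit = dx & 1                             # ror dx,1 moves bit 0 into CF
        bx += 1                                  # inc bx leaves CF untouched
        if hit or bx == 0:                       # ja loops only if CF=0 and ZF=0
            break
    return (bx + 31) & 0xFF                      # lea ax,[bx+31]
```

Since BX runs from -15 to 0, the returned color always lies between 16 and 31, which is the grayscale ramp of the standard VGA palette. A hit in the first plane gives the darkest value, and a ray that hits nothing ends at 31.<br />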
<br />
===Sierpinski rotozoomer===<br />
[[File:Sierpinski rotozoomer.png|thumb|sierpinski rotozoomer]]<br />
[[File:1 div cos(atan(x)).png|thumb|1/cos(atan(x)), made with www.google.com]]<br />
Rotozooming normally requires trigonometric functions, or good estimations of them. Let's look at the [https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions usual 2D rotation equation] and interpret it in a special way to get rid of the trigonometry in the calculation:<br />
<syntaxhighlight lang="nasm"><br />
x' = x*cos(a) - y*sin(a)<br />
y' = x*sin(a) + y*cos(a)<br />
</syntaxhighlight><br />
If we add zooming to it, it looks like this:<br />
<syntaxhighlight lang="nasm"><br />
x' = z * (x*cos(a) - y*sin(a))<br />
y' = z * (x*sin(a) + y*cos(a))<br />
</syntaxhighlight><br />
Now let's assume we don't specify z ourselves, and factor out cos(a):<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(a) * (x - y*tan(a))<br />
y' = cos(a) * (x*tan(a) + y)<br />
</syntaxhighlight><br />
Finally, let's substitute tan(a) with our time variable, since the tangent function rises to infinity the closer we get to +-90°:<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(atan(T)) * (x - y*T)<br />
y' = cos(atan(T)) * (x*T + y)<br />
</syntaxhighlight><br />
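The substitution can be checked numerically. The following Python sketch compares the usual rotation with the tangent form for a few values of T; the function names are made up for this illustration and floating point is used instead of the fixed point arithmetic of the intro.<br />

```python
import math


def rotate(x, y, a):
    """Usual 2D rotation by the angle a (in radians)."""
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def rotate_by_tangent(x, y, T):
    """Same rotation, written with T = tan(a) and a leading zoom factor."""
    zoom = math.cos(math.atan(T))      # equals 1 / sqrt(1 + T*T)
    return zoom * (x - y * T), zoom * (x * T + y)
```

For T = tan(a) both functions return the same point, and the leading factor equals 1/sqrt(1+T²), so it needs no trigonometry either. Leaving that factor out, as the equations suggest, turns the pure rotation into a rotation combined with a zoom.<br />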
As long as we don't care about specifying the zoom factor and don't directly control the angle, we can now rotate between -90° and +90° without using trigonometric functions. As a byproduct, the zoom factor is now tied to the time T. The function for the zoom factor is shown in the image; it results in zooming from infinitely small to one (original) and back to infinitely small. Quite a few bytes have been spent to beautify the effect: correcting the timing offset, speeding up the animation, enlarging the sierpinski pixels and getting decent colors, but I thought it was worth the space.<br />
<syntaxhighlight lang="nasm"><br />
lea cx,[bp-2048]; center time to pass zero<br />
sal cx,3 ; speed up by factor 8!<br />
movzx ax,dh ; get "X" into AL (taken from DH, the framework's Y)<br />
movsx dx,dl ; get "Y" into DL (the framework's X, sign extended)<br />
mov bx,ax ; save X in BX<br />
imul bx,cx ; BX = X*T<br />
add bh,dl ; BH = X*T/256+Y<br />
imul dx,cx ; DX = Y*T<br />
sub al,dh ; AL = X-Y*T/256<br />
and al,bh ; AL = (X-Y*T/256)&(X*T/256+Y)<br />
and al,252 ; thicker sierpinski<br />
salc ; set pixel value to black<br />
jnz fx4q ; leave black if not sierpinski<br />
mov al,0x2A ; otherwise: a nice orange<br />
fx4q:<br />
</syntaxhighlight><br />
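The fixed point version can be modelled in Python as well. This is a sketch of the register arithmetic; the names <code>s8</code> and <code>sierpinski_roto</code> are hypothetical. The parameters follow the framework (x = DL, y = DH), while the comments in the listing call the value from DH "X" and the value from DL "Y". For this effect the swap only mirrors the picture.<br />

```python
def s8(value):
    """Interpret the low 8 bits as a signed byte (MOVSX behaviour)."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def sierpinski_roto(x, y, t):
    """x = DL, y = DH, t = BP; returns the pixel color in AL."""
    cx = ((t - 2048) << 3) & 0xFFFF        # lea cx,[bp-2048] / sal cx,3
    ax = y & 0xFF                          # movzx ax,dh
    dx = s8(x) & 0xFFFF                    # movsx dx,dl
    bx = (ax * cx) & 0xFFFF                # mov bx,ax / imul bx,cx
    bh = ((bx >> 8) + dx) & 0xFF           # add bh,dl
    dx = (dx * cx) & 0xFFFF                # imul dx,cx
    al = (ax - (dx >> 8)) & bh & 0xFC      # sub al,dh / and al,bh / and al,252
    return 0x2A if al == 0 else 0x00       # salc, orange where the AND is zero
```

At t = 2048 the scaled time is zero, so the model reduces to the plain test "x AND y AND 252 is zero", which is the unrotated sierpinski texture.<br />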
<br />
===Raycast bent tunnel===<br />
[[File:Raycastbenttunnel.png|thumb|raycast bent tunnel]]<br />
This is a variation of the [https://www.pouet.net/prod.php?which=78044 raycast tunnel "into a new era"] (see also above). A [https://www.reddit.com/r/tinycode/comments/de7dj9/moving_cubes_with_pathtracing_lighting_and/ detailed explanation] has been posted to reddit for the [https://www.pouet.net/prod.php?which=83204 similar production "Essence"]. The custom colors have been removed, the bend direction is different, and the geometry is a bit more closed, to increase the performance on slower computers and DosBox.<br />
<syntaxhighlight lang="nasm"><br />
mov cl,-9 ; start with depth 9 (moves backwards)<br />
fx5L: <br />
push dx ; save DX, destroyed inside the loop<br />
mov al,dh ; Get Y into AL<br />
sub al,100 ; Centering Y has to be done "manually".<br />
imul cl ; Multiply AL=Y by the current distance, to get a projection(1)<br />
xchg ax,dx ; Get X into AL, while saving the result in DX (DH)<br />
add al,cl ; add distance to projection, (bend to the right)<br />
imul cl ; Multiply AL=X by the current distance, to get a projection(2)<br />
mov al,dh ; Get projection(1) in AL<br />
xor al,ah ; combine with projection(2)<br />
add al,4 ; center the walls around 0<br />
test al,-8 ; check if the wall is hit<br />
pop dx ; restore DX<br />
loopz fx5L ; repeat until "hit" or "iter=max"<br />
sub cx,bp ; offset depth by time<br />
xor al,cl ; XOR pattern for texture <br />
aam 6 ; irregular pattern with MOD 6<br />
add al,20 ; offset into grayscale palette<br />
</syntaxhighlight><br />
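The ray marching loop can be modelled in Python to follow a single pixel through the iterations. This is a sketch: the names <code>s8</code> and <code>bent_tunnel</code> are made up, and the upper byte of CX, which the routine does not initialize, is an explicit parameter <code>ch</code> whose default of 0xFF is an assumption. It only limits the maximum number of iterations.<br />

```python
def s8(value):
    """Interpret the low 8 bits as a signed byte, the way IMUL does."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def bent_tunnel(x, y, t, ch=0xFF):
    """x = DL, y = DH, t = BP, ch = assumed CH; returns the color in AL."""
    cx = ((ch & 0xFF) << 8) | 0xF7                 # mov cl,-9
    while True:
        cl = s8(cx)                                # current (negative) depth
        wall_y = (s8(y - 100) * cl) & 0xFFFF       # sub al,100 / imul cl
        wall_x = (s8(x + cl) * cl) & 0xFFFF        # add al,cl / imul cl
        al = (((wall_y >> 8) ^ (wall_x >> 8)) + 4) & 0xFF   # xor al,ah / add al,4
        inside = (al & 0xF8) == 0                  # test al,-8 sets ZF
        cx = (cx - 1) & 0xFFFF                     # loopz decrements CX
        if not inside or cx == 0:                  # ... and loops while ZF=1
            break
    al ^= (cx - t) & 0xFF                          # sub cx,bp / xor al,cl
    return al % 6 + 20                             # aam 6 / add al,20
```

For the pixel x=0, y=100 at t=0 the ray stays inside the tunnel until the depth reaches -32, where the high byte of 32*32 = 1024 becomes 4 and the wall test fails. The returned color is always between 20 and 25.<br />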
<br />
===Ocean night to day===<br />
[[File:Oceannight.png|thumb|left|oceannight]]<br />
[[File:Oceanday.png|thumb|right|oceanday]]<br />
The ocean effect is a wonderful example of a [https://en.wikipedia.org/wiki/Bob_Ross#Style "happy accident"]. If you load a value into the FPU as an integer, store it as a floating point value and interpret it as an integer again, you get a cool pattern. Combine that with a reverse divide and you get a neat wavy effect. This works in tight conjunction with the register DX, which has a sign flip at the position we want, so we can easily separate sky and sea. The final touch is the coloring, though. By design of the global framework, the value of AL is determined on function entrance: it contains the effect address. With a bit of code shuffling, the color of the sky came "for free", without using instructions, as did the "fading to day" color, which is also an effect address. And that is no happy accident. For that reason, the sky color can vary in versions other than the originally submitted DosBox version.<br />
<syntaxhighlight lang="nasm"><br />
sub dh,120 ; check if pixel is in the sky<br />
js fx6q ; quit if that's the case<br />
mov [bx+si],dx ; move XY to a memory location<br />
fild word [bx+si] ; read memory location as integer<br />
fidivr dword [bx+si]; reverse divide by constant<br />
fstp dword [bx+si-1]; store result as floating point<br />
mov ax,[bx+si] ; get the result into AX<br />
add ax,bp ; modify color by time<br />
and al,128 ; threshold into two bands<br />
dec ax ; beautify colors to blue/black<br />
</syntaxhighlight><br />
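The integer to float reinterpretation can be imitated in Python with the <code>struct</code> module. This is a sketch with loudly stated assumptions: <code>high_word</code> stands for whatever the two bytes after the stored DX contain in memory (unknown here), <code>sky_color</code> stands for the value AL has on entry, and all function names are made up. Python divides in double precision before rounding to 32 bits, so single results may differ slightly from a real x87 FPU.<br />

```python
import struct


def float_middle_word(value):
    """Bytes 1 and 2 of the 32 bit float encoding, read as a 16 bit word.

    This mirrors 'fstp dword [bx+si-1]' followed by 'mov ax,[bx+si]'.
    """
    return int.from_bytes(struct.pack("<f", value)[1:3], "little")


def ocean(x, y, t, high_word=0, sky_color=0):
    """x = DL, y = DH, t = BP; returns the pixel color in AL."""
    dh = (y - 120) & 0xFF                          # sub dh,120
    if dh & 0x80:                                  # js: pixel is in the sky
        return sky_color & 0xFF
    dx = (dh << 8) | (x & 0xFF)                    # value written by mov [bx+si],dx
    if dx == 0:
        raise ValueError("division by zero is not modelled in this sketch")
    mem32 = ((high_word & 0xFFFF) << 16) | dx      # dword read by fidivr
    if mem32 & 0x80000000:
        mem32 -= 1 << 32                           # fidivr treats it as signed
    quotient = mem32 / dx                          # fild word, then fidivr dword
    ax = (float_middle_word(quotient) + t) & 0xFFFF    # mov ax,[bx+si] / add ax,bp
    return ((ax & 0x80) - 1) & 0xFF                # and al,128 / dec ax
```

In this model the sea only ever uses the two palette indices 0x7F and 0xFF, and the time value decides which of the two bands a pixel falls into.<br />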
<br />
===Fading effect===<br />
For convenience, here is the fading effect from the global framework in isolation. Basically, it generates a pseudorandom value from the screen pointer, offsets the time by a scaled value of it, and then calls the resulting effect.<br />
<syntaxhighlight lang="nasm"><br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
</syntaxhighlight><br />
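The selection step can be modelled in Python to see how the fade comes about. This is a sketch of the arithmetic only; the names <code>EFFECT_ORDER</code> and <code>effect_index</code> are made up for this illustration, and the order of the tuple is copied from the table in the framework listing.<br />

```python
EFFECT_ORDER = ("fx2", "fx1", "fx0", "fx3", "fx4", "fx5", "fx6", "sounds", "stop")


def effect_index(di, t):
    """di = screen pointer DI, t = BP; returns the table index in BX."""
    ax = ((di & 0xFFFF) * 0xCCCD) & 0xFFFF       # low word left in AX by mul di
    noise = ((ax & 0xFF) + (ax >> 8)) & 0xFF     # add al,ah / xor ah,ah
    ax = (noise + t) & 0xFFFF                    # add ax,bp
    return (ax >> 9) & 15                        # shr ax,9 / and al,15
```

In this model every effect lasts 512 ticks. Because the noise value is between 0 and 255, pixels with a large noise value switch to the next effect up to 255 ticks earlier than pixels with a small one, which produces the dissolve between two effects.<br />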
<br />
==MIDI music part==<br />
This code section increments the timing value and produces sound. By choosing channel 3 we can reuse the "change channel instrument" instruction (0xC3) as "RET". By shifting around the code of the effects, a reasonable value for the volume could be created, which saves another byte. Be aware that this code sequence only works if the MIDI device is already in UART mode, otherwise three more bytes have to be spent. Quite a few people from the audience and sceners told me afterwards that the melody sounds a bit like [https://en.wikipedia.org/wiki/Incantations_(album) Incantations from Mike Oldfield], but it is actually created by a simple approach. Starting with [https://www.pouet.net/prod.php?which=65604 "Hypnoteye" in 2015], I experimented with procedural MIDI and eventually released a [https://www.pouet.net/prod.php?which=66313 small MIDI framework (64 bytes)]. The basic principle is to jump through tonal space with fixed steps, and map high values back to lower ones (by modulo). With simple combinations of stepwidth and modvalue, interesting effects can be produced. For example, stepwidth=3 produces diminished chords which are never resolved, likewise stepwidth=4, while stepwidth=6 produces tritone effects. With a well chosen modvalue, these patterns can create progressions. I have not done a proper theoretical analysis yet, but rather explored the tonal space and noted when something sounded interesting.<br />
<syntaxhighlight lang="nasm"><br />
sounds: db 0xc3, 11, 0x93, fx2-s<br />
...<br />
inc bp ; increment timing value<br />
test bp, 7 ; play a note every 8th step<br />
jnz nomuse ; quit if in between<br />
mov dx,0x330 ; port number for MIDI<br />
mov si,sounds ; address for sound data<br />
outsb ; change instrument of channel 3<br />
outsb ; to vibraphone<br />
outsb ; play a note on channel 3<br />
imul ax,bp,-19*32*4 ; the magic melody constant<br />
shr ax,10 ; scale down and implicit "and 63"<br />
add al,22 ; pitch base is 22<br />
out dx,al ; play THIS note on channel 3<br />
outsb ; play it with THIS volume<br />
</syntaxhighlight><br />
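The note calculation can be modelled in Python to make the "fixed step plus modulo" principle visible. This is a sketch of the arithmetic only; the function name <code>note_at</code> is made up, and the model does not send anything to a MIDI device.<br />

```python
def note_at(t):
    """t = BP at a tick where a note is played (a multiple of 8)."""
    if t & 7:
        raise ValueError("notes are only played when 'test bp,7' gives zero")
    ax = (t * (-19 * 32 * 4)) & 0xFFFF     # imul ax,bp,-19*32*4
    return (ax >> 10) + 22                 # shr ax,10 / add al,22


# The first few MIDI note numbers of the melody in this model.
first_notes = [note_at(8 * k) for k in range(4)]
```

Since t is a multiple of 8, the product is a multiple of 1024 times -19, so the shift leaves (-19*k) mod 64 for the k-th note. In other words, the model steps down by 19 semitones per note and wraps around in a window of 64 semitones above the base pitch 22.<br />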
==Further tweaks==<br />
To be done ;)<br />
==Differences between dos versions==<br />
To be done ;)<br />
==External Links==<br />
*[https://en.wikipedia.org/wiki/Demoscene Demoscene]<br />
*[https://www.youtube.com/watch?v=Imquk_3oFf4 Youtube Capture]<br />
*[https://www.twitch.tv/videos/589179638?t=7h02m40s Twitch Live Reactions]<br />
*[https://www.pouet.net/prod.php?which=85227 Production Page on Pouet]<br />
*[https://www.reddit.com/r/programming/comments/g1qhf8/memories_256_byte_msdos_intro/ Reddit/Programming Thread I]<br />
*[https://www.reddit.com/r/programming/comments/g4ewh7/how_its_made_memories_a_256_byte_dos_intro_with_8/ Reddit/Programming Thread II]<br />
*[https://github.com/cesarmiquel/memories-256b-msdos-intro Analysis by "cesarmiquel"]<br />
*[https://hackaday.com/2020/04/21/a-jaw-dropping-demo-in-only-256-bytes/ Hackaday Article]<br />
*[https://github.com/simpassi/cocoamemories Simpassis Port to MAC (TIGR)]<br />
<br />
==Original release code (complete)==<br />
<syntaxhighlight lang="nasm"> <br />
; "memories" by HellMood/DESiRE<br />
; the tiny megademo, 256 byte msdos intro<br />
; shown in April 2020 @ REVISION<br />
;<br />
; (= WILL BE COMMENTED IN DETAIL LATER =)<br />
;<br />
; create : nasm.exe memories.asm -fbin -o memories.com<br />
; CHOOSE YOUR TARGET PLATFORM (compo version is dosbox)<br />
; be sure to use the dosbox.conf from this archive!<br />
; only ONE of the defines should be active!<br />
%define dosbox ; size : 256 bytes<br />
;%define freedos ; size : 230 bytes<br />
;%define winxpdos ; size : 263 bytes<br />
<br />
; DON'T TOUCH THESE UNLESS YOU KNOW WHAT YOU'RE DOING<br />
%ifdef winxpdos<br />
%define music<br />
%define switch_uart<br />
%define safe_dx<br />
%define safe_segment<br />
%endif<br />
%ifdef freedos<br />
%define safe_dx<br />
%endif<br />
%ifdef dosbox<br />
%define music<br />
;%define safe_dx ; sometimes needed<br />
%endif<br />
<br />
; GLOBAL PARAMETERS, TUNE WITH CARE!<br />
%define volume 127 ; not used on dosbox (optimization)<br />
%define instrument 11<br />
%define scale_mod -19*32*4; <br />
%define time_mask 7<br />
%define targetFPS 35<br />
%define tempo 1193182/256/targetFPS <br />
%define sierp_color 0x2A<br />
%define tunnel_base_color 20<br />
%define tunnel_pattern 6<br />
%define tilt_plate_pattern 4+8+16<br />
%define circles_pattern 8+16<br />
<br />
org 100h<br />
s:<br />
%ifdef freedos<br />
mov fs,ax<br />
mov [fs:0x46c],ax<br />
%endif<br />
mov al,0x13<br />
int 0x10 <br />
xchg bp,ax<br />
push 0xa000-10<br />
pop es<br />
%ifndef freedos<br />
mov ax,0x251c<br />
%ifdef safe_dx <br />
mov dx,timer <br />
%else ; assume DH=1, mostly true on DosBox<br />
mov dl,timer<br />
%endif<br />
int 0x21<br />
%endif<br />
top:<br />
%ifdef freedos<br />
mov bp,[fs:0x46c]<br />
%endif <br />
mov ax,0xcccd<br />
mul di<br />
add al,ah<br />
xor ah,ah<br />
add ax,bp<br />
shr ax,9<br />
and al,15<br />
xchg bx,ax<br />
mov bh,1<br />
mov bl,[byte bx+table]<br />
call bx<br />
stosb<br />
inc di<br />
inc di<br />
jnz top<br />
mov al,tempo<br />
out 40h,al<br />
in al,0x60<br />
dec al<br />
jnz top<br />
sounds:<br />
db 0xc3 ; is MIDI/RET<br />
%ifdef music<br />
db instrument,0x93<br />
%ifdef switch_uart<br />
db volume ; without switch, volume is in table<br />
db 0x3f <br />
%endif<br />
%endif<br />
table: ; first index is volume, change order with care! <br />
db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
stop:<br />
pop ax<br />
ret<br />
timer:<br />
%ifndef freedos<br />
%ifdef safe_segment<br />
push cs<br />
pop ds<br />
%endif<br />
inc bp<br />
%ifdef music <br />
test bp, time_mask<br />
jnz nomuse<br />
mov dx,0x330<br />
mov si,sounds<br />
outsb<br />
outsb<br />
outsb<br />
imul ax,bp,scale_mod<br />
shr ax,10<br />
add al,22<br />
out dx,al<br />
outsb<br />
%ifdef switch_uart<br />
inc dx<br />
outsb<br />
%endif<br />
%endif<br />
nomuse:<br />
iret<br />
%endif <br />
fx0: ; tilted plane, scrolling<br />
mov ax,0x1329<br />
add dh,al<br />
div dh<br />
xchg dx,ax<br />
imul dl<br />
sub dx,bp<br />
xor ah,dl<br />
mov al,ah<br />
and al,tilt_plate_pattern<br />
ret<br />
fx2: ; board of chessboards<br />
xchg dx,ax<br />
sub ax,bp<br />
xor al,ah<br />
or al,0xDB<br />
add al,13h<br />
ret<br />
fx1: ; circles, zooming<br />
mov al,dh<br />
sub al,100<br />
imul al<br />
xchg dx,ax<br />
imul al<br />
add dh,ah<br />
mov al,dh<br />
add ax,bp<br />
and al,circles_pattern<br />
ret<br />
fx3: ; parallax checkerboards<br />
mov cx,bp<br />
mov bx,-16<br />
fx3L:<br />
add cx,di<br />
mov ax,819<br />
imul cx <br />
ror dx,1 <br />
inc bx <br />
ja fx3L<br />
lea ax,[bx+31] <br />
ret<br />
fx4: ; sierpinski rotozoomer <br />
lea cx,[bp-2048]<br />
sal cx,3<br />
movzx ax,dh<br />
movsx dx,dl<br />
mov bx,ax<br />
imul bx,cx<br />
add bh,dl<br />
imul dx,cx<br />
sub al,dh<br />
and al,bh<br />
and al,0b11111100<br />
salc ; VERY slow on dosbox, but ok<br />
jnz fx4q<br />
mov al,sierp_color<br />
fx4q:<br />
ret<br />
fx5: ; raycast bent tunnel<br />
mov cl,-9<br />
fx5L: <br />
push dx<br />
mov al,dh<br />
sub al,100<br />
imul cl<br />
xchg ax,dx <br />
add al,cl<br />
imul cl<br />
mov al,dh<br />
xor al,ah<br />
add al,4<br />
test al,-8<br />
pop dx<br />
loopz fx5L<br />
sub cx,bp<br />
xor al,cl<br />
aam tunnel_pattern; VERY slow on dosbox, but ok<br />
add al,tunnel_base_color<br />
ret<br />
fx6: ; ocean night / to day sky<br />
sub dh,120<br />
js fx6q<br />
mov [bx+si],dx<br />
fild word [bx+si]<br />
fidivr dword [bx+si]<br />
fstp dword [bx+si-1]<br />
mov ax,[bx+si]<br />
add ax,bp<br />
and al,128<br />
dec ax<br />
fx6q:<br />
ret<br />
</syntaxhighlight><br />
<br />
==Bonus - NFO/ASCII==<br />
<br />
<syntaxhighlight lang="nasm"> <br />
art : hammerfist<br />
∂#MW%e _d$Ng,<br />
'B, ∂b _jM@$QZb,cQ" )@<br />
,edRB$b,l@ Wk,yGR$KM&$b, ,dP" Wl ]bsd%UR8BG6&$@DSyG#ZKM&$b,<br />
,dP "T%L 'MGF "*∂R_ Tg "*4Zk,#I YP W" 7P "*∂R<br />
4M gd@ ^ ∂@ d@b dQ$#@Z@R3L_ "*GMj 'W ,gd$ d@b 9Q$#%b<br />
W#, `M Wb `* _4P `Qk *#N8L `H5 @b 'QR7YK `* _4F" Qk<br />
`6@L dML '@ ,BK 'M ∂B *b, '#L<br />
^QBb,_ _,4&M∞∂@=,_ _dGL _gQKM GL @k 'Mg,_ _dG,<br />
"*BN5W$2#MNP" "*G3WRM8&B5P"`Y@QNW3Z5P" ∂#$W8BRM3XZN87 "*GW38M%EBDW5P"`<br />
<br />
<br />
p r e s e n t s<br />
<br />
4<br />
d@,<br />
_& `Wl<br />
_,aP "#baedM$#@@K JP*"?ML<br />
,ad@$#P" ,d@NEWVB" X,aQPYb,_<br />
V@Mm,_ ,d@MW#BW' EMP" '¶R ,ngBP^fML<br />
¶M@N@y Y#BNW#M" J9" `MQ9" "MgRBq ,QBMg,<br />
VN#P` ,d@@ `WM@^ 7f ¶F` 7kY" ^G _.eQNE1.<br />
]B _G@MWN$, `P ' 4b QP ¶w@F*^ ^Qb<br />
]O@NRM#W@MNB, ; ^` j JP^ Yl<br />
J#NRNWM@#BcT"^ ,A _J _q@ `X<br />
'¶WM#B@WdY`,7 _G#YN#PM _,gG" M,<br />
*BN#WP" dK ,Q@NRMB"]9 , _,M@Q* #A<br />
"U^ V@h, iNBW#NT J' J9 s@QN" _; 'D,<br />
¶RMBv&NMQR@9 .W .K' "9` ,6BA _JL ]l<br />
Y#NE@W#NRP #[ `¶8 _d@MW#B_jW#W BN<br />
"GQ@MR#W QL_ *B _,p#NBW#NQMG@WY 3Q<br />
"Y@F ,XW@M%im,_Yb_ _,g5@#MW@QMNE@E@NRMB ,WM<br />
` _,gP*"#REM#GB@N#MQbnd@N#M@MW#R8QSB^'WQERM@ ;4NB,<br />
,GYKL ¶E#B8R8QSB@M@#BM#W@MNB"`_ , "^` N ,dW@Ql<br />
_Q`'W`*t '¶@GS#MBQ#E@W#NQBW[ 'LvQ_ ,K _dNABGM#N<br />
,F ' `^WAB@QGE9*"9^*@L jP7FY, ¶h,_.jWM#BR#GBM,<br />
J; , _ ' ' "LL YxE#B8R8QSBNW@W;<br />
AP _,Ag6^ _ J ¶A `"Q#M@MW#R8E#P<br />
j@ `"XQW[ 'LvK,_ 'L_,/ @t Y#NE@WNR"<br />
:M/ 9^*@L jP7F" _PYKL _,A; ¶RSNQ"<br />
dKL ' ` ' "L "`'W`*t `"XQb `W^<br />
Q`8t 'L_,/ , _ ' 9^Q<br />
,W _PYKL _,Ag6^ ' W, _ ,#N&<br />
!N _ J "`'W`*t `"XQW[ _ J N!_JG9^RwQ' *t<br />
`W, 'LvK,_ ' _gGB8@L _ 'LvK,_ ,WgB' V 7L<br />
_.,gm&@B&wBZF" j@'` "WL _gML jZd7Yb lN" dBWl<br />
,g&QB*"^` `"*G@g, .gR&k,_ ,N" '@QF ¶k;gMF *QvQ jQ, ,@N@B#,<br />
.eQF*` `Yb@" "*6Qg,gF , 7 XMN" 'MNB, ^¶QWSER@N;<br />
,gP" qy, W' ^Q' &L ,g@W' `QMEL `"WBNWP<br />
g7 ¶9 ,X M? 9" _q8MSK ¶EMt *@K<br />
Vh _,m#L _AH le ,GBDNE9^A, *@F NMg<br />
¶L,qQ@ND _.m@Bl We ,gM@B8#Q' ¶h_ lWE,<br />
W9NHW@` JWM#B@] @e 4WR@NGF^ 'QL dRWl<br />
VMd* "@BE@PM 'N *UP" VW, JRSB;<br />
,@F j `¶WK W, ¶t XNt _A@E#N<br />
_JP ,6& "GLdM XD, _.g8NMA@k,_ _,gG#NMGR;<br />
"Z .JRER 'VMi _jNB#W&_ _,j@E@W#Nl ¶MBGMNQGNQMG@QBW9<br />
¶h ,G@NRMBl `" ,d#R@M$F ¶Mg,_.gp&@@NEWVBWBMG *QMN8R8SBN$E@WF<br />
Vb dW#R8QSRb, *YM@EQ,_ 'MENBW#NQMG#B@R@MW#l "BM@QNENRQG'<br />
*WGS#MBMNEYL `^"*8M@Q@NRM#W@BWSNW@QBF"` `^*@QBF^ [HFT]<br />
^M@MW#Q9 ^Wt `^¶RQ@W8NQGP*`<br />
¶Q#@P Vk lA `"^`<br />
Y" `MA J#,<br />
*R@, ,MQl<br />
Y#Wk, GWM8L<br />
W8RQSt,_ AQ@MR#,<br />
`@M@#SB@Mbm.,_ QNBW#NW<br />
¶QB8R8SBN$WNRM@#GNtwg@NMQR@B'<br />
*MBQ#8R8QS@NE@WNBW#NQMG@NR;<br />
`WGS#MBQ#R8QSB@NE@W#NQBW9<br />
*OMW@QMNE@E@NRMW@QMB@*<br />
`^"YQW@Q#SB#NE@EGP<br />
`^"*8R@GBQF`<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Memories&diff=556Memories2020-05-05T07:54:18Z<p>HellMood: </p>
<hr />
<div>==Update==<br />
The final FreeDos version is complete; you can [https://www.youtube.com/watch?v=wlW84fEHngM watch it here]. It includes the Amiga Ball as an extra effect but lacks the music, since a normal modern PC booted into FreeDos cannot play MIDI that easily. The file size is still 256 bytes.<br />
==Introduction==<br />
<br />
Hello! My name is "HellMood" and this is not a usual wiki-style article. It will be written from my point of view, in the first person. This article is all about a tiny MS DOS program called "Memories". This program has a size of 256 bytes and won the "PC 256 byte" competition of the [https://en.wikipedia.org/wiki/Demoscene demoscene] event [https://2020.revision-party.net/ "Revision"] in 2020, as well as the public choice award. You can watch a video of the output of this program here (https://www.youtube.com/watch?v=Imquk_3oFf4) and a video including live reactions of the online audience and moderators here (https://www.twitch.tv/videos/589179638?t=7h02m40s). You can download the release and comment here (https://www.pouet.net/prod.php?which=85227). This wiki article will give you an in-depth analysis of the program and point out historical references as well as development steps. In case you did not notice, you are now on the "sizecoding wiki". This wiki will not only help you understand the inner workings of "Memories", but also enable you to produce something like this yourself. Feel free to look around! It is recommended that you start with the [http://www.sizecoding.org/wiki/Getting_Started basics of this wiki] first if you're new to sizecoding and/or x86 assembler. While the principles might be easy to grasp, details of the implementation might not.<br />
<br />
==Disclaimer==<br />
This article discusses the submitted DosBox version (256 bytes). The available archive contains versions for FreeDos and Windows XP DOS which, at the time of this writing, don't work on all computers. These alternative versions were included as proof of concept, to show that this program does not only run on the emulator "DosBox". For the "PC 256 bytes" competition of the event "Revision" in 2020, it was allowed to specify either "FreeDos" or "DosBox" as platform, the latter with a specific configuration. It is in fact possible to modify the DosBox version to work on FreeDos, MS DOS, WinXP and Win98, as the released alternative versions showcase, but this is not the focus of this article. Anyway, safe versions for all platforms and all computers are in the works. Regarding possible further optimizations, I will discuss/explain only the submitted version, although I already found several spots in the code that can be optimized. As the history of tiny effects shows, almost no code is perfect, and most of it can be brought down further in size. To avoid confusion between different versions, only the submitted version will be explained.<br />
<br />
==History of tiny effects==<br />
[[File:Pouet categories.png|thumb|Size categories on http://www.pouet.net]]<br />
Sizecoders think in size categories. For MS DOS, these categories are 256b, 128b, 64b and 32b. These are the standards of one of the biggest demoscene archives, http://www.pouet.net. There is no 16b category, although many tiny effects can be implemented in 16 bytes. Almost all of the effects of "Memories" have been coded and optimized by me before, and most of those implementations were attempts to reduce an already existing effect in size, or to do something similar in less space. Reducing the size in this case means reducing it to the next lower 2^N category (or one below that). For example, I might have seen an effect done in 33 to 64 bytes, and then reduced the size to 32 bytes or less. Almost every time I moved an effect to a lower category, I submitted the resulting tiny program to a demoscene party which allowed remote entries in the 256b category and/or published the result to http://www.pouet.net . In this section I will introduce the effects and mention the origins and authors.<br />
===Array of chessboards===<br />
[[File:Kasparov.gif|frame|kasparov, 16 bytes]]<br />
The origin of this effect is my own "Kasparov 16b" from 2018 (https://www.pouet.net/prod.php?which=75912). Apparently I "had this piece lying around, stuck at 17 bytes and not that beautiful, until i found a trick ;)" (quote from the release notes). There was a similar effect in 32 bytes before: "ew" by "headcrash" from 2003 (https://www.pouet.net/prod.php?which=17567). In this case, my design goal was to show "real" chessboards of 8x8 fields with recognizable dark and light squares, as well as the correct orientation of the single chessboards, meaning that the bottom right corner (h1) has to be a [https://en.wikibooks.org/wiki/Chess/Arranging_The_Board white square]. For "Memories", the effect had to be reimplemented with another approach of writing to the screen to work with the framework. The scrolling direction was also changed to be different from the "scrolling tilted plane" effect.<br />
<br />
===Zooming circles===<br />
The zooming circles were supposed to be entered to a demoparty as a 32 bytes intro, but I never actually did it. There is no real predecessor in 64 bytes, because in the 64b category much more complex effects are possible. The zooming circles were the result of desperately trying to reach the 32b category for a circular "tunnel" effect, for which my own record is still 52 bytes ("Neontube" - 2016)(https://www.pouet.net/prod.php?which=66808), which in turn was the optimization of the all-time-classic 64 byte effect "constant evolution" from ryg/Farbrausch (2003) (https://www.pouet.net/prod.php?which=8697). In the zooming circles routine, the distance and the angle are removed/ignored, so that the 32b category could be reached.<br />
<br />
===Scrolling tilted plane===<br />
[[File:Floorcast variation.png|thumb|Floorcast, 32 bytes version, variation]]<br />
The scrolling tilted plane is one of my own releases, "floorcast 32b" from 2018. Floorcasts have their own history in sizecoding; the effect was brought down stepwise from 256 bytes to 32 bytes. The versions differ in the number of planes: some show two planes, others only one. For the "floorcast 32b" release I specifically decided against the "XOR" texture, whereas in "Memories" I used it again, but disguised with another final touch of "AND".<br />
* "rain_storm" version 2008 - 256 bytes - https://www.pouet.net/prod.php?which=50479<br />
* "org_100h" version 2008 - 128 bytes - https://www.pouet.net/prod.php?which=63271<br />
* "Baudsurfer" version 2013 - 86 bytes - https://www.pouet.net/prod.php?which=61070<br />
* "Baudsurfer" version 2014 - 64 bytes - https://www.pouet.net/prod.php?which=63305<br />
* "HellMood" version 2018 - 32 bytes - https://www.pouet.net/prod.php?which=77774<br />
<br />
===Parallax checkerboards===<br />
[[File:Projektbeschreibung.gif|frame|Projektbeschreibung, 32 bytes]]<br />
The parallax checkerboards were released by me as the 32 bytes effect [https://www.pouet.net/prod.php?which=76788 "Projektbeschreibung"] in 2018. It was a direct attempt to bring [https://www.pouet.net/prod.php?which=28543 "Follow the light"] from "Digimind" (2006) or my own [https://www.pouet.net/prod.php?which=63293 "Lucy"] (2014) down to 32 bytes. Very helpful inspirations came from [https://www.pouet.net/prod.php?which=32281 "Rrrola"s "Paralaxa"] (32 bytes, 2007) and [https://www.pouet.net/prod.php?which=61075 "Sensenstahl"s "Byteropolis"] (32 bytes, 2013). In fact, Rrrola's rendering approach was already close to the final solution; the code was modified to correct the arrangement of the planes, change the shape from triangles to checkerboards and improve on the colors. In "Memories", the coloring of "Digimind"s version was used. Also, the effect has been modified to reduce shearing as much as possible.<br />
<br />
===Sierpinski rotozoomer===<br />
[[File:Colpinski.png|thumb|colpinski, 16 bytes]]<br />
[[File:Rotastic.png|thumb|left|rotastic, 32 bytes]]<br />
This consists of two effects, the rotozoomer and the sierpinski effect as a texture. The sierpinski effect is based on my own "Colpinski 16b" from 2013 (https://www.pouet.net/prod.php?which=62079), where "frag" of "fsqrt" helped push the limits. There is no real predecessor to that, because it is known that this effect can be done directly by combining X and Y, instead of an Iterated Function System or Cellular Automaton approach. The rotozoomer was released as the 32b intro "rotastic" by me in 2017 (https://www.pouet.net/prod.php?which=70112); it is based on the ideas of "ryg" of "Farbrausch" (51 bytes, 2002, https://www.pouet.net/prod.php?which=5015) and "Gargaj" of "Conspiracy" (49 bytes, 2002, https://www.pouet.net/prod.php?which=6088).<br />
<br />
===Raycast bent tunnel===<br />
[[File:Intoanew.gif|frame|Into a new era, 64 bytes version]]<br />
The raycast bent tunnel is a tweaked version of my own 64 bytes "Into a new era" release (2018, https://www.pouet.net/prod.php?which=78044). The custom colors were exchanged for colors of the standard palette, the geometry was tweaked a bit and changed accordingly, so that no depth effect was used for the calculation of texture values. The standalone version of this effect has a size of 50 bytes. The two main inspirations for the 64 bytes version were two 128 byte intros, "Spongy" by "TBC" (2009, https://www.pouet.net/prod.php?which=53871) and "Wolf128" from "Baudsurfer" (2014, https://www.pouet.net/prod.php?which=63518), while the particular algorithm was developed independently.<br />
<br />
===Ocean night to day===<br />
[[File:Ocean.png|thumb|Ocean, 64b version]]<br />
The ocean effect is based on my own 64 bytes release "Ocean" from 2016 (https://www.pouet.net/prod.php?which=66857). The custom color generation and music generation have been stripped from the original, both were not compatible with the main framework of "Memories" without consuming a LOT of extra space. The special effect of "dawn" happens as a byproduct of the general framework approach. This will be explained in the next chapter.<br />
<br />
===Fading effect===<br />
The transition between two effects is an effect itself, which has no real predecessor. It is rather an idea that evolved over the years, and might have been done by many others in similar ways. In a nutshell, while calculating a frame, a pseudorandom value is derived from each pixel position, scaled down, and added to the time, which determines the effect to use. Near a transition, neighbouring pixels therefore pick either the old or the new effect. This allows the usage of the standard VGA palette (picture, source) rather than creating custom colors for real crossfading and thus saves space.<br />
<br />
==Tiny megademo framework==<br />
The tiny megademo framework is based on ideas of [https://www.pouet.net/prod.php?which=68020 "Quatro"], done by "Sensenstahl" and me. To put many tiny effects together into a "megademo", they all have to use the same approach and they have to be mostly free of assumptions (contents of memory and registers). They also have to use the same timing values and be tuned to a shared timing in general. Preparing the single effects to be includable in a framework like this took quite some time and required a lot of extra space initially. It has to be noted that - sadly - some of my most impressive effects (judged by audience reactions and views on social media) could not be included at all, because the approaches were not combinable without a HUGE overhead. Once all effects were streamlined, I could think about "factoring out" common calculations which occur repeatedly, resulting in a few saved bytes. The framework does the following:<br />
* Setting up screen mode to 320 x 200 pixels in 256 colors<br />
* Initializing a pointer to the screen<br />
* Installing a callback for timing and music<br />
* Mainloop<br />
** Calculation of X and Y from the current screen location<br />
*** with the [http://www.sizecoding.org/wiki/General_Coding_Tricks#Obtaining_X_and_Y_without_DIV_.28The_Rrrola_Trick.29 "Rrrola Trick"]<br />
** P = Pseudorandom value from screen location<br />
** Offset current time T by a scaled value P'<br />
** Selection of effect number N according to T'<br />
** Execution of effect N (X,Y) for current pixel<br />
*** On entrance, XY is in register DX (DL,DH)<br />
*** On entrance, the current time is in register BP<br />
*** On exit, the pixel color is expected in AL<br />
** Advance to next pixel<br />
** Triple diagonal interlacing for smoothing<br />
** Repeat until frame is complete<br />
* Set timer to a tempo of ~35 FPS<br />
* Check keyboard for ESC<br />
** Quit on ESC, otherwise continue<br />
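The timer step of this list can be cross-checked with a little arithmetic. The following Python sketch is not part of the release. It assumes that PIT channel 0 is in the usual lobyte/hibyte access mode, so that the single byte written once per frame ends up in both halves of the 16 bit divisor after two frames (one possible reading of the "dual pass" remark in the code). All names are made up for illustration.<br />
<syntaxhighlight lang="python">
PIT_HZ = 1193182        # input clock of the PC timer chip
TARGET_FPS = 35

# same integer arithmetic as "%define tempo 1193182/256/targetFPS"
tempo = PIT_HZ // 256 // TARGET_FPS

# assumption: two successive writes of the same byte form the divisor
divisor = tempo * 256 + tempo
rate = PIT_HZ / divisor  # resulting timer interrupt rate in Hz

print(tempo, divisor, round(rate, 2))
</syntaxhighlight><br />
Under that assumption the timer routine, and with it the time value in BP, ticks at a little under 35 Hz.<br />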
===Code of framework===<br />
<syntaxhighlight lang="nasm"><br />
org 100h<br />
s:<br />
mov al,0x13 ; set AL to mode 320*200 in 256 colors<br />
int 0x10 ; call BIOS to set mode<br />
xchg bp,ax ; set timing value to 0x13 <br />
push 0xa000-10 			; write the screen address to register ES<br />
pop es ; works in conjunction with Rrrola trick<br />
mov ax,0x251c ; parameter for changing timer interrupt<br />
mov dl,timer 			; address of timer routine, assume DH=1<br />
int 0x21 ; install timer routine<br />
top:<br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
call bx ; call the effect<br />
stosb ; write the return value and advance<br />
inc di ; triple interlace trick for after<br />
inc di ; effect and smoothing the animation<br />
jnz top ; repeat until the frame is complete<br />
mov al,tempo ; set AL to divider for timer<br />
out 40h,al ; set timing (dual pass)<br />
in al,0x60 ; read keyboard<br />
dec al ; quit on ESC<br />
jnz top ; otherwise repeat loop<br />
sounds: db 0xc3, 11, 0x93; 0xc3 is MIDI/RET; fx2-s is used as volume<br />
table: db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
<br />
</syntaxhighlight><br />
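To see what the Rrrola constant does with the screen pointer, the multiplication can be modelled outside of DOS. This is a hedged Python sketch, not code from the release; the function name is invented. It only mirrors "mov ax,0xcccd" and "mul di" and returns the two halves of DX.<br />
<syntaxhighlight lang="python">
def rrrola(di):
    """Model of "mov ax,0xcccd / mul di": returns (DL, DH)."""
    product = (di * 0xCCCD) & 0xFFFFFFFF  # 32 bit result in DX:AX
    dx = product >> 16
    return dx & 0xFF, dx >> 8


for di in (0, 160, 320, 480):
    print(di, rrrola(di))
</syntaxhighlight><br />
DH is exactly DI divided by 320, so it serves as the row number, while DL runs from 0 to 255 once per 320 bytes. Because ES points 160 bytes (10 paragraphs) below the start of video memory, DL = 0 falls on screen column 160, which is why the effects can treat DL as a signed, centered X coordinate.<br />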
<br />
==Explanation of effects==<br />
===Array of chessboards===<br />
[[File:Array of chessboards.png|thumb|array of chessboards]]<br />
The simplest effect and a good one to start with. After offsetting the row by the time, the classic XOR pattern is applied. To achieve the impression of a grid of chessboards, all bits but two are set in the color. The real trick lies in shifting to a "good" palette spot. The portion of the frame which is perceived as black isn't actually black; it's the dark part of the [https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_palette standard VGA palette]. Shifting that way also makes the dark and light squares look like they are from an ancient chessboard.<br />
<syntaxhighlight lang="nasm"> <br />
xchg dx,ax ; get XY into AX<br />
sub ax,bp ; subtract time from row<br />
xor al,ah ; XOR pattern (x xor y)<br />
or al,0xDB ; pattern for array of boards<br />
add al,13h ; shift to good palette spot<br />
</syntaxhighlight><br />
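As a cross-check of the "all bits but two" remark, the five instructions can be modelled in Python. This is only a sketch with an invented function name; it takes DL, DH and BP as plain integers and returns the value left in AL.<br />
<syntaxhighlight lang="python">
def chessboards(dl, dh, bp):
    """Model of the routine above; returns the palette index in AL."""
    ax = (((dh << 8) | dl) - bp) & 0xFFFF  # xchg dx,ax / sub ax,bp
    al = (ax & 0xFF) ^ (ax >> 8)           # xor al,ah
    al |= 0xDB                             # only bits 2 and 5 stay free
    return (al + 0x13) & 0xFF              # add al,13h


colors = {chessboards(x, y, 0) for x in range(256) for y in range(200)}
print(sorted(colors))
</syntaxhighlight><br />
Only four palette indices can ever come out of this routine: two from the very end of the palette and, after the 8 bit wraparound, two from its beginning.<br />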
<br />
===Zooming circles===<br />
[[File:Zooming circles.png|thumb|zooming circles]]<br />
The distance D of a point (X,Y) to the center (0,0) is sqrt(X²+Y²). The framework already makes sure that DL contains a centered X coordinate; DH=Y has to be centered though. Performing a square root on x86 requires quite a bit of code, but this can actually be omitted. With a clever selection of colors, the impression of zooming circles without square root is quite decent.<br />
<syntaxhighlight lang="nasm"> <br />
mov al,dh ; get Y in AL<br />
sub al,100 ; align Y vertically<br />
imul al ; AL = Y²<br />
xchg dx,ax ; Y²/256 in DH, X in AL<br />
imul al ; AL = X²<br />
add dh,ah ; DH = (X² + Y²)/256<br />
mov al,dh ; AL = (X² + Y²)/256<br />
add ax,bp ; offset color by time<br />
and al,8+16 ; select special rings<br />
</syntaxhighlight><br />
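The routine can be followed step by step in a small Python model. This is a sketch under the assumption that DL and DH hold the values delivered by the framework; the helper names are made up.<br />
<syntaxhighlight lang="python">
def s8(value):
    """Interpret the low 8 bits as a signed byte."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def circles(dl, dh, bp):
    """Model of the routine above; returns the value left in AL."""
    y = s8(dh - 100)                           # sub al,100
    x = s8(dl)                                 # X is already centered
    d = ((y * y >> 8) + (x * x >> 8)) & 0xFF   # (X² + Y²)/256, no sqrt
    return (d + bp) & 0x18                     # add ax,bp / and al,8+16


print([circles(x, 100, 0) for x in range(0, 128, 16)])
</syntaxhighlight><br />
Since the squared distance is used directly, the rings get thinner towards the outside instead of being evenly spaced.<br />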
<br />
===Scrolling tilted plane===<br />
[[File:Tiltplane.png|thumb|scrolling tilted plane]]<br />
This effect is achieved as follows. First, a large constant is divided by the row number Y, to simulate distance. Then the resulting value is used twice: a) scaled by a centered X value and b) offset by the current time. These results are then combined with the XOR pattern, of which a special pattern is selected.<br />
<syntaxhighlight lang="nasm"> <br />
mov ax,0x1329 ; initialize with constant<br />
add dh,al ; preventing divide overflow<br />
div dh ; reverse divide AL = C/Y'<br />
xchg dx,ax ; DL = C/Y', AL = X<br />
imul dl ; AH = CX/Y'<br />
sub dx,bp ; DL = C/Y'-T <br />
xor ah,dl ; AH = (CX/Y') ^ (C/Y'-T)<br />
mov al,ah ; move to AL<br />
and al,4+8+16 ; select special pattern<br />
</syntaxhighlight><br />
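The following Python model retraces the routine with plain integers. It is a sketch, not the release code, and the names are invented. It also shows why the constant is added to the row first: the 8 bit quotient of "div dh" only fits if the divisor is at least 20.<br />
<syntaxhighlight lang="python">
def s8(value):
    """Interpret the low 8 bits as a signed byte."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def tilted_plane(dl, dh, bp):
    """Model of the routine above; returns the value left in AL."""
    d = (dh + 0x29) & 0xFF                  # add dh,al (AL = 0x29)
    q = 0x1329 // d                         # div dh, quotient = C/Y'
    product = (s8(dl) * s8(q)) & 0xFFFF     # imul dl, AH = CX/Y'
    ah = product >> 8
    return (ah ^ ((q - bp) & 0xFF)) & 0x1C  # sub / xor / and al,4+8+16


print(0x1329 // 0x29, tilted_plane(0, 0, 0))
</syntaxhighlight><br />
For the first row the quotient is 119, so it stays below 256 for every row on the screen and the division cannot overflow.<br />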
<br />
===Parallax checkerboards===<br />
[[File:Paralax.png|thumb|parallax checker boards]]<br />
This is a kind of raycasting with dynamic geometry. Each object plane is split horizontally because of the signed 16 bit multiplication and vertically because of an implicit logic column number operation. Additionally, implicit shearing is applied, so that the edges of the resulting grid (4 "solid" patches, 4 "transparent" patches, alternating) connect. If the ray hits one of the solid patches, the color becomes the iteration number (plus a palette offset into the grayscale range); if not, the plane is shifted by the screen pointer and the process is repeated until the maximum iteration number is reached.<br />
<syntaxhighlight lang="nasm"><br />
mov cx,bp ; set initial point to time<br />
mov bx,-16 ; limit to 16 iterations<br />
fx3L:<br />
add cx,di ; offset point by screenpointer<br />
mov ax,819 ; magic, related to Rrrola constant<br />
imul cx ; get X',Y' in DX<br />
ror dx,1 ; set carry flag on "hit"<br />
inc bx ; increment iteration count<br />
ja fx3L ; loop until "hit" or "iter=max"<br />
lea ax,[bx+31] ; map value to standard gray scale<br />
</syntaxhighlight><br />
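The loop condition is the subtle part here: "ja" continues only while the carry flag (set by "ror" on a hit) and the zero flag (set by "inc" when the counter reaches zero) are both clear. A Python sketch of the loop, with invented names and DI and BP passed in as plain integers, could look like this.<br />
<syntaxhighlight lang="python">
def s16(value):
    """Interpret the low 16 bits as a signed word."""
    value &= 0xFFFF
    return value - 65536 if value >= 32768 else value


def parallax(di, bp):
    """Model of the routine above; returns the value left in AL."""
    cx = bp
    bx = -16
    while True:
        cx = (cx + di) & 0xFFFF                # add cx,di
        dx = ((s16(cx) * 819) >> 16) & 0xFFFF  # imul cx, upper word
        hit = dx & 1                           # ror dx,1 moves bit 0 to CF
        bx += 1                                # inc bx leaves CF alone
        if hit or bx == 0:                     # "ja" falls through
            break
    return (bx + 31) & 0xFF                    # lea ax,[bx+31]


print(parallax(0, 0), parallax(80, 0))
</syntaxhighlight><br />
The result always lies between 16 and 31, which is the grayscale ramp of the standard VGA palette. A hit in the first iteration gives the darkest value and running out of iterations gives the brightest one.<br />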
<br />
===Sierpinski rotozoomer===<br />
[[File:Sierpinski rotozoomer.png|thumb|sierpinski rotozoomer]]<br />
[[File:1 div cos(atan(x)).png|thumb|1/cos(atan(x)), made with www.google.com]]<br />
Rotozooming normally requires trigonometric functions, or good estimations of them. Let's look at the [https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions usual 2D rotation equation] and interpret it in a special way to get rid of the trigonometry in the calculation:<br />
<syntaxhighlight lang="nasm"><br />
x' = x*cos(a) - y*sin(a)<br />
y' = x*sin(a) + y*cos(a)<br />
</syntaxhighlight><br />
If we add zooming to it, it looks like this:<br />
<syntaxhighlight lang="nasm"><br />
x' = z * (x*cos(a) - y*sin(a))<br />
y' = z * (x*sin(a) + y*cos(a))<br />
</syntaxhighlight><br />
Now let's assume we don't specify z ourselves, and factor out cos(a):<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(a) * (x - y*tan(a))<br />
y' = cos(a) * (x*tan(a) + y)<br />
</syntaxhighlight><br />
Finally, let's substitute tan(a) with our time variable, since the tangent function rises to infinity the closer we get to +-90°:<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(atan(T)) * (x - y*T)<br />
y' = cos(atan(T)) * (x*T + y)<br />
</syntaxhighlight><br />
As long as we don't care about specifying the zoom factor and don't directly control the angle, we can now rotate between -90° and +90° without using trigonometric functions. As a byproduct, the zoom factor is now tied to the time T. The function for the zoom factor is shown in the image; it results in zooming from infinitely small to one (original) and back to infinitely small. Quite a few bytes have been spent to beautify the effect, correct the timing offset, speed up the animation, enlarge the sierpinski pixels and get decent colors, but I thought it was worth the space.<br />
<syntaxhighlight lang="nasm"><br />
lea cx,[bp-2048]; center time to pass zero<br />
sal cx,3 ; speed up by factor 8!<br />
movzx ax,dh ; get X into AL<br />
movsx dx,dl ; get Y into DL<br />
mov bx,ax ; save X in BX<br />
imul bx,cx ; BX = X*T<br />
add bh,dl ; BH = X*T/256+Y<br />
imul dx,cx ; DX = Y*T<br />
sub al,dh ; AL = X-Y*T/256<br />
and al,bh ; AL = (X-Y*T/256)&(X*T/256+Y)<br />
and al,252 ; thicker sierpinski<br />
salc ; set pixel value to black<br />
jnz fx4q ; leave black if not sierpinski<br />
mov al,0x2A ; otherwise: a nice orange<br />
fx4q:<br />
</syntaxhighlight><br />
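The derivation above can be checked numerically. The Python sketch below is not part of the intro and uses floating point instead of the 8.8 fixed point arithmetic of the routine (where T is divided by 256); the function names are invented. It compares the trigonometry-free shortcut with a textbook rotation by atan(T) that is zoomed by 1/cos(atan(T)).<br />
<syntaxhighlight lang="python">
import math


def shortcut(x, y, t):
    """What the routine computes: no sine, no cosine."""
    return x - y * t, x * t + y


def rotate_and_zoom(x, y, t):
    """Rotation by a = atan(t), zoomed by 1/cos(a) = sqrt(1 + t*t)."""
    a = math.atan(t)
    z = 1.0 / math.cos(a)
    return (z * (x * math.cos(a) - y * math.sin(a)),
            z * (x * math.sin(a) + y * math.cos(a)))


for t in (-4.0, -0.5, 0.0, 0.5, 4.0):
    print(t, shortcut(3.0, 2.0, t), rotate_and_zoom(3.0, 2.0, t))
</syntaxhighlight><br />
Both functions agree up to rounding, and at T=0 the texture is neither rotated nor scaled.<br />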
<br />
===Raycast bent tunnel===<br />
[[File:Raycastbenttunnel.png|thumb|raycast bent tunnel]]<br />
This is a variation of the [https://www.pouet.net/prod.php?which=78044 raycast tunnel "into a new era"] (see also above). A [https://www.reddit.com/r/tinycode/comments/de7dj9/moving_cubes_with_pathtracing_lighting_and/ detailed explanation] has been posted to reddit for the [https://www.pouet.net/prod.php?which=83204 similar production "Essence"]. The custom colors have been removed, the bend direction is different, and the geometry is a bit more closed, to increase the performance on slower computers and DosBox.<br />
<syntaxhighlight lang="nasm"><br />
mov cl,-9 ; start with depth 9 (moves backwards)<br />
fx5L: <br />
push dx ; save DX, destroyed inside the loop<br />
mov al,dh ; Get Y into AL<br />
sub al,100 ; Centering Y has to be done "manually".<br />
imul cl ; Multiply AL=Y by the current distance, to get a projection(1)<br />
xchg ax,dx ; Get X into AL, while saving the result in DX (DH)<br />
add al,cl ; add distance to projection, (bend to the right)<br />
imul cl ; Multiply AL=X by the current distance, to get a projection(2)<br />
mov al,dh ; Get projection(1) in AL<br />
xor al,ah ; combine with projection(2)<br />
add al,4 ; center the walls around 0<br />
test al,-8 ; check if the wall is hit<br />
pop dx ; restore DX<br />
loopz fx5L ; repeat until "hit" or "iter=max"<br />
sub cx,bp ; offset depth by time<br />
xor al,cl ; XOR pattern for texture <br />
aam 6 ; irregular pattern with MOD 6<br />
add al,20 ; offset into grayscale palette<br />
</syntaxhighlight><br />
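For readers who prefer to step through the ray march outside of DOS, here is a Python model of the loop. It is a sketch with invented names and two loud assumptions: the loop is taken to end on a wall hit (the routine only initialises CL, so the 16 bit counter of "loopz" is assumed not to reach zero first), and a guard of 256 steps is added purely to keep the sketch finite.<br />
<syntaxhighlight lang="python">
def s8(value):
    """Interpret the low 8 bits as a signed byte."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def tunnel(dl, dh, bp):
    """Model of the routine above; returns the value left in AL."""
    cl = -9
    al = 0
    for _ in range(256):                    # safety guard, sketch only
        p1 = (s8(dh - 100) * cl) & 0xFFFF   # projection(1) of centered Y
        p2 = (s8(dl + cl) * cl) & 0xFFFF    # projection(2) of X, bent
        al = (((p1 >> 8) ^ (p2 >> 8)) + 4) & 0xFF
        cl = s8(cl - 1)                     # "loopz" decrements first
        if al & 0xF8:                       # outside 0..7: wall is hit
            break
    al ^= (cl - bp) & 0xFF                  # depth minus time as texture
    return al % 6 + 20                      # aam 6, then palette offset


print(tunnel(0, 100, 0))
</syntaxhighlight><br />
Whatever the ray does, "aam 6" followed by "add al,20" confines the result to the six palette entries 20 to 25.<br />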
<br />
===Ocean night to day===<br />
[[File:Oceannight.png|thumb|left|oceannight]]<br />
[[File:Oceanday.png|thumb|right|oceanday]]<br />
The ocean effect is a wonderful example of a [https://en.wikipedia.org/wiki/Bob_Ross#Style "happy accident"]. If you load a value into the FPU as integer, store it as a floating point value, and interpret it as integer again, you get a cool pattern. Combine that with a reverse divide and you get a neat wavy effect. This works in tight conjunction with the register DX, which has a sign flip at the position we want, so we can easily separate sky and sea. The final touch is the coloring though. By design of the global framework, the value of AL is determined on function entrance: it contains the effect address. With a bit of code shuffling, the color of the sky came "for free", without using instructions, as well as the "fading to day" color, which is also an effect address. And that is no happy accident. In versions other than the original submitted DosBox version, the sky color can vary for that reason.<br />
<syntaxhighlight lang="nasm"><br />
sub dh,120 ; check if pixel is in the sky<br />
js fx6q ; quit if that's the case<br />
mov [bx+si],dx ; move XY to a memory location<br />
fild word [bx+si] ; read memory location as integer<br />
fidivr dword [bx+si]; reverse divide by constant<br />
fstp dword [bx+si-1]; store result as floating point<br />
mov ax,[bx+si] ; get the result into AX<br />
add ax,bp ; modify color by time<br />
and al,128 ; threshold into two bands<br />
dec ax ; beautify colors to blue/black<br />
</syntaxhighlight><br />
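The "store as float, read back as integer" part can be tried in isolation. The Python sketch below is a simplification with invented names: it does not model the reverse divide, because its dividend depends on whatever the neighbouring memory bytes contain, and instead takes the quotient as a parameter. It shows which two bytes of the 32 bit float "mov ax,[bx+si]" picks up after "fstp" has stored one byte lower, and how the final two instructions reduce them to two colors.<br />
<syntaxhighlight lang="python">
import struct


def middle_word_of_float(value):
    """Bytes 1 and 2 of the 32 bit float, as "mov ax,[bx+si]" reads them."""
    raw = struct.pack("<f", value)  # what "fstp dword" writes
    return raw[1] | (raw[2] << 8)


def ocean_color(word, bp):
    """Model of "add ax,bp / and al,128 / dec ax"; returns AL."""
    ax = (word + bp) & 0xFFFF
    al = ax & 0x80
    return (al - 1) & 0xFF


for quotient in (1.0, 1.5, 2.0, 3.0):  # hypothetical quotients
    word = middle_word_of_float(quotient)
    print(quotient, hex(word), ocean_color(word, 0))
</syntaxhighlight><br />
Because the word consists of mantissa bits plus the lowest exponent bit, it changes quickly and irregularly with the quotient, and the sea pixels end up as either palette index 127 or 255.<br />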
<br />
===Fading effect===<br />
For convenience, here is the fading effect isolated from the global framework. Basically, it generates a pseudorandom value from the screen pointer, offsets the time by a scaled value of it, and then calls the selected effect.<br />
<syntaxhighlight lang="nasm"><br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
</syntaxhighlight><br />
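The selection logic can be modelled in a few lines of Python. This is a sketch with invented names; the list of labels is copied from the table in the framework code.<br />
<syntaxhighlight lang="python">
TABLE = ["fx2", "fx1", "fx0", "fx3", "fx4", "fx5", "fx6", "sounds", "stop"]


def effect_index(di, bp):
    """Model of the snippet above; returns the table index in BX."""
    ax = (di * 0xCCCD) & 0xFFFF             # low word of "mul di"
    al = ((ax & 0xFF) + (ax >> 8)) & 0xFF   # add al,ah / xor ah,ah
    ax = (al + bp) & 0xFFFF                 # add ax,bp
    return (ax >> 9) & 15                   # shr ax,9 / and al,15


for bp in (0x13, 400, 512):
    used = {TABLE[effect_index(di, bp)] for di in range(65536)}
    print(bp, sorted(used))
</syntaxhighlight><br />
At the start and at multiples of 512 every pixel selects the same effect. In between, for the 255 ticks before each multiple of 512, the per-pixel offset of 0 to 255 pushes a growing share of pixels over the boundary to the next table entry, which produces the dissolve.<br />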
<br />
==MIDI music part==<br />
This code section increments the timing value and produces sound. By choosing channel 3, we can reuse the "change channel instrument" instruction as "RET". By shifting around the code of the effects, a reasonable value for the volume could be created, which saves another byte. Be aware that this code sequence only works if the MIDI device is already in UART mode, otherwise three more bytes have to be spent. Quite a few people from the audience and sceners told me afterwards that the melody sounds a bit like [https://en.wikipedia.org/wiki/Incantations_(album) Incantations from Mike Oldfield], but it is rather created by a simple approach. Starting with [https://www.pouet.net/prod.php?which=65604 "Hypnoteye" in 2015], I experimented with procedural MIDI and eventually released a [https://www.pouet.net/prod.php?which=66313 small MIDI framework (64 bytes)]. The basic principle is to jump through tonal space with fixed steps, and map high values back to lower ones (by modulo). With simple combinations of stepwidth and modvalue, interesting effects can be produced. For example stepwidth=3 for diminished chords which are never resolved, likewise stepwidth=4, or stepwidth=6 for tritonal effects. With a well chosen modvalue, these patterns can create progressions. I did not do an appropriate theoretical analysis yet, but rather explored the tonal space and noted when something sounded interesting.<br />
<syntaxhighlight lang="nasm"><br />
sounds: db 0xc3, 11, 0x93, fx2-s<br />
...<br />
inc bp ; increment timing value<br />
test bp, 7 ; play a note every 8th step<br />
jnz nomuse ; quit if in between<br />
mov dx,0x330 ; port number for MIDI<br />
mov si,sounds ; address for sound data<br />
outsb ; change instrument of channel 3<br />
outsb ; to vibraphone<br />
outsb ; play a note on channel 3<br />
imul ax,bp,-19*32*4 ; the magic melody constant<br />
shr ax,10 ; scale down and implicit "and 63"<br />
add al,22 ; pitch base is 22<br />
out dx,al ; play THIS note on channel 3<br />
outsb ; play it with THIS volume<br />
</syntaxhighlight><br />
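The note calculation can be replayed in a few lines of Python. This is a sketch for checking the arithmetic only. The function names are made up, and the velocity is a free parameter here because the real value depends on the code layout (<code>fx2-s</code>).<br />

```python
SCALE_MOD = -19 * 32 * 4   # the "magic melody constant" from the listing
PITCH_BASE = 22


def melody_note(bp):
    """Model of `imul ax,bp,SCALE_MOD` / `shr ax,10` / `add al,22`."""
    ax = (bp * SCALE_MOD) & 0xFFFF      # imul keeps the low 16 bits
    return (ax >> 10) + PITCH_BASE      # 6 bits remain, i.e. an implicit "and 63"


def midi_bytes(bp, velocity):
    """Bytes sent to port 0x330 for one note; `velocity` is a free parameter here."""
    return bytes([0xC3, 11, 0x93, melody_note(bp), velocity])


if __name__ == "__main__":
    # A note is played whenever the lowest three bits of BP are zero.
    print([melody_note(bp) for bp in range(0, 8 * 16, 8)])
```

Since a note is only played on every 8th tick and -19*32*4*8 equals -19*1024, the k-th note in this model is simply 22 + ((-19*k) mod 64): a fixed step of 19 semitones downwards, wrapped into a range of 64 semitones.<br />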
==Further tweaks==<br />
To be done ;)<br />
==Differences between dos versions==<br />
To be done ;)<br />
==External Links==<br />
*[https://en.wikipedia.org/wiki/Demoscene Demoscene]<br />
*[https://www.youtube.com/watch?v=Imquk_3oFf4 Youtube Capture]<br />
*[https://www.twitch.tv/videos/589179638?t=7h02m40s Twitch Live Reactions]<br />
*[https://www.pouet.net/prod.php?which=85227 Production Page on Pouet]<br />
*[https://www.reddit.com/r/programming/comments/g1qhf8/memories_256_byte_msdos_intro/ Reddit/Programming Thread I]<br />
*[https://www.reddit.com/r/programming/comments/g4ewh7/how_its_made_memories_a_256_byte_dos_intro_with_8/ Reddit/Programming Thread II]<br />
*[https://github.com/cesarmiquel/memories-256b-msdos-intro Analysis by "cesarmiquel"]<br />
*[https://hackaday.com/2020/04/21/a-jaw-dropping-demo-in-only-256-bytes/ Hackaday Article]<br />
<br />
==Original release code (complete)==<br />
<syntaxhighlight lang="nasm"> <br />
; "memories" by HellMood/DESiRE<br />
; the tiny megademo, 256 byte msdos intro<br />
; shown in April 2020 @ REVISION<br />
;<br />
; (= WILL BE COMMENTED IN DETAIL LATER =)<br />
;<br />
; create : nasm.exe memories.asm -fbin -o memories.com<br />
; CHOOSE YOUR TARGET PLATFORM (compo version is dosbox)<br />
; be sure to use the dosbox.conf from this archive!<br />
; only ONE of the defines should be active!<br />
%define dosbox ; size : 256 bytes<br />
;%define freedos ; size : 230 bytes<br />
;%define winxpdos ; size : 263 bytes<br />
<br />
; DON'T TOUCH THESE UNLESS YOU KNOW WHAT YOU'RE DOING<br />
%ifdef winxpdos<br />
%define music<br />
%define switch_uart<br />
%define safe_dx<br />
%define safe_segment<br />
%endif<br />
%ifdef freedos<br />
%define safe_dx<br />
%endif<br />
%ifdef dosbox<br />
%define music<br />
;%define safe_dx ; sometimes needed<br />
%endif<br />
<br />
; GLOBAL PARAMETERS, TUNE WITH CARE!<br />
%define volume 127 ; not used on dosbox (optimization)<br />
%define instrument 11<br />
%define scale_mod -19*32*4; <br />
%define time_mask 7<br />
%define targetFPS 35<br />
%define tempo 1193182/256/targetFPS <br />
%define sierp_color 0x2A<br />
%define tunnel_base_color 20<br />
%define tunnel_pattern 6<br />
%define tilt_plate_pattern 4+8+16<br />
%define circles_pattern 8+16<br />
<br />
org 100h<br />
s:<br />
%ifdef freedos<br />
mov fs,ax<br />
mov [fs:0x46c],ax<br />
%endif<br />
mov al,0x13<br />
int 0x10 <br />
xchg bp,ax<br />
push 0xa000-10<br />
pop es<br />
%ifndef freedos<br />
mov ax,0x251c<br />
%ifdef safe_dx <br />
mov dx,timer <br />
%else ; assume DH=1, mostly true on DosBox<br />
mov dl,timer<br />
%endif<br />
int 0x21<br />
%endif<br />
top:<br />
%ifdef freedos<br />
mov bp,[fs:0x46c]<br />
%endif <br />
mov ax,0xcccd<br />
mul di<br />
add al,ah<br />
xor ah,ah<br />
add ax,bp<br />
shr ax,9<br />
and al,15<br />
xchg bx,ax<br />
mov bh,1<br />
mov bl,[byte bx+table]<br />
call bx<br />
stosb<br />
inc di<br />
inc di<br />
jnz top<br />
mov al,tempo<br />
out 40h,al<br />
in al,0x60<br />
dec al<br />
jnz top<br />
sounds:<br />
db 0xc3 ; is MIDI/RET<br />
%ifdef music<br />
db instrument,0x93<br />
%ifdef switch_uart<br />
db volume ; without switch, volume is in table<br />
db 0x3f <br />
%endif<br />
%endif<br />
table: ; first index is volume, change order with care! <br />
db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
stop:<br />
pop ax<br />
ret<br />
timer:<br />
%ifndef freedos<br />
%ifdef safe_segment<br />
push cs<br />
pop ds<br />
%endif<br />
inc bp<br />
%ifdef music <br />
test bp, time_mask<br />
jnz nomuse<br />
mov dx,0x330<br />
mov si,sounds<br />
outsb<br />
outsb<br />
outsb<br />
imul ax,bp,scale_mod<br />
shr ax,10<br />
add al,22<br />
out dx,al<br />
outsb<br />
%ifdef switch_uart<br />
inc dx<br />
outsb<br />
%endif<br />
%endif<br />
nomuse:<br />
iret<br />
%endif <br />
fx0: ; tilted plane, scrolling<br />
mov ax,0x1329<br />
add dh,al<br />
div dh<br />
xchg dx,ax<br />
imul dl<br />
sub dx,bp<br />
xor ah,dl<br />
mov al,ah<br />
and al,tilt_plate_pattern<br />
ret<br />
fx2: ; board of chessboards<br />
xchg dx,ax<br />
sub ax,bp<br />
xor al,ah<br />
or al,0xDB<br />
add al,13h<br />
ret<br />
fx1: ; circles, zooming<br />
mov al,dh<br />
sub al,100<br />
imul al<br />
xchg dx,ax<br />
imul al<br />
add dh,ah<br />
mov al,dh<br />
add ax,bp<br />
and al,circles_pattern<br />
ret<br />
fx3: ; parallax checkerboards<br />
mov cx,bp<br />
mov bx,-16<br />
fx3L:<br />
add cx,di<br />
mov ax,819<br />
imul cx <br />
ror dx,1 <br />
inc bx <br />
ja fx3L<br />
lea ax,[bx+31] <br />
ret<br />
fx4: ; sierpinski rotozoomer <br />
lea cx,[bp-2048]<br />
sal cx,3<br />
movzx ax,dh<br />
movsx dx,dl<br />
mov bx,ax<br />
imul bx,cx<br />
add bh,dl<br />
imul dx,cx<br />
sub al,dh<br />
and al,bh<br />
and al,0b11111100<br />
salc ; VERY slow on dosbox, but ok<br />
jnz fx4q<br />
mov al,sierp_color<br />
fx4q:<br />
ret<br />
fx5: ; raycast bent tunnel<br />
mov cl,-9<br />
fx5L: <br />
push dx<br />
mov al,dh<br />
sub al,100<br />
imul cl<br />
xchg ax,dx <br />
add al,cl<br />
imul cl<br />
mov al,dh<br />
xor al,ah<br />
add al,4<br />
test al,-8<br />
pop dx<br />
loopz fx5L<br />
sub cx,bp<br />
xor al,cl<br />
aam tunnel_pattern; VERY slow on dosbox, but ok<br />
add al,tunnel_base_color<br />
ret<br />
fx6: ; ocean night / to day sky<br />
sub dh,120<br />
js fx6q<br />
mov [bx+si],dx<br />
fild word [bx+si]<br />
fidivr dword [bx+si]<br />
fstp dword [bx+si-1]<br />
mov ax,[bx+si]<br />
add ax,bp<br />
and al,128<br />
dec ax<br />
fx6q:<br />
ret<br />
</syntaxhighlight><br />
<br />
==Bonus - NFO/ASCII==<br />
<br />
<syntaxhighlight lang="nasm"> <br />
art : hammerfist<br />
∂#MW%e _d$Ng,<br />
'B, ∂b _jM@$QZb,cQ" )@<br />
,edRB$b,l@ Wk,yGR$KM&$b, ,dP" Wl ]bsd%UR8BG6&$@DSyG#ZKM&$b,<br />
,dP "T%L 'MGF "*∂R_ Tg "*4Zk,#I YP W" 7P "*∂R<br />
4M gd@ ^ ∂@ d@b dQ$#@Z@R3L_ "*GMj 'W ,gd$ d@b 9Q$#%b<br />
W#, `M Wb `* _4P `Qk *#N8L `H5 @b 'QR7YK `* _4F" Qk<br />
`6@L dML '@ ,BK 'M ∂B *b, '#L<br />
^QBb,_ _,4&M∞∂@=,_ _dGL _gQKM GL @k 'Mg,_ _dG,<br />
"*BN5W$2#MNP" "*G3WRM8&B5P"`Y@QNW3Z5P" ∂#$W8BRM3XZN87 "*GW38M%EBDW5P"`<br />
<br />
<br />
p r e s e n t s<br />
<br />
4<br />
d@,<br />
_& `Wl<br />
_,aP "#baedM$#@@K JP*"?ML<br />
,ad@$#P" ,d@NEWVB" X,aQPYb,_<br />
V@Mm,_ ,d@MW#BW' EMP" '¶R ,ngBP^fML<br />
¶M@N@y Y#BNW#M" J9" `MQ9" "MgRBq ,QBMg,<br />
VN#P` ,d@@ `WM@^ 7f ¶F` 7kY" ^G _.eQNE1.<br />
]B _G@MWN$, `P ' 4b QP ¶w@F*^ ^Qb<br />
]O@NRM#W@MNB, ; ^` j JP^ Yl<br />
J#NRNWM@#BcT"^ ,A _J _q@ `X<br />
'¶WM#B@WdY`,7 _G#YN#PM _,gG" M,<br />
*BN#WP" dK ,Q@NRMB"]9 , _,M@Q* #A<br />
"U^ V@h, iNBW#NT J' J9 s@QN" _; 'D,<br />
¶RMBv&NMQR@9 .W .K' "9` ,6BA _JL ]l<br />
Y#NE@W#NRP #[ `¶8 _d@MW#B_jW#W BN<br />
"GQ@MR#W QL_ *B _,p#NBW#NQMG@WY 3Q<br />
"Y@F ,XW@M%im,_Yb_ _,g5@#MW@QMNE@E@NRMB ,WM<br />
` _,gP*"#REM#GB@N#MQbnd@N#M@MW#R8QSB^'WQERM@ ;4NB,<br />
,GYKL ¶E#B8R8QSB@M@#BM#W@MNB"`_ , "^` N ,dW@Ql<br />
_Q`'W`*t '¶@GS#MBQ#E@W#NQBW[ 'LvQ_ ,K _dNABGM#N<br />
,F ' `^WAB@QGE9*"9^*@L jP7FY, ¶h,_.jWM#BR#GBM,<br />
J; , _ ' ' "LL YxE#B8R8QSBNW@W;<br />
AP _,Ag6^ _ J ¶A `"Q#M@MW#R8E#P<br />
j@ `"XQW[ 'LvK,_ 'L_,/ @t Y#NE@WNR"<br />
:M/ 9^*@L jP7F" _PYKL _,A; ¶RSNQ"<br />
dKL ' ` ' "L "`'W`*t `"XQb `W^<br />
Q`8t 'L_,/ , _ ' 9^Q<br />
,W _PYKL _,Ag6^ ' W, _ ,#N&<br />
!N _ J "`'W`*t `"XQW[ _ J N!_JG9^RwQ' *t<br />
`W, 'LvK,_ ' _gGB8@L _ 'LvK,_ ,WgB' V 7L<br />
_.,gm&@B&wBZF" j@'` "WL _gML jZd7Yb lN" dBWl<br />
,g&QB*"^` `"*G@g, .gR&k,_ ,N" '@QF ¶k;gMF *QvQ jQ, ,@N@B#,<br />
.eQF*` `Yb@" "*6Qg,gF , 7 XMN" 'MNB, ^¶QWSER@N;<br />
,gP" qy, W' ^Q' &L ,g@W' `QMEL `"WBNWP<br />
g7 ¶9 ,X M? 9" _q8MSK ¶EMt *@K<br />
Vh _,m#L _AH le ,GBDNE9^A, *@F NMg<br />
¶L,qQ@ND _.m@Bl We ,gM@B8#Q' ¶h_ lWE,<br />
W9NHW@` JWM#B@] @e 4WR@NGF^ 'QL dRWl<br />
VMd* "@BE@PM 'N *UP" VW, JRSB;<br />
,@F j `¶WK W, ¶t XNt _A@E#N<br />
_JP ,6& "GLdM XD, _.g8NMA@k,_ _,gG#NMGR;<br />
"Z .JRER 'VMi _jNB#W&_ _,j@E@W#Nl ¶MBGMNQGNQMG@QBW9<br />
¶h ,G@NRMBl `" ,d#R@M$F ¶Mg,_.gp&@@NEWVBWBMG *QMN8R8SBN$E@WF<br />
Vb dW#R8QSRb, *YM@EQ,_ 'MENBW#NQMG#B@R@MW#l "BM@QNENRQG'<br />
*WGS#MBMNEYL `^"*8M@Q@NRM#W@BWSNW@QBF"` `^*@QBF^ [HFT]<br />
^M@MW#Q9 ^Wt `^¶RQ@W8NQGP*`<br />
¶Q#@P Vk lA `"^`<br />
Y" `MA J#,<br />
*R@, ,MQl<br />
Y#Wk, GWM8L<br />
W8RQSt,_ AQ@MR#,<br />
`@M@#SB@Mbm.,_ QNBW#NW<br />
¶QB8R8SBN$WNRM@#GNtwg@NMQR@B'<br />
*MBQ#8R8QS@NE@WNBW#NQMG@NR;<br />
`WGS#MBQ#R8QSB@NE@W#NQBW9<br />
*OMW@QMNE@E@NRMW@QMB@*<br />
`^"YQW@Q#SB#NE@EGP<br />
`^"*8R@GBQF`<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=555Game of Life 32b2020-05-04T10:27:00Z<p>HellMood: /* Switching to Textmode : 38 bytes */</p>
<hr />
<div>You can [https://www.pouet.net/prod.php?which=85485 download and comment on] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It will help to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this Wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
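That the nine-cell formulation is equivalent to the classic rules can be verified by brute force. The following Python sketch (function names chosen for illustration only) compares both formulations on all 512 possible 3x3 neighbourhoods.<br />

```python
from itertools import product


def conway(center, neighbours):
    """Classic rule: `neighbours` counts the eight surrounding live cells."""
    if center:
        return 1 if neighbours in (2, 3) else 0
    return 1 if neighbours == 3 else 0


def nine_cell(center, total):
    """Reinterpreted rule: `total` counts all nine cells, center included."""
    if total == 3:
        return 1
    if total == 4:
        return center
    return 0


def rules_agree():
    """Compare both rules on all 512 possible 3x3 neighbourhoods."""
    for cells in product((0, 1), repeat=9):
        center, total = cells[4], sum(cells)
        if conway(center, total - center) != nine_cell(center, total):
            return False
    return True


if __name__ == "__main__":
    print("rules agree:", rules_agree())
```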
<br /><br />
Like in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to the target value of the next iteration. The summation works as usual: each pass of <code>AccLoop</code> adds up the 3 cells of one column, and the loop moves from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br />
<br /><br />
At the start of the loop, the first "trick" is already happening. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved there by <code>xchg cx,ax</code>. No matter how many of the summed cells have this one bit set, it does not hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value according to the rules defined above. <br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a marked cell temporarily holds the value 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
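The rotate trick is easier to follow in a small model. The Python sketch below uses made-up helper names and assumes the documented 80386+ behaviour of an 8-bit <code>rcr</code>: the count is masked to 5 bits and then effectively taken modulo 9, because the rotation goes through 9 bits (carry plus AL).<br />

```python
def rcr8(al, carry, count):
    """Model of `rcr al,cl`: 9-bit rotate right through the carry flag."""
    count = (count & 31) % 9            # count is masked to 5 bits, then taken mod 9
    value = (carry << 8) | al           # carry sits to the left of bit 7
    value = ((value >> count) | (value << (9 - count))) & 0x1FF
    return value & 0xFF, value >> 8     # new AL, new carry


def next_mark(center, cl):
    """`stc` / `rcr al,cl` / `and al,20h`: returns 0 or 32."""
    al, _ = rcr8(center, 1, cl)
    return al & 0x20


def mark_and_correct(center, cl):
    """Mark the cell with `or`, then correct it later with `shr ...,5`."""
    marked = center | next_mark(center, cl)   # 0, 1, 32 or 33
    return marked, marked >> 5
```

A rotation by 3 moves the carry (always 1) to bit 5, a rotation by 4 moves the old cell state there, and every other count moves a zero bit there. Marked neighbours only add multiples of 32 to the count, which the masking removes.<br />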
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated later. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps spawning actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so it is removed in this version. That is directly punished with an additional byte: since <code>DI</code> is no longer involved in the process, an optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 chars, colors). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up) and no content has to be generated initially, because there is always at least something on the screen that works as a seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
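The changed alignment can be double-checked with a little address arithmetic. The following Python sketch (helper names are made up; it assumes the standard 80x25 layout with 160 bytes per row) lists the addresses that the text mode <code>AccLoop</code> reads and compares them with the character bytes of the 3x3 neighbourhood.<br />

```python
ROW_BYTES = 80 * 2     # 80 columns, character byte + attribute byte each


def cell_offset(row, col):
    """Offset of the character byte of a text cell inside segment 0xB800."""
    return row * ROW_BYTES + col * 2


def summed_offsets(si):
    """Addresses read by AccLoop when SI points at the center cell."""
    offsets = []
    bx = 6                                   # mov bl,6
    while bx:
        offsets += [si + bx - 160 - 4, si + bx - 4, si + bx + 160 - 4]
        bx -= 2                              # two `dec bx` per iteration
    return offsets
```

With BX running through 6, 4 and 2, only even offsets are touched, so the attribute bytes never enter the sum in this version.<br />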
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code> available, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable by one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction is one of those shown above. If there is such an instruction, it can be placed at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order). In this case, <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte is only relevant for alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
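Which segment values are "close enough" can be worked out with real mode address arithmetic. The Python sketch below is an illustration under assumptions: the helper names are made up, the visible page is taken to start at linear address 0xB8000, and the four code bytes are a hand assembly of the listing below (<code>lds bx,[si]</code> = C5 1C, <code>stc</code> = F9, <code>lodsw</code> = AD).<br />

```python
TEXT_START = 0xB8000           # linear address of text mode memory
TEXT_BYTES = 80 * 25 * 2       # one visible page


def ds_after_lds(code):
    """`lds reg,[si]` loads bytes 0-1 into the register and bytes 2-3 into DS."""
    return code[2] | (code[3] << 8)          # little endian: 4th byte is the high byte


def covers_text_screen(segment):
    """True if the whole visible page is addressable within this segment."""
    base = segment << 4
    return base <= TEXT_START and TEXT_START + TEXT_BYTES - 1 <= base + 0xFFFF


# First four bytes of the 36 byte version: lds bx,[si] / stc / lodsw
CODE_36 = bytes([0xC5, 0x1C, 0xF9, 0xAD])

if __name__ == "__main__":
    ds = ds_after_lds(CODE_36)
    print(hex(ds), covers_text_screen(ds))
```

Under these assumptions every segment with a high byte from 0xA9 to 0xB7 covers the whole visible page, regardless of the low byte, which is why the 3rd code byte can be chosen freely.<br />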
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how to optimize the cleanup step? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well happen a certain number of steps later. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses four bytes, unless it is brought into one of the "pure" forms that only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now there are very short instructions available to advance the registers SI and DI, some of them both at the same time, and one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, so that the alignment of the screen in text mode is perfectly matched.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple command to move SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is also one of the commands that work as the high byte of a segment address, followed by an additional <code>dec si</code> that keeps SI and DI aligned, so that the cleanup step always stays at the same distance "behind" the current calculation. The assumption DI = SI - 258 is true on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original) to save one byte.<br />
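The displacement used in the listing below follows directly from the assumption DI = SI - 258. The Python sketch here (made-up helper names, startup values SI = 0100h and DI = FFFEh as quoted in the original 65 byte source) checks that the rewritten access <code>[di+bx+94]</code> reads the same address as <code>[si+bx-160-4]</code>.<br />

```python
SI_START, DI_START = 0x0100, 0xFFFE     # startup values assumed by the listing


def wrap(x):
    """16-bit address arithmetic."""
    return x & 0xFFFF


def after_steps(steps):
    """SI and DI after `steps` executions of cmpsw (both advance by 2)."""
    return wrap(SI_START + 2 * steps), wrap(DI_START + 2 * steps)
```

In this model the cell cleaned by <code>shr byte [di],5</code> always lies 258 bytes behind SI, which is one text row (160 bytes) plus 49 cells.<br />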
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the two <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here]: pulling the <code>xchg</code> into the loop and counting alternately, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
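The alternating counting can be traced in a model. The Python sketch below is an illustration with made-up helper names. It only tracks the low bytes AL and CL, and it assumes DI = SI - 258, so that <code>[di+bx+94]</code> equals <code>[si+bx-164]</code>.<br />

```python
def acc_loop(mem, si, al, cl):
    """Model of the 33 byte AccLoop; returns (AL, CL) after the loop."""
    bx = 7                                            # mov bl,7
    while bx:
        al, cl = cl, al                               # xchg cx,ax (low bytes only)
        al = (al + mem[(si + bx - 164) & 0xFFFF]      # [di+bx+94] with DI = SI - 258
                 + mem[(si + bx - 4) & 0xFFFF]
                 + mem[(si + bx + 156) & 0xFFFF]) & 0xFF
        bx -= 1
    return al, cl


def make_screen(live_cells, attribute=0x07):
    """64K of text memory: characters at even, attributes at odd offsets."""
    mem = bytearray(0x10000)
    mem[1::2] = bytes([attribute]) * 0x8000
    for offset in live_cells:
        mem[offset] = 1
    return mem
```

Odd values of BX hit the attribute bytes and even values hit the character bytes. Because the registers are swapped before every addition, the seven passes leave the cell count (plus the initial value of AL) in CL and the color sum in AL.<br />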
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax<br />
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32 bytes==<br />
[[File:Modbyte to 100h.png|frame|right|lds_modbyte_instructions]]<br />
Sometimes, an instruction has several degrees of "freedom". That means that the effect of that instruction can also be achieved by an alternative version of that instruction. In this case, the <code>lds</code> instruction, which puts two bytes of the code into the segment register DS, also loads two bytes into a register we (almost) don't care about. The only requirement is that <code>lds</code> points to the start of the code, which can be done with either [SI] or [BX+SI]. The image on the right shows which modbyte values satisfy this (highlighted green). Now, this selection can be applied to the instruction table below (highlighted red). It becomes clear that the opcode of the used instruction <code>and al,0x20</code>, interpreted as a modbyte, means <code>SP,[SI]</code>, and thus it is possible to jump into this modbyte and execute it. <br/><br/><br />
To be more clear: <code>lds sp,[si]</code> is <code>0xc5 0x24</code>, <code>and al, 0x20</code> is <code>0x24 0x20</code>, so TWO <code>0x24</code> are merged into ONE. <br/><br />
<br/>To make this work, the "host" instruction must be executed only once (it would not work in a loop). Also, the parameter of the injected instruction has to be put "behind" the "host" instruction (a single <code>db 32</code> in the code). Finally, it has to be made sure that this second code path aligns with the rest of the code and does no damage to the intended effect (for example, critical registers could be modified, or worse, illegal instructions could be created that way). In this case, the new code path consists of <code>and [bp+di+0807h],dh</code> and <code>add al,0a7h</code>, after which it aligns normally. These instructions are executed only once and do not modify critical registers.<br/><br />
Sometimes, a bit of code shuffling has to be performed to make such a trick work. Here, the <code>lodsb</code> and <code>dec si</code> have been replaced with <code>mov al,[si]</code>. The critical function of also being a good segment value has been taken over by <code>mov bl</code> (see table above).<br/><br />
[[File:Instruction selection.png|frame|center|instruction table for modbytes]]<br />
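The two overlapping code paths can be made visible with a tiny length decoder. The Python sketch below is an illustration only: the byte values are a hand assembly of the first instructions of the listing below, and the opcode table covers just the eight opcodes that occur in these bytes.<br />

```python
# First bytes of the 32 byte version, assembled by hand from the listing.
CODE = bytes([0xC5, 0x24, 0x20, 0xB3, 0x07, 0x08, 0x04, 0xA7, 0xC0, 0x2D, 0x05])

# opcode -> (has ModR/M byte, immediate bytes); only the opcodes occurring here
OPCODES = {
    0xC5: (True, 0),    # lds r16,m
    0x24: (False, 1),   # and al,imm8
    0x20: (True, 0),    # and r/m8,r8
    0x04: (False, 1),   # add al,imm8
    0xB3: (False, 1),   # mov bl,imm8
    0x08: (True, 0),    # or r/m8,r8
    0xA7: (False, 0),   # cmpsw
    0xC0: (True, 1),    # shift r/m8,imm8
}


def modrm_extra(modrm):
    """Number of displacement bytes after a 16-bit ModR/M byte."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 1:
        return 1
    if mod == 2 or (mod == 0 and rm == 6):
        return 2
    return 0


def instruction_starts(code, start):
    """Offsets at which instructions begin when decoding from `start`."""
    pos, starts = start, []
    while pos < len(code):
        starts.append(pos)
        has_modrm, imm = OPCODES[code[pos]]
        length = 1 + imm
        if has_modrm:
            length += 1 + modrm_extra(code[pos + 1])
        pos += length
    return starts


if __name__ == "__main__":
    print("first run :", instruction_starts(CODE, 0))
    print("loop entry:", instruction_starts(CODE, 1))
```

Decoding from offset 0 gives the path that is executed once at startup, decoding from offset 1 gives the path that the loop jumps into. In this model both paths meet again at offset 8, the <code>shr byte [di],5</code>.<br />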
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
            mov bl,7            ; O: 7 iterations (3 cell columns, 4 color columns)<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=554Game of Life 32b2020-04-30T21:54:38Z<p>HellMood: </p>
<hr />
<div>You can [https://www.pouet.net/prod.php?which=85485 download and comment on] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It will help to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this Wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to the target value of the next iteration. The summation works as usual: each pass of <code>AccLoop</code> adds up the 3 cells of one column, and the loop moves from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br />
<br /><br />
At the start of the loop, the first "trick" is already happening. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved there by <code>xchg cx,ax</code>. No matter how many of the summed cells have this one bit set, it does not hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value according to the rules defined above. <br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a marked cell temporarily holds the value 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated later. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps spawning actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access takes quite a bit of space, so both are removed in this version. That is directly punished with an additional byte: because <code>DI</code> is no longer involved in the process, an optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 characters, color; the code relies on 160 bytes per row). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no initial content has to be generated, since there is always at least something on the screen that works as a seed. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code>, <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now get a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
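The alignment fix can be checked with a bit of arithmetic. The following Python sketch is only an illustration and assumes an 80-column text mode (160 bytes per row, as used by the code below); the function names are made up for this example. It shows that stepping <code>bx</code> through 6, 4, 2 visits exactly the character bytes of the 3x3 block around <code>si</code> and never an attribute byte.<br />

```python
COLS = 80                       # assumed 80x25 text mode
ROW_BYTES = COLS * 2            # character byte + attribute byte per cell


def acc_offsets(si):
    """Addresses read by AccLoop for bx = 6, 4, 2."""
    return {si + bx + row - 4
            for bx in (6, 4, 2)
            for row in (-ROW_BYTES, 0, ROW_BYTES)}


def neighbourhood(si):
    """Character bytes of the 3x3 block around the cell at offset si."""
    return {si + 2 * dx + ROW_BYTES * dy
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)}


# Cell values that occur and what code page 437 shows for them
GLYPHS = {0: "blank", 1: "smiley", 32: "space", 33: "!"}

print(acc_offsets(1000) == neighbourhood(1000))
```

The <code>GLYPHS</code> table also explains the short "!" flicker: a living cell (1) that is marked as alive for the next round temporarily holds 33.<br />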
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction is one of those shown above. Such an instruction has to start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> loads the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so the 4th byte becomes the high byte). In this case <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte is only relevant for alignment, so instead of putting "0x00" there, a one-byte instruction can be placed there. The whole process saves two bytes.<br />
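To see why a value "close" to <code>0xb800</code> is good enough, the segment arithmetic can be replayed in a few lines. This Python sketch is an illustration only: the four code bytes were assembled by hand from the listing below, one 80x25 text page is assumed, and the helper names are hypothetical.<br />

```python
TEXT_BASE = 0xB8000             # linear address of the text screen
TEXT_SIZE = 80 * 25 * 2         # one 80x25 page (assumed)


def lds(code):
    """Split four code bytes the way 'lds reg,[si]' loads them."""
    reg = code[0] | (code[1] << 8)
    seg = code[2] | (code[3] << 8)
    return reg, seg


def covers_screen(seg):
    """True if the 64K window of segment 'seg' contains the whole page."""
    base = seg << 4
    return base <= TEXT_BASE and TEXT_BASE + TEXT_SIZE <= base + 0x10000


# lds bx,[si] = C5 1C, stc = F9, lodsw = AD
bx, ds = lds(bytes([0xC5, 0x1C, 0xF9, 0xAD]))
print(hex(bx), hex(ds), covers_screen(ds))

# High bytes that work no matter which low byte (3rd code byte) is used
usable_high_bytes = [hi for hi in range(256)
                     if all(covers_screen((hi << 8) | lo) for lo in range(256))]
print([hex(hi) for hi in usable_high_bytes])
```

Under these assumptions every 4th byte from <code>0xa9</code> to <code>0xb7</code> qualifies, which includes the string instructions around <code>lodsb</code>/<code>lodsw</code> and the <code>mov r8,imm8</code> opcodes.<br />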
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes resulted from just one idea: how can the clean-up step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well wait for a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses at least four bytes, unless it is brought into one of the "pure" forms that take up only THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now, there are very short instructions that advance the registers SI and DI, some of them both at the same time, and one of those is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, which perfectly matches the alignment of the screen in text mode.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple instruction that moves SI back by two (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is one of the instructions that also work as the high byte of a segment address, followed by an additional <code>dec si</code>. This keeps DI and SI aligned, so that the clean-up step always stays at the same distance "behind" the current calculation. The assumption DI = SI - 258 holds on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (as in the original) to save one byte.<br />
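The fixed distance between SI and DI is easy to verify. The following Python sketch is a simplified model (registers only, no memory access), starting from the usual DOS values SI = 0100h and DI = FFFEh; the function name is made up for this example.<br />

```python
MASK = 0xFFFF                   # 16-bit register wrap-around


def run(steps, si=0x0100, di=0xFFFE):
    """Follow SI and DI through 'lodsb / dec si / ... / cmpsw'."""
    gaps = set()
    for _ in range(steps):
        si = (si + 1) & MASK    # lodsb
        si = (si - 1) & MASK    # dec si
        si = (si + 2) & MASK    # cmpsw advances SI ...
        di = (di + 2) & MASK    # ... and DI by one text cell
        gaps.add((si - di) & MASK)
    return si, di, gaps


si, di, gaps = run(40000)
print(gaps)

# [di+bx+94] addresses the same byte as the former [si+bx-160-4]
bx = 6
same_address = (di + bx + 94) & MASK == (si + bx - 160 - 4) & MASK
print(same_address)
```

Since 94 fits into a signed byte while -164 does not, the DI form needs a one-byte displacement instead of a two-byte one.<br />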
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the two <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here]: pulling the <code>xchg</code> into the loop and counting alternately, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
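The "little dance" is easier to follow when it is traced step by step. This Python sketch models only the low bytes of AX and CX; the screen contents and start values are hypothetical and chosen just for this example.<br />

```python
def acc_loop(mem, si, al, cl):
    """Model of 'AccLoop: xchg cx,ax / add al,... / dec bx / jnz' with bl = 7.

    Only the low bytes matter here, so AX and CX are reduced to al and cl.
    The three rows are 160 bytes apart; di+bx+94 equals si+bx-164.
    """
    for bx in range(7, 0, -1):
        al, cl = cl, al                       # xchg cx,ax
        for row in (-160, 0, 160):
            al = (al + mem[si + bx - 4 + row]) & 0xFF
    return al, cl


# Hypothetical screen contents: characters 0/1 at even offsets,
# attribute 7 (gray on black) at odd offsets.
mem = {addr: 7 if addr % 2 else 0 for addr in range(0, 1000)}
si = 500
for addr in (si - 162, si, si + 2):
    mem[addr] = 1                             # three living cells

# al starts with a leftover mark (32), cl with an old, meaningless count
al, cl = acc_loop(mem, si, al=32, cl=99)
print(cl & 31, al)
```

With seven passes (an odd number), the sum of the character bytes ends up in <code>cl</code>, while <code>al</code> collects the attribute bytes and is overwritten afterwards anyway.<br />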
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax<br />
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32 bytes==<br />
[[File:Modbyte to 100h.png|frame|right|lds_modbyte_instructions]]<br />
Sometimes an instruction has several degrees of "freedom", meaning that its effect can also be achieved by an alternative encoding of the same instruction. In this case, the <code>lds</code> instruction, which loads two bytes of the code into the segment register DS, also loads two bytes into a register we (almost) don't care about. The only requirement is that <code>lds</code> points to the start of the code, which can be done with either [SI] or [BX+SI]. The image on the right shows which modbyte values satisfy this (highlighted green). This selection can now be applied to the instruction table below (highlighted red). It becomes clear that the opcode of the instruction <code>and al,0x20</code>, which is already in use, would be <code>SP,[SI]</code> when interpreted as a modbyte, and thus it is possible to jump into this modbyte and execute it. <br/><br/><br />
To be more clear: <code>lds sp,[si]</code> is <code>0xc5 0x24</code> and <code>and al,0x20</code> is <code>0x24 0x20</code>, so TWO <code>0x24</code> bytes are merged into ONE. <br/><br />
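The candidate modbytes can also be enumerated mechanically. The following Python sketch is an illustration under the startup assumption BX = 0 and SI = 0100h (so both <code>[si]</code> and <code>[bx+si]</code> point to the start of the code); the list of "used" opcode bytes is a hand-picked subset of the 33-byte version, and the sketch only lists encodings that fit, not whether the resulting code path actually works.<br />

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
RM_START_OF_CODE = {0b000: "[bx+si]", 0b100: "[si]"}   # BX = 0, SI = 0100h


def lds_candidates():
    """All mod r/m bytes of 'lds reg,mem' that read from the start of the code."""
    result = {}
    for reg, name in enumerate(REG16):
        for rm, operand in RM_START_OF_CODE.items():
            modrm = (0b00 << 6) | (reg << 3) | rm       # mod = 00: no displacement
            result[modrm] = f"lds {name},{operand}"
    return result


candidates = lds_candidates()
# First bytes of some instructions in the 33-byte version (subset, for illustration)
used = {0x24: "and al,imm8", 0x08: "or r/m8,r8", 0xAC: "lodsb",
        0x4E: "dec si", 0xF9: "stc"}
for byte, text in used.items():
    if byte in candidates:
        print(hex(byte), text, "<->", candidates[byte])
```

The byte <code>0x24</code> shows up as <code>lds sp,[si]</code>, which is the pairing used in this step.<br />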
<br/>To make this work, the "host" instruction must be executed only once (it would not work inside a loop). Also, the operand of the injected instruction has to be placed directly "behind" the "host" instruction (a single <code>db 32</code> in the code). Finally, it has to be ensured that this second code path realigns with the rest of the code and does no damage to the intended effect (for example, critical registers could be modified or, worse, illegal instructions could be created that way). In this case, the new code path consists of <code>and [bp+di+0807h],dh</code> and <code>add al,0a7h</code>, after which it aligns normally. These instructions are executed only once and do not modify critical registers.<br/><br />
Sometimes a bit of code shuffling is needed to make such a trick work. Here, <code>lodsb</code> and <code>dec si</code> have been replaced with <code>mov al,[si]</code>. The critical role of also being a good segment value has been taken over by <code>mov bl</code> (see table above).<br/><br />
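The two code paths can be followed with a tiny length decoder. This Python sketch is an illustration only: the bytes were assembled by hand from the listing below (so treat them as an assumption), and the opcode table covers just the handful of opcodes that occur on the two paths.<br />

```python
# First 11 bytes of the 32-byte version, up to and including 'shr byte [di],5'
CODE = bytes([0xC5, 0x24, 0x20, 0xB3, 0x07, 0x08, 0x04, 0xA7, 0xC0, 0x2D, 0x05])

# opcode -> (has mod r/m byte, immediate bytes); only the opcodes needed here
OPCODES = {
    0xC5: (True, 0),    # lds r16,m
    0x24: (False, 1),   # and al,imm8
    0x20: (True, 0),    # and r/m8,r8
    0xB3: (False, 1),   # mov bl,imm8
    0x08: (True, 0),    # or r/m8,r8
    0x04: (False, 1),   # add al,imm8
    0xA7: (False, 0),   # cmpsw
    0xC0: (True, 1),    # shift/rotate r/m8,imm8
}


def length(code, pos):
    """Length of the 16-bit instruction starting at 'pos'."""
    has_modrm, imm = OPCODES[code[pos]]
    size = 1 + imm
    if has_modrm:
        modrm = code[pos + 1]
        mod, rm = modrm >> 6, modrm & 7
        size += 1
        if mod == 1:
            size += 1                         # disp8
        elif mod == 2 or (mod == 0 and rm == 6):
            size += 2                         # disp16
    return size


def starts(code, pos, end=8):
    """Instruction start offsets when decoding begins at 'pos'."""
    result = []
    while pos < end:
        result.append(pos)
        pos += length(code, pos)
    result.append(pos)
    return result


print(starts(CODE, 0))   # first pass, entered through 'lds sp,[si]'
print(starts(CODE, 1))   # every later pass, entered through 'jmp short X-1'
```

Both paths meet again at offset 8, the <code>shr byte [di],5</code>, which is the realignment described above.<br />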
[[File:Instruction selection.png|frame|center|instruction table for modbytes]]<br />
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
mov bl,7 ; O: 3 iterations<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=553Game of Life 32b2020-04-30T18:52:25Z<p>HellMood: clearer explanation, thanks gargaj</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
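The reinterpreted rules can be checked against the classic ones by brute force. The following Python sketch is only an illustration (the function names are made up for this example and are not part of the intro): it compares both formulations for every possible neighbour count.<br />

```python
def classic_rule(alive, neighbours):
    """Conway's rules: 'neighbours' counts the 8 surrounding cells only."""
    return neighbours == 3 or (alive and neighbours == 2)


def nine_cell_rule(alive, total):
    """Reinterpreted rules: 'total' counts all 9 cells, centre included."""
    if total == 3:
        return True       # birth, or survival with 2 neighbours
    if total == 4:
        return alive      # keeps its state
    return False          # dies or stays dead


def rules_agree():
    """Compare both rule sets for all 2 x 9 combinations."""
    for alive in (False, True):
        for neighbours in range(9):
            total = neighbours + int(alive)
            if classic_rule(alive, neighbours) != nine_cell_rule(alive, total):
                return False
    return True


print(rules_agree())
```

Counting the centre cell as well is what makes the later bit trick possible, because only the two totals 3 and 4 need special treatment.<br />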
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell immediately would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" to its target value for the next iteration by <code>shr [byte di-65],5</code>. The summation is done in a small loop: each pass adds up the 3 cells of one column, and the loop steps from the right column (+1) to the left one (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
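The "mark now, correct later" idea can be tried out on a small field. The following Python sketch is a simplified model, not a translation of the assembly: it uses a 16x12 field with a dead border instead of a wrapping 64K segment, and the names are hypothetical. It compares the single-buffer scheme with an ordinary double-buffered generation.<br />

```python
W, H = 16, 12                 # small example field, not the real 320x200
N = W * H
OFFSETS = [dy * W + dx for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def reference_step(cells):
    """Double-buffered generation using the 9-cell rules (border stays dead)."""
    new = [0] * N
    for i in range(W + 1, N - W - 1):
        total = sum(cells[i + o] for o in OFFSETS)
        new[i] = 1 if total == 3 else (cells[i] if total == 4 else 0)
    return new


def in_place_step(cells):
    """Single-buffer generation; assumes the first and last row start dead."""
    buf = list(cells)
    for i in range(W + 1, N - W - 1):
        # Marked cells add multiples of 32; keeping 5 bits drops them again.
        total = sum(buf[i + o] for o in OFFSETS) & 31
        new = 1 if total == 3 else (buf[i] & 1 if total == 4 else 0)
        buf[i] |= new << 5            # mark: bit 5 = next state, bit 0 = old state
        j = i - W - 2                 # this cell is not read again in this pass
        if j >= 0:
            buf[j] >>= 5              # correct: next state moves down to bit 0
    for j in range(N - 2 * W - 3, N - W - 1):
        buf[j] >>= 5                  # flush the cells that are still marked
    return buf


field = [0] * N
for x, y in [(2, 1), (3, 2), (1, 3), (2, 3), (3, 3)]:   # a glider
    field[y * W + x] = 1

print(in_place_step(field) == reference_step(field))
```

The correction trails the current cell by one row plus two cells, which mirrors the distance used by <code>shr [byte di-65],5</code> in the original.<br />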
<br /><br />
At the start of the loop, the first "trick" already happens. The summation register <code>cl</code> is not properly cleared; at this point, after <code>xchg cx,ax</code>, it contains either 0 or 32 from the instruction <code>and al,0x20</code>. Likewise, any number of the summed cells may have this one bit set (because they are still marked). Neither hurts the calculation, thanks to a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. Extracting the 6th bit of this rotated value (with <code>and al,0x20</code>) yields exactly the value required by the rules defined above. <br /><br />
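The rotation trick can be replayed bit by bit. This Python sketch models <code>rcr</code> on an 8-bit register as a rotation of the 9-bit value carry:al, with the count masked to 5 bits and then reduced modulo 9 as on a 386; the function names are made up for this example.<br />

```python
def rcr8(al, cl, carry):
    """Rotate the 9-bit value carry:al right, like 'rcr al,cl' on a 386."""
    count = (cl & 31) % 9            # 5-bit mask, then one full turn is 9 steps
    value = (carry << 8) | al
    value = ((value >> count) | (value << (9 - count))) & 0x1FF
    return value & 0xFF, value >> 8  # new al, new carry


def next_state(cell, cl):
    """stc / rcr al,cl / and al,20h for a cell value of 0 or 1."""
    al, _ = rcr8(cell, cl, 1)
    return al & 0x20                 # 32 = alive next round, 0 = dead


for total in range(10):
    row = [next_state(cell, total) for cell in (0, 1)]
    print(total, row)
```

Only a total of 3 (the carry lands on bit 5) and a total of 4 (the old cell value lands on bit 5) can produce 32. Adding any multiple of 32 to the count does not change the result, which is why the uncleared <code>cl</code> and the marked cells are harmless.<br />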
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
To reach 32 bytes, all the convenience features have to go. If there is space left at the end, parts of them could be added back. A few tiny changes make this work as intended. The segment where the calculation takes place has been changed to <code>1000h</code>, which points to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access takes quite a bit of space, so both are removed in this version. That is directly punished with an additional byte: because <code>DI</code> is no longer involved in the process, an optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 characters, color; the code relies on 160 bytes per row). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no initial content has to be generated, since there is always at least something on the screen that works as a seed. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code>, <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now get a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction is one of those shown above. Such an instruction has to start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> loads the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so the 4th byte becomes the high byte). In this case <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte is only relevant for alignment, so instead of putting "0x00" there, a one-byte instruction can be placed there. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes resulted from just one idea: how can the clean-up step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well wait for a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses at least four bytes, unless it is brought into one of the "pure" forms that take up only THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now, there are very short instructions that advance the registers SI and DI, some of them both at the same time, and one of those is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, which perfectly matches the alignment of the screen in text mode.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple instruction that moves SI back by two (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is one of the instructions that also work as the high byte of a segment address, followed by an additional <code>dec si</code>. This keeps DI and SI aligned, so that the clean-up step always stays at the same distance "behind" the current calculation. The assumption DI = SI - 258 holds on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (as in the original) to save one byte.<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the two <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here]: pulling the <code>xchg</code> into the loop and counting alternately, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax<br />
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32 bytes==<br />
[[File:Modbyte to 100h.png|frame|right|lds_modbyte_instructions]]<br />
Sometimes an instruction has several degrees of "freedom", meaning that its effect can also be achieved by an alternative encoding of the same instruction. In this case, the <code>lds</code> instruction, which loads two bytes of the code into the segment register DS, also loads two bytes into a register we (almost) don't care about. The only requirement is that <code>lds</code> points to the start of the code, which can be done with either [SI] or [BX+SI]. The image on the right shows which modbyte values satisfy this (highlighted green). This selection can now be applied to the instruction table below (highlighted red). It becomes clear that the opcode of the instruction <code>and al,0x20</code>, which is already in use, would be <code>SP,[SI]</code> when interpreted as a modbyte, and thus it is possible to jump into this modbyte and execute it. <br/><br/><br />
To be more clear: <code>lds sp,[si]</code> is <code>0xc5 0x24</code> and <code>and al,0x20</code> is <code>0x24 0x20</code>, so TWO <code>0x24</code> bytes are merged into ONE. <br/><br />
<br/>To make this work, the "host" instruction must be executed only once (it would not work inside a loop). Also, the operand of the injected instruction has to be placed directly "behind" the "host" instruction (a single <code>db 32</code> in the code). Finally, it has to be ensured that this second code path realigns with the rest of the code and does no damage to the intended effect (for example, critical registers could be modified or, worse, illegal instructions could be created that way). In this case, the new code path consists of <code>and [bp+di+0807h],dh</code> and <code>add al,0a7h</code>, after which it aligns normally. These instructions are executed only once and do not modify critical registers.<br/><br />
Sometimes a bit of code shuffling is needed to make such a trick work. Here, <code>lodsb</code> and <code>dec si</code> have been replaced with <code>mov al,[si]</code>. The critical role of also being a good segment value has been taken over by <code>mov bl</code> (see table above).<br/><br />
[[File:Instruction selection.png|frame|center|instruction table for modbytes]]<br />
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
mov bl,7 ; O: 3 iterations<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=552Game of Life 32b2020-04-30T18:32:32Z<p>HellMood: /* Modbyte tuning, jumping into modbytes, code path alignment : 32b */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell immediately would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" to its target value for the next iteration by <code>shr [byte di-65],5</code>. The summation is done in a small loop: each pass adds up the 3 cells of one column, and the loop steps from the right column (+1) to the left one (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" already happens. The summation register <code>cl</code> is not properly cleared; at this point, after <code>xchg cx,ax</code>, it contains either 0 or 32 from the instruction <code>and al,0x20</code>. Likewise, any number of the summed cells may have this one bit set (because they are still marked). Neither hurts the calculation, thanks to a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. Extracting the 6th bit of this rotated value (with <code>and al,0x20</code>) yields exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left afterwards, parts of it could be reintegrated. A few tiny changes make this work as intended. The segment in which all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is immediately punished with an additional byte: since <code>DI</code> is no longer involved in the process, one optimization has to be dropped. The assumption is that the computer this runs on is already in text mode (80x25 chars, colors). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as a seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code>, <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to set the screen address, the instruction <code>lds</code> can be used, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is to reuse parts of the code as the segment value, which is possible when one of the instructions is among those shown above. Such an instruction has to start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order). In this case <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte only matters for alignment, so instead of putting "0x00" there, a one-byte instruction can be placed there. The whole process saves two bytes.<br />
<br />
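A quick way to see why a "close" segment value is good enough is to compute where the screen ends up inside the segment. The Python sketch below is my own illustration; the function names are hypothetical, and it assumes an 80x25 text screen at linear address 0xB8000 (the 160 byte row stride in the code suggests 80 columns).

```python
TEXT_SCREEN = 0xB8000          # linear address of the text mode screen
SCREEN_BYTES = 80 * 25 * 2     # characters plus attributes, assuming 80x25

def lds_result(code):
    """Return (register, ds) as loaded by 'lds reg,[si]' with SI at code start."""
    reg = code[0] | (code[1] << 8)   # bytes 1 and 2, little endian
    ds = code[2] | (code[3] << 8)    # bytes 3 and 4, little endian
    return reg, ds

def screen_offset(ds):
    """Offset of the screen inside segment ds, or None if it does not fit."""
    offset = TEXT_SCREEN - (ds << 4)
    return offset if 0 <= offset <= 0x10000 - SCREEN_BYTES else None

# first four bytes of the 36 byte version: lds bx,[si] / stc / lodsw
code36 = bytes([0xC5, 0x1C, 0xF9, 0xAD])
bx, ds = lds_result(code36)
print(hex(bx), hex(ds), hex(screen_offset(ds)))
```

With <code>stc</code> (0xF9) and <code>lodsw</code> (0xAD) as the 3rd and 4th byte, DS becomes 0xADF9, and the whole screen still lies inside that segment, starting at offset 0xA070.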
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how can the cleanup step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well wait for a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still takes four bytes, unless it is brought into one of the "pure" forms without displacement, which only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX are already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
There are very short instructions available that advance the registers SI and DI, some of them both at the same time; one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, which perfectly matches the alignment of the screen in text mode.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple instruction to advance SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which also works as the high byte of a segment address, followed by an additional <code>dec si</code> that keeps DI and SI aligned, so that the cleanup step always stays at the same distance "behind" the current calculation. The assumption DI = SI - 258 holds on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original), which saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the double <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here], pulling the <code>xchg</code> into the loop and doing alternating counting, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
<br />
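The alternating counting is easier to follow when traced by hand. The Python model below is a sketch under my own assumptions (80 column text memory with 160 bytes per row, only the low bytes of AX and CX are tracked, and <code>acc_loop</code> is a made-up name); the three offsets correspond to the row above, the current row and the row below.

```python
ROW = 160  # bytes per text row, assuming 80 columns of char+attribute pairs

def acc_loop(mem, si, al, cl):
    """Model of the 33 byte AccLoop: 8-bit registers, bx runs from 7 down to 1."""
    for bx in range(7, 0, -1):
        al, cl = cl, al                       # xchg cx,ax (low bytes only)
        for row in (-ROW, 0, ROW):
            al = (al + mem[si + bx - 4 + row]) & 0xFF
    return al, cl

# three rows of text memory: characters (0 or 1) at even, colour 7 at odd offsets
mem = [0, 7] * (ROW * 3 // 2)
si = ROW + 20
for offset in (-ROW - 2, -2, 0, ROW + 2):     # four live cells around si
    mem[si + offset] = 1
al, cl = acc_loop(mem, si, al=32, cl=5)       # al: leftover 0/32, cl: old count
print(cl & 31)
```

Because the loop runs an odd number of times, the register that collected the character bytes ends up in <code>cl</code>, while the colour sums and the stale old count land in <code>al</code>, which is overwritten right afterwards anyway.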
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax<br />
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32 bytes==<br />
[[File:Modbyte to 100h.png|frame|right|lds_modbyte_instructions]]<br />
Sometimes an instruction has several degrees of "freedom", meaning that its effect can also be achieved by an alternative encoding of the same instruction. In this case, the <code>lds</code> instruction, which puts two bytes of the code into the segment register DS, also loads two bytes into a register we (almost) don't care about. The only requirement is that <code>lds</code> points to the start of the code, which can be done with either [SI] or [BX+SI]. The image on the right shows which modbyte values satisfy this (highlighted green). This selection can now be applied to the instruction table below (highlighted red). It becomes clear that the opcode of the instruction <code>and al,0x20</code> that is used here would, interpreted as a modbyte, mean <code>SP,[SI]</code>, so it is possible to jump into this modbyte and execute it as an instruction. <br />
<br/>To make this work, the "host" instruction has to be executed only once (it would not work in a loop). Also, the operand of the injected instruction has to be placed directly "behind" the "host" instruction (a single <code>db 32</code> in the code). Finally, it has to be ensured that this second code path aligns with the rest of the code and does no damage to the intended effect (for example, critical registers could be modified, or worse, illegal instructions could be created that way). In this case the new code path consists of <code>and [bp+di+0807h],dh</code> and <code>add al,0a7h</code>, after which it aligns normally. These instructions are executed only once and do not modify critical registers.<br/><br />
Sometimes, a bit of code shuffling has to be performed to make such a trick work. Here, the <code>lodsb</code> and <code>dec si</code> have been replaced with <code>mov al,[si]</code>. The critical function of also being a good segment value has been taken over by <code>mov bl</code> (see table above).<br/><br />
[[File:Instruction selection.png|frame|center|instruction table for modbytes]]<br />
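The modbyte selection can also be reproduced with a few lines of Python. This is a sketch of mine and not the tool used to create the tables; <code>modbyte</code> and <code>AL_IMM8</code> are hypothetical names, and only the two addressing forms that point at the start of the code are listed.

```python
REGS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
RM = {"[bx+si]": 0b000, "[si]": 0b100}       # mod = 00, no displacement

def modbyte(reg, rm):
    """ModR/M byte for 'lds reg,rm' (opcode C5) with mod = 00."""
    return (REGS.index(reg) << 3) | RM[rm]

# one-byte opcodes that take an 8-bit immediate on AL (from the x86 opcode map)
AL_IMM8 = {0x04: "add", 0x0C: "or", 0x14: "adc", 0x1C: "sbb",
           0x24: "and", 0x2C: "sub", 0x34: "xor", 0x3C: "cmp"}

for rm in RM:
    for reg in REGS:
        b = modbyte(reg, rm)
        alias = AL_IMM8.get(b)
        note = f"= opcode of '{alias} al,imm8'" if alias else ""
        print(f"lds {reg},{rm}: C5 {b:02X} {note}")
```

The line for <code>lds sp,[si]</code> shows the modbyte 0x24, which is the opcode of <code>and al,imm8</code>; together with the following <code>db 32</code> it forms the <code>and al,0x20</code> that the jump lands on.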
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
mov bl,7 ; O: 3 iterations<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=551Game of Life 32b2020-04-30T18:05:02Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell immediately would corrupt the calculations that follow), the current cell is only "marked". Once the calculation has advanced far enough that the cell in question no longer influences anything in the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation works as usual: the three cells of one column are added up, and the loop shifts this column from right (+1) to left (-1), thus adding up all 9 cells of the 3x3 neighbourhood. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared; after <code>xchg cx,ax</code> it contains either 0 or 32, left over from the instruction <code>and al,0x20</code>. Even if an arbitrary number of cells have this bit set, the calculation is not harmed, because of a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit (bit 5) of this rotated value with <code>and al,0x20</code>, we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now written into the original cell with <code>or [si-1],al</code>. As shown before, this does not hurt the computation; the only side effect is that the cell temporarily holds a value of 32 or 33, which makes it visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left afterwards, parts of it could be reintegrated. A few tiny changes make this work as intended. The segment in which all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is immediately punished with an additional byte: since <code>DI</code> is no longer involved in the process, one optimization has to be dropped. The assumption is that the computer this runs on is already in text mode (80x25 chars, colors). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as a seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code>, <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to set the screen address, the instruction <code>lds</code> can be used, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is to reuse parts of the code as the segment value, which is possible when one of the instructions is among those shown above. Such an instruction has to start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order). In this case <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte only matters for alignment, so instead of putting "0x00" there, a one-byte instruction can be placed there. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how can the cleanup step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well wait for a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still takes four bytes, unless it is brought into one of the "pure" forms without displacement, which only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX are already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
There are very short instructions available that advance the registers SI and DI, some of them both at the same time; one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, which perfectly matches the alignment of the screen in text mode.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple instruction to advance SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which also works as the high byte of a segment address, followed by an additional <code>dec si</code> that keeps DI and SI aligned, so that the cleanup step always stays at the same distance "behind" the current calculation. The assumption DI = SI - 258 holds on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original), which saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the double <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here], pulling the <code>xchg</code> into the loop and doing alternating counting, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax<br />
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32b==<br />
[[File:Modbyte to 100h.png|frame|right|lds_modbyte_instructions]]<br />
Sometimes an instruction has several degrees of "freedom", meaning that its effect can also be achieved by an alternative encoding of the same instruction. In this case, the <code>lds</code> instruction, which puts two bytes of the code into the segment register DS, also loads two bytes into a register we (almost) don't care about. The only requirement is that <code>lds</code> points to the start of the code, which can be done with either [SI] or [BX+SI]. The image on the right shows which modbyte values satisfy this (highlighted green). This selection can now be applied to the instruction table below (highlighted red). It becomes clear that the opcode of the instruction <code>and al,0x20</code> that is used here would, interpreted as a modbyte, mean <code>SP,[SI]</code>, so it is possible to jump into this modbyte and execute it as an instruction. <br />
<br/>To make this work, the "host" instruction has to be executed only once (it would not work in a loop). Also, the operand of the injected instruction has to be placed directly "behind" the "host" instruction (a single <code>db 32</code> in the code). Finally, it has to be ensured that this second code path aligns with the rest of the code and does no damage to the intended effect (for example, critical registers could be modified, or worse, illegal instructions could be created that way). In this case the new code path consists of <code>and [bp+di+0807h],dh</code> and <code>add al,0a7h</code>, after which it aligns normally. These instructions are executed only once and do not modify critical registers.<br/><br />
Sometimes, a bit of code shuffling has to be performed to make such a trick work. Here, the <code>lodsb</code> and <code>dec si</code> have been replaced with <code>mov al,[si]</code>. The critical function of also being a good segment value has been taken over by <code>mov bl</code> (see table above).<br/><br />
[[File:Instruction selection.png|frame|center|instruction table for modbytes]]<br />
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
mov bl,7 ; O: 3 iterations<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=550Game of Life 32b2020-04-30T17:57:28Z<p>HellMood: /* Modbyte tuning, jumping into modbytes, code path alignment : 32b */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right, and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly will lead to wrong results of following calculations), current cells are "marked", and when the calculations are advanced far enough that the cell in question does not influence any calculation of the current iteration, it will be "corrected" by <code>shr [byte di-65],5</code> to the target value of the next iteration. The summation is as usual, an inner loop, adding up 3 cells of one column, and the outer loop, shifting from right (+1) to the left (-1), thus adding up 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, there is already the first "trick" happening. The register of summation <code>cl</code> is not properly cleaned, but at this point it can either contain 0 or 32 from the instruction <code>and al,0x20</code> after <code>xchg cx,ax</code>. If an arbitrary amount of cells has this on bit set, that won't hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 leastsignificant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (bit 5 when counting from zero, with <code>and al,0x20</code>), we get exactly the value according to the rules defined above. <br /><br />
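The following fragment is only a worked illustration of that rotation, not part of the program. The register values are picked by hand as assumptions: a living cell with a sum of 3 plus one stale marker bit, a living cell with a sum of 4, and a dead cell with a sum of 4. It assumes a 386-class CPU, like the listing below (<code>P386</code>).<br />
<syntaxhighlight lang="nasm"><br />
        bits 16
        mov al,1        ; center cell alive
        mov cl,32+3     ; sum 3, plus a leftover marker (32)
        stc
        rcr al,cl       ; count is masked: 35 and 31 = 3
                        ; CF:AL 1:00000001 -> 1:10000000 -> 0:11000000 -> 0:01100000
        and al,20h      ; AL = 20h, the cell will be alive

        mov al,1        ; center cell alive
        mov cl,4        ; sum 4
        stc
        rcr al,cl       ; one step further: CF:AL = 0:00110000
        and al,20h      ; AL = 20h, the living cell stays alive

        mov al,0        ; center cell dead
        mov cl,4        ; sum 4
        stc
        rcr al,cl       ; CF:AL 1:00000000 -> ... -> 0:00010000
        and al,20h      ; AL = 00h, the dead cell stays dead
</syntaxhighlight><br />
With a count of 3 the injected carry itself lands in bit 5, so the result is always "alive". With a count of 4 the original cell bit lands there, so the state is kept. For every other count, bit 5 receives one of the cell's zero bits.<br />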
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a marked cell temporarily has a value of 32 or 33 and is thus visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is directly punished with an additional byte, because <code>DI</code> is no longer involved in the process, so one optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 characters with colors, which matches the 160 bytes per row used in the code). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as seed value. A small downside is the alignment of cells, because in textmode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
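To make the offsets in the listing below easier to follow, here is the same summation loop with the row stride spelled out. This is only a sketch: the names <code>COLS</code> and <code>STRIDE</code> are introduced here for illustration, and 80 columns is an assumption derived from the 160 bytes per row used in the code.<br />
<syntaxhighlight lang="nasm"><br />
        bits 16
COLS    equ 80                  ; assumption: 80-column text mode
STRIDE  equ COLS*2              ; 160 bytes per row (character byte + color byte)

        mov bl,6                ; BX = 6, 4, 2 (BH is assumed to be zero)
AccLoop:
        add cl,[si+bx-STRIDE-4] ; row above: si-158, si-160, si-162
        add cl,[si+bx-4]        ; same row:  si+2,   si,     si-2
        add cl,[si+bx+STRIDE-4] ; row below: si+162, si+160, si+158
        dec bx                  ; two decrements per pass skip the color bytes
        dec bx
        jnz AccLoop
</syntaxhighlight><br />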
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in textmode is just a tiny part of the 64 kilobytes addressable by one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction is one of the above. If there is such an instruction, it can be placed at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so that the 4th byte becomes the high byte). In this case <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte only becomes the low byte of the segment and is relevant for alignment only, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
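As a rough check of the numbers, the first four bytes of the 36 byte version can be written down as raw data. The byte values are my reading of the standard x86 encodings and are shown for illustration only; they are not an additional part of the program.<br />
<syntaxhighlight lang="nasm"><br />
        org 100h
        db 0xC5,0x1C    ; lds bx,[si] : BX = 1CC5h (first two bytes)
        db 0xF9         ; stc         : 3rd byte, low byte of DS
        db 0xAD         ; lodsw       : 4th byte, high byte of DS
; DS = ADF9h, so the segment starts at linear address ADF90h.
; Text screen memory starts at B8000h, which is offset
; B8000h - ADF90h = A070h inside this segment.
</syntaxhighlight><br />
Because SI sweeps through the whole 64 kilobyte segment, the 4000 visible bytes are reached no matter where exactly they sit inside it.<br />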
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how can the cleanup step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well wait for a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses at least four bytes, unless it is brought into one of the "pure" forms that only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now, there are very short instructions available to advance the registers SI and DI, some of them both at the same time, and one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, which perfectly matches the two-byte cell alignment of the screen in text mode.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since SI would otherwise advance twice and there is no simple command to move SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is also one of the commands that work as high byte of a segment address, followed by an additional <code>dec si</code>. DI and SI now stay aligned, so that the cleanup step always runs at the same distance "behind" the current calculation. The assumption DI = SI - 258 is true on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original) to save one byte.<br />
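The byte saving can be checked by comparing the two encodings side by side. This is an illustrative sketch under the assumption stated above (SI - DI = 258 at program start); the name <code>DELTA</code> is introduced here, and the byte values in the comments are my reading of the standard encodings.<br />
<syntaxhighlight lang="nasm"><br />
        bits 16
DELTA   equ 258                     ; assumed SI - DI (0100h - FFFEh, modulo 65536)

        add cl,[si+bx-160-4]        ; 02 88 5C FF : displacement -164 needs 16 bits
        add cl,[di+bx+DELTA-164]    ; 02 49 5E    : same cell, +94 fits in 8 bits

        shr byte [si-DELTA],5       ; C0 AC FE FE 05 : five bytes
        shr byte [di],5             ; C0 2D 05       : same cell, three bytes
</syntaxhighlight><br />
Both pairs address the same cell only as long as SI and DI keep their distance, which is why they are advanced together by <code>cmpsw</code>.<br />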
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the two <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here]: pulling the <code>xchg</code> into the loop and doing alternating counting, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax ; Swap sums each pass: even BX adds cells, odd BX adds colors; cells end up in CL
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32b==<br />
[[File:Modbyte to 100h.png|frame|right|lds_modbyte_instructions]]<br />
Sometimes, an instruction has several degrees of "freedom". That means that the effect of the instruction can also be achieved by an alternative encoding of it. In this case, the <code>lds</code> instruction, which puts two bytes of the code into the segment register DS, also loads two bytes into a register we (almost) don't care about. The only requirement is that <code>lds</code> points to the start of the code, which can be done with either [SI] or [BX+SI]. The image on the right shows which modbyte values satisfy this (highlighted green). Now, this selection can be applied to the instruction table below (highlighted red). It becomes clear that the opcode of the instruction <code>and al,0x20</code>, which is in use anyway, would mean <code>SP,[SI]</code> when interpreted as modbyte, and thus it is possible to jump into this modbyte and execute it. <br />
<br/>To make this work, the "host" instruction must be executed only once (it would not work in a loop). Also, the operand of the injected instruction has to be put "behind" the "host" instruction (a single <code>db 32</code> in the code). Finally, it has to be made sure that this second code path realigns with the rest of the code and does no damage to the intended effect (for example, critical registers could be modified, or worse, illegal instructions could be created that way). In this case the new code path consists of <code>and [bp+di+0807h],dh</code> and <code>add al,0a7h</code>, after which it realigns with the normal code. These instructions are executed only once and do not modify critical registers.<br/><br />
Sometimes, a bit of code shuffling has to be performed to make such a trick work. Here, <code>lodsb</code> and <code>dec si</code> have been replaced with <code>mov al,[si]</code>. The critical role of also being a good segment value has been taken over by <code>mov bl</code> (see table above).<br/><br />
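The two overlapping code paths can be made visible by writing the first eleven bytes as raw data. This is only an illustration: the byte values are my reading of the standard x86 encodings of the listing below, and the split into "first pass" and "later passes" follows the description above.<br />
<syntaxhighlight lang="nasm"><br />
        org 100h
        db 0xC5             ; 100h first pass:   lds sp,[si]            (C5 24)
        db 0x24             ; 101h later passes: and al,20h             (24 20)
        db 0x20             ; 102h first pass:   and [bp+di+0807h],dh   (20 B3 07 08)
        db 0xB3,0x07        ; 103h later passes: mov bl,7
        db 0x08,0x04        ; 105h later passes: or [si],al
                            ; 106h first pass:   add al,0A7h            (04 A7)
        db 0xA7             ; 107h later passes: cmpsw
        db 0xC0,0x2D,0x05   ; 108h both paths:   shr byte [di],5
; lds takes DS from the 3rd and 4th byte: DS = B320h, so the
; text screen at B8000h sits at offset 4E00h inside the segment.
</syntaxhighlight><br />
Both paths meet again at 108h, so from <code>shr byte [di],5</code> onwards the decoding is identical.<br />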
[[File:Instruction selection.png|frame|center|instruction table for modbytes]]<br />
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
mov bl,7 ; O: 3 iterations<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=File:Instruction_selection.png&diff=549File:Instruction selection.png2020-04-30T17:40:24Z<p>HellMood: </p>
<hr />
<div>Selecting an instruction by modbyte restrictions</div>HellMoodhttp://www.sizecoding.org/index.php?title=File:Modbyte_to_100h.png&diff=548File:Modbyte to 100h.png2020-04-30T17:37:07Z<p>HellMood: </p>
<hr />
<div>Modbyte combinations pointing to "100h", with selected choice</div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=547Game of Life 32b2020-04-30T17:14:40Z<p>HellMood: </p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basic section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right, and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to the target value of the next iteration. The summation is as usual: the loop body adds up the 3 cells of one column, and the loop itself steps from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" is already happening. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved there by <code>xchg cx,ax</code>. If an arbitrary number of cells has this one bit set, that won't hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (bit 5 when counting from zero, with <code>and al,0x20</code>), we get exactly the value according to the rules defined above. <br /><br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a marked cell temporarily has a value of 32 or 33 and is thus visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is directly punished with an additional byte, because <code>DI</code> is no longer involved in the process, so one optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 characters with colors, which matches the 160 bytes per row used in the code). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as seed value. A small downside is the alignment of cells, because in textmode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in textmode is just a tiny part of the 64 kilobytes addressable by one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction is one of the above. If there is such an instruction, it can be placed at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so that the 4th byte becomes the high byte). In this case <code>lodsw</code> can be reused as the high byte of the segment. The 3rd byte only becomes the low byte of the segment and is relevant for alignment only, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how can the cleanup step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well wait for a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses at least four bytes, unless it is brought into one of the "pure" forms that only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now, there are very short instructions available to advance the registers SI and DI, some of them both at the same time, and one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, which perfectly matches the two-byte cell alignment of the screen in text mode.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since SI would otherwise advance twice and there is no simple command to move SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is also one of the commands that work as high byte of a segment address, followed by an additional <code>dec si</code>. DI and SI now stay aligned, so that the cleanup step always runs at the same distance "behind" the current calculation. The assumption DI = SI - 258 is true on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original) to save one byte.<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the two <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here]: pulling the <code>xchg</code> into the loop and doing alternating counting, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax ; Swap sums each pass: even BX adds cells, odd BX adds colors; cells end up in CL
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
==Modbyte tuning, jumping into modbytes, code path alignment : 32b==<br />
<syntaxhighlight lang="nasm"><br />
lds sp,[si]<br />
X: db 32<br />
mov bl,7 ; O: 3 iterations<br />
or [si],al ; O: Add in new cell<br />
cmpsw<br />
shr byte [di],5 ; O: Shift previous value <br />
C: xchg cx,ax<br />
add al,[di+bx+94] ; O: Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+156]<br />
dec bx ; O: Loop back<br />
jnz C<br />
mov al,[si] ; O: 3 = birth, 4 = stay (tricky): <br />
stc ; O: 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
rcr al,cl ; O: +---> 0.00x100?0 (rcr 4) <br />
jmp short X-1<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=546Game of Life 32b2020-04-30T17:11:28Z<p>HellMood: </p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basic section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right, and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly will lead to wrong results of following calculations), current cells are "marked", and when the calculations are advanced far enough that the cell in question does not influence any calculation of the current iteration, it will be "corrected" by <code>shr [byte di-65],5</code> to the target value of the next iteration. The summation is as usual, an inner loop, adding up 3 cells of one column, and the outer loop, shifting from right (+1) to the left (-1), thus adding up 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, there is already the first "trick" happening. The register of summation <code>cl</code> is not properly cleaned, but at this point it can either contain 0 or 32 from the instruction <code>and al,0x20</code> after <code>xchg cx,ax</code>. If an arbitrary amount of cells has this on bit set, that won't hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 leastsignificant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>) which will be rotated in from the left, and directly right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code> we get exactly the value according to the rules defined above. <br /><br />
<br /><br />
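The rotate trick can be replayed outside of DOS. The sketch below models <code>stc</code>, <code>rcr al,cl</code> and <code>and al,20h</code> in Python. It assumes the documented behaviour of the byte-sized <code>rcr</code> on a 286 or later (count masked to 5 bits, then taken modulo the 9-bit width of carry plus register); the helper names are hypothetical.<br />
<syntaxhighlight lang="python">
def rcr8(al, carry, count):
    """Model of 'rcr al,cl': rotate the 9-bit value carry:al to the right."""
    count = (count & 0x1F) % 9            # 5-bit mask, then modulo 9
    wide = (carry << 8) | al              # the carry sits left of bit 7
    wide = ((wide >> count) | (wide << (9 - count))) & 0x1FF
    return wide & 0xFF, wide >> 8         # (new al, new carry)


def next_state(cell, total):
    """stc / rcr al,cl / and al,20h, reduced to a 0/1 result."""
    al, _ = rcr8(cell, 1, total)          # stc sets the carry before rotating
    return (al & 0x20) >> 5


def expected_state(cell, total):
    """The rule list from above, written out directly."""
    return 1 if total == 3 or (total == 4 and cell) else 0
</syntaxhighlight>
Adding multiples of 32 to <code>total</code> (as marked cells and the uncleared <code>cl</code> do) leaves the result of <code>next_state</code> unchanged, which is the property the listing relies on.<br />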
This value is now stored in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a cell marked as alive temporarily has a value of 32 or 33 and is thus visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
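To see that marking and delayed correction really give the same result as a conventional two-buffer update, the idea can be sketched in Python. This is a simplified model under stated assumptions, not a translation of the listing: cells outside the memory count as dead, the correction runs <code>width + 2</code> cells behind the current cell, and all names are hypothetical.<br />
<syntaxhighlight lang="python">
MARK_SHIFT = 5   # the new state is parked in bit 5 (value 32), as in the listing


def block_sum(mem, i, width):
    """Sum the raw bytes of the 3x3 block around linear index i (outside = 0)."""
    total = 0
    for row in (-width, 0, width):
        for col in (-1, 0, 1):
            k = i + row + col
            if 0 <= k < len(mem):
                total += mem[k]
    return total


def rule(total, alive):
    """Nine-cell rule: 3 = alive, 4 = keep state, otherwise dead."""
    return 1 if total == 3 or (total == 4 and alive) else 0


def step_reference(mem, width):
    """Two-buffer version: read from mem, write to a fresh list."""
    return [rule(block_sum(mem, i, width), mem[i]) for i in range(len(mem))]


def step_in_place(mem, width):
    """Single-buffer version with marking and delayed correction."""
    delay = width + 2                      # stay behind the top-left neighbour
    for i in range(len(mem) + delay):
        if i < len(mem):
            total = block_sum(mem, i, width) & 31   # mark bits (32) drop out
            mem[i] |= rule(total, mem[i] & 1) << MARK_SHIFT
        j = i - delay
        if 0 <= j < len(mem):
            mem[j] >>= MARK_SHIFT          # corrected: only the new state remains
    return mem
</syntaxhighlight>
Both functions use the same "linear" neighbourhood as the assembler code, so cells at the left and right edge see the neighbouring rows, just like on the real screen.<br />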
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated later. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is directly punished with an additional byte: since <code>DI</code> is no longer involved in the process, an optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 chars, colors; the row stride of 160 bytes in the code below corresponds to 80 cells of two bytes each). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction, it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
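The address arithmetic of the listing above can be double-checked with a few lines of Python. This is only a sketch: it assumes the standard 80-column text mode layout (character byte followed by attribute byte), and the names are chosen for illustration.<br />
<syntaxhighlight lang="python">
COLUMNS = 80                    # assumption: standard 80-column text mode
BYTES_PER_CELL = 2              # character byte followed by attribute byte
ROW_STRIDE = COLUMNS * BYTES_PER_CELL


def neighbour_offsets():
    """Offsets (relative to SI) read by the three add instructions in AccLoop."""
    offsets = []
    for bx in (6, 4, 2):                      # mov bl,6, then two dec bx per pass
        for row in (-ROW_STRIDE, 0, ROW_STRIDE):
            offsets.append(bx + row - 4)      # [si+bx-160-4], [si+bx-4], [si+bx+160-4]
    return sorted(offsets)


LIVE = 1                        # character code 1, the "smiley char"
MARKED_LIVE = LIVE | 0x20       # living cell that is marked to stay alive
</syntaxhighlight>
All nine offsets are even, so only character bytes are summed, and <code>chr(MARKED_LIVE)</code> is the exclamation mark mentioned above.<br />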
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code> available, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable by one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction in question is one of those shown above. If there is such an instruction, it can start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so the 4th byte becomes the high byte). In this case <code>lodsw</code> can be reused as the first (higher) byte of the segment. The 3rd byte is only relevant for alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
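Whether a given code byte sequence yields a usable segment can be estimated with a small Python model. The opcode bytes below are what I would expect an assembler to emit for the first three instructions of the listing (<code>C5 1C</code>, <code>F9</code>, <code>AD</code>); treat them as an assumption and check them against an actual listing file. The function names are hypothetical, and an 80x25 text screen at linear address 0xB8000 is assumed.<br />
<syntaxhighlight lang="python">
TEXT_SCREEN_LINEAR = 0xB8000      # start of colour text memory
SCREEN_BYTES = 80 * 25 * 2        # assumption: 80x25 text mode


def lds_bx_si(code):
    """Model 'lds bx,[si]' with SI pointing at the first code byte."""
    bx = int.from_bytes(code[0:2], "little")
    ds = int.from_bytes(code[2:4], "little")
    return bx, ds


def screen_offset_in_segment(ds):
    """Offset of the text screen inside segment ds, or None if it does not fit."""
    offset = TEXT_SCREEN_LINEAR - (ds << 4)
    if 0 <= offset and offset + SCREEN_BYTES <= 0x10000:
        return offset
    return None


# Assumed encodings: lds bx,[si] = C5 1C, stc = F9, lodsw = AD
CODE_START = bytes([0xC5, 0x1C, 0xF9, 0xAD])
</syntaxhighlight>
With these bytes, <code>lds_bx_si(CODE_START)</code> gives a segment slightly below <code>0xb800</code>, and <code>screen_offset_in_segment</code> reports where the visible screen starts inside it. Note that BX also receives "garbage" from the first two code bytes.<br />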
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how to optimize the cleanup step? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well happen a certain number of steps later. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses four bytes, unless it is brought into one of the "pure" forms that only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and the usage of BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now there are very short instructions available to advance the registers SI and DI, some of them both at the same time, and one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, so that the alignment of the screen in text mode is perfectly matched.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple command to advance SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is also one of the commands that work as the high byte of a segment address, followed by an additional <code>dec si</code> to keep DI and SI aligned, so that the cleanup step is always at the same distance "behind" the current calculation. The assumption DI = SI - 258 is true on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original) to save one byte.<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
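The pointer bookkeeping of this version can be traced with a short Python model. It assumes the usual start values SI = 0100h and DI = FFFEh and 16-bit wraparound; the function name is made up for illustration.<br />
<syntaxhighlight lang="python">
MASK = 0xFFFF   # 16-bit registers wrap around


def run_pointer_updates(iterations, si=0x0100, di=0xFFFE):
    """Track SI and DI through the pointer-moving instructions of LifeLoop."""
    history = []
    for _ in range(iterations):
        center = si                    # lodsb reads [si] ...
        si = (si + 1) & MASK           # ... and advances SI by one
        si = (si - 1) & MASK           # dec si undoes that; or [si],al marks the center
        si = (si + 2) & MASK           # cmpsw advances both pointers by two
        di = (di + 2) & MASK
        history.append((center, si, di))
    return history
</syntaxhighlight>
In every recorded step, DI stays 258 bytes behind SI, so <code>shr byte [di],5</code> cleans up the cell 256 bytes behind the one that was just marked, and <code>[di+bx+94]</code> addresses the same byte as the former <code>[si+bx-160-4]</code>.<br />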
== Combining exchange with alignment : 33 bytes ==<br />
When thinking about <code>xchg cx,ax</code> and how to skip one row to get rid of one of the two <code>dec bx</code>, my own production "M8trix" (2015) came to mind, where I did [https://www.pouet.net/prod.php?which=63126#c713378 pretty much the same as here]: pulling the <code>xchg</code> into the loop and counting alternately, so that <code>cl</code> counts the actual cells, while <code>al</code> is never actually used (it "counts" the colors). To make that little dance work, <code>bl</code> has to start at 7.<br />
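The alternating bookkeeping is easy to get wrong, so here is a small Python model of the register swapping. It is a sketch under the assumption that even values of BX address character bytes and odd values address color bytes; the function and parameter names are hypothetical.<br />
<syntaxhighlight lang="python">
def accumulate(chars, attrs, al, cl):
    """Model of AccLoop with the xchg pulled into the loop.

    chars maps BX = 6, 4, 2 and attrs maps BX = 7, 5, 3, 1 to the sum of
    the three bytes read for that BX value; al and cl are the register
    contents on entry to the loop.
    """
    for bx in range(7, 0, -1):             # mov bl,7 ... dec bx / jnz
        al, cl = cl, al                    # xchg cx,ax (low bytes only)
        column = chars[bx] if bx % 2 == 0 else attrs[bx]
        al = (al + column) & 0xFF          # the three add al,[...] instructions
    return al, cl
</syntaxhighlight>
After seven passes, the returned <code>cl</code> holds the entry value of <code>al</code> (0 or 32) plus the three character columns, while the returned <code>al</code> only collected color bytes and is overwritten by the next <code>lodsb</code> anyway.<br />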
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,7 <br />
AccLoop: xchg cx,ax<br />
add al,[di+bx+94] ; Add in this column<br />
add al,[si+bx-4]<br />
add al,[si+bx+160-4]<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=545Game of Life 32b2020-04-30T16:56:41Z<p>HellMood: /* Synchronizing SI/DI, Improved cleanup : 34 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of living cells is 3, the center cell will be alive.<br />
* If the number of living cells is 4, the center cell keeps its state.<br />
* Otherwise, the center cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation works as usual: each pass of the loop adds up the 3 cells of one column, and the loop moves from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" is already happening. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved there by <code>xchg cx,ax</code>. If an arbitrary number of summands have this one bit set, that won't hurt the calculation because of a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value according to the rules defined above. <br /><br />
<br /><br />
This value is now stored in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a cell marked as alive temporarily has a value of 32 or 33 and is thus visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated later. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is directly punished with an additional byte: since <code>DI</code> is no longer involved in the process, an optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 chars, colors; the row stride of 160 bytes in the code below corresponds to 80 cells of two bytes each). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction, it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code> available, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable by one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction in question is one of those shown above. If there is such an instruction, it can start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so the 4th byte becomes the high byte). In this case <code>lodsw</code> can be reused as the first (higher) byte of the segment. The 3rd byte is only relevant for alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes were the result of just one idea: how to optimize the cleanup step? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can just as well happen a certain number of steps later. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses four bytes, unless it is brought into one of the "pure" forms that only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and the usage of BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
Now there are very short instructions available to advance the registers SI and DI, some of them both at the same time, and one of them is <code>cmpsw</code>. Not only does it not "hurt" the intended computation (the "compare" part of the instruction can be ignored), it also advances both SI and DI by TWO, so that the alignment of the screen in text mode is perfectly matched.<br/><br />
Using <code>cmpsw</code> requires removing <code>lodsw</code>, since there is no simple command to advance SI in the opposite direction (without involving the direction flag). It has therefore been changed back to <code>lodsb</code>, which is also one of the commands that work as the high byte of a segment address, followed by an additional <code>dec si</code> to keep DI and SI aligned, so that the cleanup step is always at the same distance "behind" the current calculation. The assumption DI = SI - 258 is true on almost every DOS system. As a byproduct, one of the memory access instructions can now be rewritten to use DI instead of SI (like in the original) to save one byte.<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=544Game of Life 32b2020-04-30T16:47:42Z<p>HellMood: </p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of living cells is 3, the center cell will be alive.<br />
* If the number of living cells is 4, the center cell keeps its state.<br />
* Otherwise, the center cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation works as usual: each pass of the loop adds up the 3 cells of one column, and the loop moves from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" is already happening. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved there by <code>xchg cx,ax</code>. If an arbitrary number of summands have this one bit set, that won't hurt the calculation because of a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value according to the rules defined above. <br /><br />
<br /><br />
This value is now stored in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that a cell marked as alive temporarily has a value of 32 or 33 and is thus visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. In case there is space left, parts of it could be reintegrated later. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so in this version it is removed. That is directly punished with an additional byte: since <code>DI</code> is no longer involved in the process, an optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 chars, colors; the row stride of 160 bytes in the code below corresponds to 80 cells of two bytes each). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. In between marking and correction, it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is to reuse part of the code as the segment address, which is possible when the instruction is one of those shown above. If there is such an instruction, it can be placed at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> loads the first two bytes into BX and the 3rd and 4th into DS, in little-endian order). In this case <code>lodsw</code> can be reused as the first (higher) byte of the segment. The 3rd byte only affects the alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Synchronizing SI/DI, Improved cleanup : 34 bytes ==<br />
A lot of tiny changes resulted from just one idea: how can the cleanup step be optimized? After all, it is not really necessary to correct a marked cell as soon as possible; the correction can instead be delayed by a certain number of steps. But any nontrivial version of <code>shr byte [si-160-6],5</code> still uses at least four bytes, unless it is brought into one of the "pure" forms that only take up THREE bytes: <code>shr byte[(bp/bx)+si/di],x</code>. Since SI and BX were already in use, and using BP would imply that the segment register SS is used instead of DS, the only remaining register is DI.<br/><br />
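The byte counts behind this argument can be checked with a small model. This is only a sketch of the 16-bit encoding rules for <code>shr byte [mem],imm8</code> (opcode, ModRM, optional displacement, immediate); the function name and its parameters are made up for illustration.

```python
def shr_mem_imm8_length(base_is_bp_only, displacement):
    """Length in bytes of a 16-bit `shr byte [base+disp], imm8`.

    base_is_bp_only: True for a plain [bp] operand, which has no
    displacement-free encoding and therefore always needs one byte more.
    """
    if displacement == 0 and not base_is_bp_only:
        disp_bytes = 0              # "pure" form such as [di] or [bx+si]
    elif -128 <= displacement <= 127:
        disp_bytes = 1              # 8-bit displacement
    else:
        disp_bytes = 2              # 16-bit displacement
    return 1 + 1 + disp_bytes + 1   # opcode + ModRM + displacement + immediate

print(shr_mem_imm8_length(False, 0))      # shr byte [di],5
print(shr_mem_imm8_length(False, -65))    # shr byte [di-65],5
print(shr_mem_imm8_length(False, -166))   # shr byte [si-160-6],5
```

Under these assumptions the displacement-free form is the only one that fits into three bytes, which is why the correction is moved onto a register that points exactly at the cell to be cleaned up.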
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsb<br />
dec si<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si],al ; Add in new cell ^<br />
cmpsw<br />
shr byte [di],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[di+bx+94] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=543Game of Life 32b2020-04-30T16:37:57Z<p>HellMood: /* Switching to Textmode : 38 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" Game of Life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of living cells is 3, the center cell will be alive.<br />
* If the number of living cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
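That the nine-cell formulation really matches the classic rules can be checked by brute force. The following Python sketch is not part of the intro; the function names are invented for this comparison, and it simply tries every combination of center state and neighbour count.

```python
from itertools import product

def next_state_classic(center, neighbours):
    """Conway's rules, counting only the eight neighbours."""
    if center:
        return 1 if neighbours in (2, 3) else 0
    return 1 if neighbours == 3 else 0

def next_state_nine(center, neighbours):
    """Reinterpreted rules, counting all nine cells of the 3x3 block."""
    total = center + neighbours
    if total == 3:
        return 1          # birth or survival
    if total == 4:
        return center     # keep the current state
    return 0              # die or stay dead

# Every (center, neighbour count) pair where the two rule sets disagree.
mismatches = [
    (c, n) for c, n in product((0, 1), range(9))
    if next_state_classic(c, n) != next_state_nine(c, n)
]
print(mismatches)
```

An empty list means both formulations agree for all 18 possible inputs.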
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the Game of Life does not work "in situ" (updating the current cell immediately would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner step adding up the 3 cells of one column and an outer loop moving from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared: at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Likewise, any number of the summed cells may have this bit set. Neither hurts the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now stored in the original cell with <code>or [si-1],al</code>. As shown before, this does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33, which is visible as a brighter blue pixel in the short time span between marking and correction.<br />
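The <code>stc</code> / <code>rcr al,cl</code> / <code>and al,20h</code> sequence can be modelled in a few lines of Python. This is a sketch under the assumption that the center cell holds 0 or 1 when it is loaded; the helper names are made up. For 8-bit operands the rotation runs through nine bits (eight data bits plus carry), so after masking to five bits the count is effectively taken modulo 9.

```python
def rcr8(value, carry, count):
    """Rotate an 8-bit value right through carry, like x86 `rcr r8, cl`."""
    count = (count & 0x1F) % 9       # 5-bit mask, then modulo 9 for a 9-bit ring
    ring = (carry << 8) | value      # carry sits just above bit 7
    for _ in range(count):
        ring = (ring >> 1) | ((ring & 1) << 8)
    return ring & 0xFF, ring >> 8    # (new value, new carry)

def marked_bit(cell, total):
    """Model of: stc / rcr al,cl / and al,20h with al=cell, cl=total."""
    al, _ = rcr8(cell, 1, total)
    return al & 0x20

for total in range(10):
    print(total, marked_bit(0, total), marked_bit(1, total))
```

Bit 5 is set only for a sum of 3 (the carry lands there) and for a sum of 4 when the cell was alive (the old bit 0 lands there). Stray multiples of 32 in the count are masked away, which is why the uncleared <code>cl</code> and the marked neighbours are harmless.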
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left at the end, parts of it could be reintegrated. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual Game of Life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so it is removed in this version. That is directly punished with an additional byte: because <code>DI</code> is no longer involved in the process, one optimization had to be removed. The assumption is that the computer this runs on is already in text mode (80x25 characters, color; a row occupies 160 bytes). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no initial content has to be generated, since there is always at least something on the screen that works as a seed. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
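To see why the offsets in the text mode version come in steps of 2 and 160, here is a small Python sketch of the memory layout. It assumes an 80-column text mode (which is what the 160-byte row stride in the listing implies); the constant and function names are invented for illustration.

```python
COLUMNS = 80                 # assumption: 80x25 text mode
BYTES_PER_CELL = 2           # character byte followed by attribute byte
ROW_STRIDE = COLUMNS * BYTES_PER_CELL

def char_offset(row, col):
    """Offset of the character byte of a screen cell inside segment 0xB800."""
    return row * ROW_STRIDE + col * BYTES_PER_CELL

def neighbourhood_offsets(center):
    """Offsets touched by AccLoop for BX = 6, 4, 2 with SI at the center cell."""
    offsets = []
    for bx in (6, 4, 2):                      # two `dec bx` per iteration
        for row_delta in (-160, 0, 160):      # row above, same row, row below
            offsets.append(center + bx + row_delta - 4)
    return offsets

print(sorted(neighbourhood_offsets(char_offset(5, 10))))
```

All nine offsets are even, so only character bytes are summed and the attribute bytes never enter the count.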
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is to reuse part of the code as the segment address, which is possible when the instruction is one of those shown above. If there is such an instruction, it can be placed at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> loads the first two bytes into BX and the 3rd and 4th into DS, in little-endian order). In this case <code>lodsw</code> can be reused as the first (higher) byte of the segment. The 3rd byte only affects the alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
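The effect of <code>lds bx,[si]</code> on the first four code bytes can be worked out by hand or with a short Python sketch. The byte values below are the encodings I would expect an assembler to emit for <code>lds bx,[si]</code>, <code>stc</code> and <code>lodsw</code>; treat them as an assumption and check them against your own listing.

```python
code = bytes([0xC5, 0x1C,    # lds bx,[si]
              0xF9,          # stc   (3rd byte, low byte of the segment)
              0xAD])         # lodsw (4th byte, high byte of the segment)

bx = int.from_bytes(code[0:2], "little")   # first word goes to BX
ds = int.from_bytes(code[2:4], "little")   # second word goes to DS

TEXT_BASE = 0xB8000                        # linear address of text mode memory
TEXT_SIZE = 80 * 25 * 2                    # assumption: 80x25 cells, 2 bytes each

screen_offset = TEXT_BASE - (ds << 4)      # where the screen starts inside DS
screen_fits = 0 <= screen_offset and screen_offset + TEXT_SIZE <= 0x10000

print(hex(ds), hex(screen_offset), screen_fits)
```

With these bytes the segment is not <code>0xb800</code> itself, but the whole visible screen still lies inside the 64 kilobytes reachable through DS, which is all the loop needs since SI keeps running through the entire segment.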
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=542Game of Life 32b2020-04-30T16:16:57Z<p>HellMood: /* Using instructions as segment adress : 36 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" Game of Life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of living cells is 3, the center cell will be alive.<br />
* If the number of living cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the Game of Life does not work "in situ" (updating the current cell immediately would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner step adding up the 3 cells of one column and an outer loop moving from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared: at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Likewise, any number of the summed cells may have this bit set. Neither hurts the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now stored in the original cell with <code>or [si-1],al</code>. As shown before, this does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33, which is visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left at the end, parts of it could be reintegrated. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual Game of Life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so it is removed in this version. The assumption is that the computer this runs on is already in text mode (80x25 characters, color; a row occupies 160 bytes). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no initial content has to be generated, since there is always at least something on the screen that works as a seed. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== Using instructions as segment address : 36 bytes ==<br />
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, there is also the instruction <code>lds</code>, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is to reuse part of the code as the segment address, which is possible when the instruction is one of those shown above. If there is such an instruction, it can be placed at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> loads the first two bytes into BX and the 3rd and 4th into DS, in little-endian order). In this case <code>lodsw</code> can be reused as the first (higher) byte of the segment. The 3rd byte only affects the alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=541Game of Life 32b2020-04-30T16:15:06Z<p>HellMood: /* TODO */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" Game of Life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of living cells is 3, the center cell will be alive.<br />
* If the number of living cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the Game of Life does not work "in situ" (updating the current cell immediately would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner step adding up the 3 cells of one column and an outer loop moving from the right column (+1) to the left column (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared: at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Likewise, any number of the summed cells may have this bit set. Neither hurts the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now stored in the original cell with <code>or [si-1],al</code>. As shown before, this does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33, which is visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left at the end, parts of it could be reintegrated. A few tiny changes are needed to make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual Game of Life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access requires quite a bit of space, so it is removed in this version. The assumption is that the computer this runs on is already in text mode (80x25 characters, color; a row occupies 160 bytes). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no initial content has to be generated, since there is always at least something on the screen that works as a seed. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one of them for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
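The byte values a text cell passes through can be modelled in a few lines of Python. This is only a sketch under assumptions of mine: the helper names are hypothetical, the glyph names follow code page 437, and the default of 80 columns is chosen because it yields the 160-byte row distance used in the listing above.

```python
MARK = 0x20  # bit set by "and al,20h" / "or [si-2],al"

# Glyph shown for each byte value a cell can take (code page 437).
GLYPH = {0x00: "blank", 0x01: "smiley", 0x20: "space", 0x21: "!"}


def char_offset(row, col, columns=80):
    """Offset of a cell's character byte; its attribute byte follows at +1.

    columns=80 is an assumption of this sketch, see the lead-in.
    """
    return 2 * (row * columns + col)


def lifecycle(alive_now, alive_next):
    """Glyphs of one cell: before marking, while marked, after "shr 5"."""
    cell = alive_now
    marked = cell | (MARK if alive_next else 0)
    corrected = marked >> 5
    return GLYPH[cell], GLYPH[marked], GLYPH[corrected]


# A surviving cell briefly shows "!" before it is a smiley again.
print(lifecycle(1, 1))
```

Since only even offsets are touched, the attribute bytes keep their default value and all cells stay "gray on black".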
<br />
== Using instructions as segment address : 36 bytes ==
[[File:LDStrick.png|frame|center|possible instructions for LDS trick in textmode]]<br />
Instead of using <code>push</code> and <code>pop</code> to get the screen address, the instruction <code>lds</code> is also available, which reads the segment value from memory. A value "close" to <code>0xb800</code> is sufficient, because the visible screen in text mode is just a tiny part of the 64 kilobytes addressable through one segment. The idea is now to reuse parts of the code as the segment address, which is possible when the instruction is one of those shown above. If there is such an instruction, it can start at the 4th byte (<code>[si]</code> points to the start of the code, and <code>lds bx,[si]</code> puts the first two bytes into BX and the 3rd and 4th into DS, in little-endian order, so the 4th byte becomes the high byte of DS). The 3rd byte is only relevant for alignment, so instead of putting "0x00" there, a one-byte instruction can be used. The whole process saves two bytes.<br />
<br />
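To see which segment actually results, the first four code bytes can be decoded by hand. The following Python sketch does that. The opcode bytes are the standard encodings of <code>lds bx,[si]</code>, <code>stc</code> and <code>lodsw</code>, filled in by me, and the function name <code>lds</code> as well as the 4000-byte screen size (80x25 cells) are assumptions of this sketch.

```python
# First four bytes of the program, as they sit at [si] on startup.
CODE = bytes([0xC5, 0x1C,   # lds bx,[si]
              0xF9,         # stc    (3rd byte, low byte of DS)
              0xAD])        # lodsw  (4th byte, high byte of DS)


def lds(memory, offset=0):
    """Model of lds: a 16-bit register, then a 16-bit segment, little endian."""
    reg = int.from_bytes(memory[offset:offset + 2], "little")
    seg = int.from_bytes(memory[offset + 2:offset + 4], "little")
    return reg, seg


bx, ds = lds(CODE)
TEXT_BASE = 0xB8000                    # linear address of the text screen
screen_offset = TEXT_BASE - (ds << 4)  # where the screen starts inside DS

print(hex(bx), hex(ds), hex(screen_offset))
```

Under these assumptions DS is not <code>0xb800</code>, but the screen still lies completely inside the 64 kilobytes reachable through DS, which is all the endless loop needs.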
<syntaxhighlight lang="nasm"><br />
lds bx,[si] <br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
lodsw<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
jmp short LifeLoop<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=File:LDStrick.png&diff=540File:LDStrick.png2020-04-30T16:08:57Z<p>HellMood: </p>
<hr />
<div>possible instructions for LDS trick in textmode</div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=539Game of Life 32b2020-04-30T15:49:28Z<p>HellMood: /* Switching to Textmode : 38 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basic section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
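That the three rules above are the same as the classic ones can be checked exhaustively. The following Python sketch compares both formulations. The function names are mine and only serve this illustration.

```python
def conway_next(alive, neighbours):
    """Classic rule: birth on 3 neighbours, survival on 2 or 3."""
    if alive:
        return 1 if neighbours in (2, 3) else 0
    return 1 if neighbours == 3 else 0


def nine_cell_next(alive, total):
    """Reinterpreted rule: total counts the centre cell as well."""
    if total == 3:
        return 1
    if total == 4:
        return alive
    return 0


# Compare both formulations for every state and neighbour count.
for alive in (0, 1):
    for neighbours in range(9):
        total = neighbours + alive
        assert nine_cell_next(alive, total) == conway_next(alive, neighbours)
```

The gain of the nine-cell form is that the centre cell does not have to be skipped while summing, which keeps the inner loop uniform.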
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner loop adding up the 3 cells of one column, and an outer loop shifting from right (+1) to left (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
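The mark-and-correct scheme can be modelled in Python and compared against a plain double-buffered update. This is a sketch of the idea, not a translation of the listing: the function names, the correction distance of <code>width + 1</code> cells and the final clean-up loop are choices of mine (the original loops endlessly over a 64K segment, so its corrections simply continue into the next pass). Cells outside the buffer count as dead here.

```python
MARK = 0x20  # bit 5 holds the next state until the cell is corrected


def neighbourhood(width):
    """Offsets of all nine cells of a 3x3 block in a flat buffer."""
    return [dy * width + dx for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def step_reference(buf, width):
    """Double-buffered update using the nine-cell rule."""
    n, offsets = len(buf), neighbourhood(width)
    out = bytearray(n)
    for i in range(n):
        total = sum(buf[i + d] for d in offsets if 0 <= i + d < n)
        out[i] = 1 if total == 3 or (total == 4 and buf[i]) else 0
    return out


def step_in_place(buf, width):
    """Same update inside one buffer: mark first, correct later."""
    n, offsets = len(buf), neighbourhood(width)
    for i in range(n):
        # Marks are multiples of 32, so the low 5 bits are the live count.
        total = sum(buf[i + d] for d in offsets if 0 <= i + d < n) & 0x1F
        if total == 3 or (total == 4 and buf[i] & 1):
            buf[i] |= MARK
        done = i - width - 1          # this cell has left every later window
        if done >= 0:
            buf[done] >>= 5           # like "shr ...,5": next state -> bit 0
    for i in range(max(0, n - width - 1), n):
        buf[i] >>= 5                  # clean up the cells still marked


# A blinker on an 8x8 field gives the same result both ways.
field = bytearray(64)
for pos in (26, 27, 28):
    field[pos] = 1
expected = step_reference(field, 8)
step_in_place(field, 8)
assert field == expected
```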
<br /><br />
At the start of the loop, the first "trick" already happens. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Even if an arbitrary number of the summed cells have this one bit set, that does not hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
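The effect of this masking can be tried out in Python. The sketch below adds up nine cell bytes the way the 8-bit register <code>cl</code> would and then keeps the low 5 bits. The helper name <code>rotate_count</code> and the random test data are assumptions made for this illustration.

```python
import random

MARK = 0x20  # the bit left behind by "and al,0x20"


def rotate_count(cells, leftover):
    """Sum like the 8-bit register cl, then mask the count like rcr does."""
    cl = leftover
    for value in cells:
        cl = (cl + value) & 0xFF   # "add cl,[...]" wraps at 8 bits
    return cl & 0x1F               # only the low 5 bits reach the rotation


random.seed(1)
for _ in range(1000):
    states = [random.randint(0, 1) for _ in range(9)]
    marked = [s | random.choice((0, MARK)) for s in states]
    leftover = random.choice((0, MARK))
    # Marks and leftover are multiples of 32 and vanish in the mask.
    assert rotate_count(marked, leftover) == sum(states)
```

Since at most nine cells are alive, the live count always fits into the low 5 bits, no matter how many marks were added on top.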
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value required by the rules defined above. <br /><br />
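A small Python model of the rotation makes it easy to check all cases. The function names are hypothetical. The model treats <code>rcr</code> on an 8-bit register as a 9-bit rotation through the carry flag and reduces the count modulo 9 after the 5-bit mask, as the instruction reference describes for 8-bit operands.

```python
def rcr8(value, carry, count):
    """Rotate an 8-bit value right through the carry flag (9-bit rotation)."""
    count = (count & 0x1F) % 9     # 5-bit mask, then modulo the 9-bit width
    for _ in range(count):
        value, carry = (value >> 1) | (carry << 7), value & 1
    return value, carry


def next_state_mark(cell, total):
    """Model of "stc / rcr al,cl / and al,20h" for a cell of 0 or 1."""
    rotated, _ = rcr8(cell, 1, total)   # stc sets the carry beforehand
    return rotated & 0x20


# 3 cells: the carry lands in bit 5. 4 cells: the old cell lands there.
for cell in (0, 1):
    print(cell, [next_state_mark(cell, total) for total in range(10)])
```

For every other count, bit 5 receives one of the zero bits of the register, so the cell dies or stays dead.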
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33, which makes it visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left in the end, parts of it can be reintegrated. A few tiny changes make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
[[File:Golt38.png|thumb|game of life, text mode, 32b - 38b]]<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access takes quite a bit of space, so this version removes both. The assumption is that the computer this runs on is already in text mode (40x25 chars, colors). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as a seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one for the character, one for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=File:Golt38.png&diff=538File:Golt38.png2020-04-30T15:49:13Z<p>HellMood: </p>
<hr />
<div>game of life, textmode, 32b - 38b</div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=537Game of Life 32b2020-04-30T15:47:17Z<p>HellMood: /* Switching to Textmode : 39 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basic section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner loop adding up the 3 cells of one column, and an outer loop shifting from right (+1) to left (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" already happens. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Even if an arbitrary number of the summed cells have this one bit set, that does not hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33, which makes it visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left in the end, parts of it can be reintegrated. A few tiny changes make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
== Switching to Textmode : 38 bytes==<br />
Setting up the screen mode and pixel access takes quite a bit of space, so this version removes both. The assumption is that the computer this runs on is already in text mode (40x25 chars, colors). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as a seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one for the character, one for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp short LifeLoop<br />
</syntaxhighlight><br />
<br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=536Game of Life 32b2020-04-30T15:43:06Z<p>HellMood: /* Switching to Textmode : 39 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basic section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner loop adding up the 3 cells of one column, and an outer loop shifting from right (+1) to left (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" already happens. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Even if an arbitrary number of the summed cells have this one bit set, that does not hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before setting the carry flag (<code>stc</code>), which is rotated in from the left and ends up directly to the right of the original cell value. By extracting the 6th bit of this rotated value (with <code>and al,0x20</code>), we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. The only side effect is that the cell temporarily holds a value of 32 or 33, which makes it visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left in the end, parts of it can be reintegrated. A few tiny changes make this work as intended. The segment where all the calculation takes place has been changed to <code>1000h</code>, pointing to a lower memory location (note: this might only work in DosBox). The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code> since that saves one byte.<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
== Switching to Textmode : 39 bytes==<br />
Setting up the screen mode and pixel access takes quite a bit of space, so this version removes both. The assumption is that the computer this runs on is already in text mode (40x25 chars, colors). This also helps with the calculation, since it now takes place directly on the screen (only one segment has to be set up), and no content has to be generated initially, since there is always at least something on the screen that works as a seed value. A small downside is the alignment of cells, because in text mode one cell occupies TWO bytes (one for the character, one for color information). Luckily, the color information is set to "gray on black" by default. An additional <code>dec bx</code>, replacing <code>lodsb</code> with <code>lodsw</code>, and changing <code>mov bl,3</code> to <code>mov bl,6</code> fix the alignment issue. Additionally, the screen address changed (<code>push 0xb800</code> <code>pop ds</code>). Another lucky coincidence is that instead of blue pixels, we now have a "smiley char" with orthogonal borders, which is a decent representation of a living cell. Between marking and correction it briefly changes to an exclamation mark (!), which is barely visible.<br />
<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
<br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=535Game of Life 32b2020-04-30T15:33:09Z<p>HellMood: /* Remove key handler and RNG : 44 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basic section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of regarding only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of cells is 3, the center cell will be alive.<br />
* If the number of cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
Like in other (trivial) implementations, the 2D space is parsed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would lead to wrong results in the following calculations), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation consists, as usual, of an inner loop adding up the 3 cells of one column, and an outer loop shifting from right (+1) to left (-1), thus adding up the 9 cells of a 3x3 neighbourhood. <br /><br />
<br /><br />
At the start of the loop, the first "trick" already happens. The summation register <code>cl</code> is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code> and moved in by <code>xchg cx,ax</code>. Even if an arbitrary number of the summed cells have this one bit set, that does not hurt the calculation because of a special property of the <code>rcr</code> instruction. [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before the carry flag is set with <code>stc</code>. The carry is rotated in from the left and ends up directly to the right of the original cell value, which wraps around through the carry. By extracting the 6th bit (bit 5) of the rotated value with <code>and al,0x20</code>, we get exactly the value required by the rules defined above. <br /><br />
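The three cases of the rule can be traced by hand. The following is only a sketch with illustrative register values, not code from the intro. <code>rcr al,cl</code> rotates the 9-bit quantity formed by the carry flag and <code>al</code>, so after 3 rotations the carry sits in bit 5, and after 4 rotations the old cell state (bit 0) sits in bit 5:<br />
<syntaxhighlight lang="nasm"><br />
; Case A: dead center cell, 3 live cells in the 3x3 block -> birth<br />
 mov al,0 ; cell state x = 0<br />
 mov cl,3 ; sum over all nine cells<br />
 stc ; CF = 1<br />
 rcr al,cl ; AL = 00100000b, the carry has reached bit 5<br />
 and al,20h ; AL = 20h (alive)<br />
; Case B: live center cell, 4 live cells (itself plus 3 neighbours) -> keeps its state<br />
 mov al,1 ; cell state x = 1<br />
 mov cl,4<br />
 stc<br />
 rcr al,cl ; AL = 00110000b, x in bit 5, the carry in bit 4<br />
 and al,20h ; AL = 20h (stays alive)<br />
; Case C: live center cell, 2 live cells (itself plus 1 neighbour) -> dies<br />
 mov al,1<br />
 mov cl,2<br />
 stc<br />
 rcr al,cl ; AL = 11000000b, bit 5 is clear<br />
 and al,20h ; AL = 0 (dead)<br />
</syntaxhighlight><br />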
<br /><br />
This value is now written into the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. As a side effect, a marked cell temporarily holds a value of 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left at the end, parts of it can be reintegrated. A few tiny changes are needed to make this work as intended. The segment in which all the calculation takes place has been changed to <code>1000h</code>, which points to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
[[File:Gol44c.png|thumb|game of life, 44 bytes, increased contrast]]<br />
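<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code>, which loads the byte and increments <code>si</code> in a single one-byte instruction. The two replaced instructions take three bytes together, so this saves two bytes.<br />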
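As a small sketch of why this substitution is possible (assuming the direction flag is clear, which <code>stosb</code> in the same loop relies on as well): both fragments leave the same values in <code>al</code> and <code>si</code>. The differences are that <code>lodsb</code> does not touch the flags, which is fine once the conditional <code>jnz</code> is replaced by an unconditional jump, and that <code>si</code> is now advanced before <code>or [si-1],al</code> is executed.<br />
<syntaxhighlight lang="nasm"><br />
 mov al,[si] ; AL = byte at DS:SI<br />
 inc si ; SI = SI + 1, also updates the zero flag<br />
; is replaced by<br />
 lodsb ; AL = byte at DS:SI, then SI = SI + 1, flags unchanged<br />
</syntaxhighlight><br />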
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
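A note on the addressing in the listing above, as a sketch that assumes the standard 80x25 colour text mode at segment <code>B800h</code>: each screen cell occupies two bytes (character first, then attribute), so one text row is 160 bytes long. That would explain the stride of 2 (<code>lodsw</code> and the two <code>dec bx</code>) and the row offsets of -160 and +160. The names <code>COLS</code>, <code>CELL</code> and <code>ROW</code> are hypothetical and only used for this illustration:<br />
<syntaxhighlight lang="nasm"><br />
COLS equ 80 ; text columns per row (assumed 80x25 mode)<br />
CELL equ 2 ; bytes per screen cell: character + attribute<br />
ROW equ COLS*CELL ; 160 bytes per text row<br />
 push 0xb800<br />
 pop es ; ES = text mode video memory<br />
 mov di,12*ROW+40*CELL ; row 12, column 40<br />
 mov ax,0x0f01 ; AL = character 01h, AH = attribute 0Fh (bright white on black)<br />
 stosw ; write character and attribute in one go<br />
</syntaxhighlight><br />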
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=File:Gol44c.png&diff=534File:Gol44c.png2020-04-30T15:32:54Z<p>HellMood: </p>
<hr />
<div>game of life, 44 bytes, increased contrast</div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=533Game of Life 32b2020-04-30T15:25:50Z<p>HellMood: /* Remove key handler and RNG : 44 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells (the center cell included) are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life cannot be updated "in situ" (updating the current cell instantly would lead to wrong results in the calculations that follow), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation works as usual: three cells of one column are added up, and the loop steps from the right column (+1) to the left column (-1), so that all 9 cells of the 3x3 neighbourhood are summed. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared; after <code>xchg cx,ax</code> it contains either 0 or 32, left over from the instruction <code>and al,0x20</code>. No matter how many of the summed cells have this bit set, the calculation is not affected, thanks to a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before the carry flag is set with <code>stc</code>. The carry is rotated in from the left and ends up directly to the right of the original cell value, which wraps around through the carry. By extracting the 6th bit (bit 5) of the rotated value with <code>and al,0x20</code>, we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now written into the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. As a side effect, a marked cell temporarily holds a value of 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
In order to reach 32 bytes, all the convenient stuff has to be removed. If there is space left at the end, parts of it can be reintegrated. A few tiny changes are needed to make this work as intended. The segment in which all the calculation takes place has been changed to <code>1000h</code>, which points to a lower memory location. (Note: this might only work in DOSBox.) The activity there (visible on the screen) helps to spawn actual game of life structures.<br />
[[File:Gol44.png|thumb|right|game of life, 44 bytes]]<br />
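<code>mov al,[si]</code> and <code>inc si</code> have been replaced with <code>lodsb</code>, which loads the byte and increments <code>si</code> in a single one-byte instruction. The two replaced instructions take three bytes together, so this saves two bytes.<br />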
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
push 01000h ; DS = low memory segment<br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
<br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=File:Gol44.png&diff=532File:Gol44.png2020-04-30T15:23:44Z<p>HellMood: </p>
<hr />
<div>game of life, 44 bytes</div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=531Game of Life 32b2020-04-30T15:05:32Z<p>HellMood: </p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells (the center cell included) are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life cannot be updated "in situ" (updating the current cell instantly would lead to wrong results in the calculations that follow), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation works as usual: three cells of one column are added up, and the loop steps from the right column (+1) to the left column (-1), so that all 9 cells of the 3x3 neighbourhood are summed. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared; after <code>xchg cx,ax</code> it contains either 0 or 32, left over from the instruction <code>and al,0x20</code>. No matter how many of the summed cells have this bit set, the calculation is not affected, thanks to a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before the carry flag is set with <code>stc</code>. The carry is rotated in from the left and ends up directly to the right of the original cell value, which wraps around through the carry. By extracting the 6th bit (bit 5) of the rotated value with <code>and al,0x20</code>, we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now written into the original cell with <code>or [si-1],al</code>, which, as shown before, does not hurt the computation. As a side effect, a marked cell temporarily holds a value of 32 or 33 and is therefore visible as a brighter blue pixel in the short time span between marking and correction.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
 mov al,93h ; Set mode 13h without clearing video memory (bit 7 of AL set) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=530Game of Life 32b2020-04-30T15:00:46Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. Setting up the screen mode and putting pixels on the screen is described in the basics section of this wiki. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells inside a 3x3 neighbourhood, ALL nine cells (the center cell included) are taken into consideration, and the rules are reinterpreted as:<br />
<br /><br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br /><br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life cannot be updated "in situ" (updating the current cell instantly would lead to wrong results in the calculations that follow), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" by <code>shr [byte di-65],5</code> to its target value for the next iteration. The summation works as usual: three cells of one column are added up, and the loop steps from the right column (+1) to the left column (-1), so that all 9 cells of the 3x3 neighbourhood are summed. <br /><br />
<br /><br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared; after <code>xchg cx,ax</code> it contains either 0 or 32, left over from the instruction <code>and al,0x20</code>. No matter how many of the summed cells have this bit set, the calculation is not affected, thanks to a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br /><br />
<br /><br />
When the summation is complete, the aforementioned <code>rcr</code> is executed, but not before the carry flag is set with <code>stc</code>. The carry is rotated in from the left and ends up directly to the right of the original cell value, which wraps around through the carry. By extracting the 6th bit (bit 5) of the rotated value with <code>and al,0x20</code>, we get exactly the value required by the rules defined above. <br /><br />
<br /><br />
This value is now set in the original cell with <code>or [si-1],al</code>.<br />
<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
 mov al,93h ; Set mode 13h without clearing video memory (bit 7 of AL set) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=529Game of Life 32b2020-04-30T14:52:56Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the neighbour cells inside a 3x3 neighbourhood, ALL cells (the center cell included) are taken into consideration, and the rules are reinterpreted as:<br />
<br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the cell dies (or stays dead).<br />
<br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life cannot be updated "in situ" (updating the current cell instantly would lead to wrong results in the calculations that follow), current cells are only "marked". Once the calculation has advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" to its target value for the next iteration. The summation works as usual: three cells of one column are added up, and the loop steps from the right column (+1) to the left column (-1), so that all 9 cells of the 3x3 neighbourhood are summed. <br /><br />
<br />
The first "trick" already happens at the start of the loop. The summation register <code>cl</code> is not properly cleared; after <code>xchg cx,ax</code> it contains either 0 or 32, left over from the instruction <code>and al,0x20</code>. No matter how many of the summed cells have this bit set, the calculation is not affected, thanks to a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
 mov al,93h ; Set mode 13h without clearing video memory (bit 7 of AL set) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=528Game of Life 32b2020-04-30T14:49:47Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells, ALL nine cells of the 3x3 neighbourhood (including the center cell) are taken into consideration, and the rules are reinterpreted as:<br />
<br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the center cell dies (or stays dead).<br />
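To convince yourself that the reinterpreted rules really are the classic ones, here is a small Python sketch (not part of the intro; the function names are made up for this illustration). It compares the usual 8-neighbour rule with the 9-cell total for every possible case.

```python
def next_state_classic(center, neighbours):
    """Conway's rule using the usual count of the 8 surrounding cells."""
    if center:
        return 1 if neighbours in (2, 3) else 0
    return 1 if neighbours == 3 else 0


def next_state_total(center, total):
    """Reinterpreted rule: total counts all 9 cells, center included."""
    if total == 3:
        return 1          # birth, or survival with 2 neighbours
    if total == 4:
        return center     # keep state (alive cell with 3 neighbours survives)
    return 0              # everything else dies or stays dead


def rules_agree():
    """Check both formulations for every center state and neighbour count."""
    return all(
        next_state_classic(c, n) == next_state_total(c, n + c)
        for c in (0, 1)
        for n in range(9)
    )


print(rules_agree())
```

The point of the 9-cell total is that the summation loop does not have to skip the center cell, which saves bytes in the assembly version.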
<br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would corrupt the calculations that follow), new cell values are only "marked" in a spare bit at first. Once the calculations have advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" to its target value for the next iteration. The summation is done as usual: each pass of the loop adds up the 3 cells of one column, and the loop moves from the right column (+1) to the left column (-1), thus adding up the 9 cells of the 3x3 neighbourhood. <br />
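The "mark now, correct later" idea can be sketched in Python as follows. This is a simplified model under my own assumptions, not a translation of the assembly: the grid is a flat list with dead borders instead of a wrapping 64K segment, and the name <code>step_in_place</code> is hypothetical. Bit 0 holds the current state and bit 5 (20h) the marked next state, as in the listing.

```python
MARK = 0x20  # bit 5 holds the next state, bit 0 the current state


def step_in_place(cells, width):
    """Advance one generation inside a single flat buffer."""
    n = len(cells)
    height = n // width
    for i in range(n):
        y, x = divmod(i, width)
        total = 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    total += cells[yy * width + xx] & 1  # read current state only
        if total == 3 or (total == 4 and cells[i] & 1):
            cells[i] |= MARK                             # mark, do not overwrite
        done = i - width - 1                             # last reader of this cell was i
        if done >= 0:
            cells[done] >>= 5                            # correct: next state -> bit 0
    for j in range(max(0, n - width - 1), n):            # correct the remaining tail
        cells[j] >>= 5
    return cells


grid = [0] * 25
for i in (11, 12, 13):           # horizontal blinker on a 5x5 grid
    grid[i] = 1
step_in_place(grid, 5)
print([i for i, v in enumerate(grid) if v])
```

The cell at <code>i - width - 1</code> is the upper left neighbour of the current cell, so once cell <code>i</code> has been processed nothing in this generation reads it again. Note that this model masks with <code>& 1</code> while summing; the assembly version skips that step and lets the rotate instruction deal with the extra bits.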
<br />
The first "trick" already happens at the start of the loop. The summation register is not properly cleared; at this point it contains either 0 or 32, left over from the instruction <code>and al,0x20</code>. Likewise, it does not hurt the calculation if an arbitrary number of the summed cells have this one bit set, because of a special property of the <code>rcr</code> instruction: [https://kernfunny.org/x86/html/file_module_x86_id_273.html "The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits."]<br />
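The sequence <code>stc</code> / <code>rcr al,cl</code> / <code>and al,20h</code> from the listing can be modelled in a few lines of Python. This is only a sketch with made-up helper names. It assumes the documented x86 behaviour for 8-bit operands: the count is masked to 5 bits and then taken modulo 9, because carry plus AL form a 9-bit ring.

```python
def rcr8(al, carry, count):
    """Model of `rcr al, count`: rotate the 9-bit value carry:AL to the right."""
    count = (count & 31) % 9             # 5-bit mask, then modulo 9 for byte operands
    value = (carry << 8) | al            # carry flag acts as bit 8
    for _ in range(count):
        value = (value >> 1) | ((value & 1) << 8)
    return value & 0xFF, value >> 8      # (new AL, new carry)


def next_cell_bit(center, total, stale=0):
    """stc / rcr al,cl / and al,20h, with `stale` being the leftover 0 or 32 in CL."""
    al, _ = rcr8(center, 1, total + stale)
    return al & 0x20


for total in range(10):
    print(total, next_cell_bit(0, total), next_cell_bit(1, total, stale=32))
```

With a total of 3, the carry set by <code>stc</code> lands in bit 5, so the cell is born no matter what. With a total of 4, bit 0 of the center cell lands in bit 5, so the cell keeps its state. For every other total bit 5 stays clear, and a stray 32 in the count changes nothing because it is masked away.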
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=527Game of Life 32b2020-04-30T14:41:47Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells, ALL nine cells of the 3x3 neighbourhood (including the center cell) are taken into consideration, and the rules are reinterpreted as:<br />
<br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the center cell dies (or stays dead).<br />
<br />
As in other (trivial) implementations, the 2D space is processed cell by cell, from left to right and from top to bottom. Since the game of life does not work "in situ" (updating the current cell instantly would corrupt the calculations that follow), new cell values are only "marked" in a spare bit at first. Once the calculations have advanced far enough that the cell in question no longer influences any calculation of the current iteration, it is "corrected" to its target value for the next iteration. The summation is done as usual: each pass of the loop adds up the 3 cells of one column, and the loop moves from the right column (+1) to the left column (-1), thus adding up the 9 cells of the 3x3 neighbourhood. <br />
<br />
The first "trick" already happens at the start of the loop. The summation register is not properly cleared, but at this point it can only contain 0 or 32...<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=526Game of Life 32b2020-04-30T14:32:27Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells, ALL nine cells of the 3x3 neighbourhood (including the center cell) are taken into consideration, and the rules are reinterpreted as:<br />
<br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the center cell dies (or stays dead).<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=525Game of Life 32b2020-04-30T14:32:03Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br /><br />
<br />
<br /><br />
It helps to understand what the core algorithm does before optimizing it. I will not go into the details of random number generation and key handling, since these parts are removed in the final version anyway. The core routine computes the [https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#Rules "normal" game of life rules], but with a twist. Instead of counting only the eight neighbour cells, ALL nine cells of the 3x3 neighbourhood (including the center cell) are taken into consideration, and the rules are reinterpreted as:<br />
<br />
* If the number of live cells is 3, the center cell will be alive.<br />
* If the number of live cells is 4, the center cell keeps its state.<br />
* Otherwise, the center cell dies (or stays dead).<br />
<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=524Game of Life 32b2020-04-30T12:36:46Z<p>HellMood: </p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;) Meanwhile you can [https://www.pouet.net/prod.php?which=85485 download and comment] the intro.<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=523Game of Life 32b2020-04-30T12:35:41Z<p>HellMood: </p>
<hr />
<div>This writeup is still in the works! Come back later to get information on all the mean tricks ;)<br />
<br />
==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=522Game of Life 32b2020-04-30T12:28:28Z<p>HellMood: </p>
<hr />
<div>==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
         mov al,93h              ; Set mode 13h (bit 7 set: keep video RAM) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
push 0xb800<br />
pop ds<br />
LifeLoop:<br />
stc ; 3 = birth, 4 = stay (tricky):<br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3)<br />
and al,20h ; ^carry | ^<br />
or [si-2],al ; Add in new cell ^<br />
shr byte [si-160-6],5 ; Shift previous value<br />
mov bl,6 <br />
xchg cx,ax<br />
AccLoop: add cl,[si+bx-160-4] ; Add in this column<br />
add cl,[si+bx-4]<br />
add cl,[si+bx+160-4]<br />
dec bx ; Loop back<br />
dec bx ; Loop back<br />
jnz AccLoop<br />
lodsw<br />
jmp LifeLoop<br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=521Game of Life 32b2020-04-30T12:26:56Z<p>HellMood: </p>
<hr />
<div>==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
        mov al,93h              ; Set mode 13h (bit 7 of AL set: video memory is not cleared) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight><br />
== TODO ==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=520Game of Life 32b2020-04-30T12:26:33Z<p>HellMood: </p>
<hr />
<div>==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
        mov al,93h              ; Set mode 13h (bit 7 of AL set: video memory is not cleared) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=519Game of Life 32b2020-04-30T12:25:46Z<p>HellMood: /* Original version : 65 bytes */</p>
<hr />
<div>==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
== Switching to Textmode : 39 bytes==<br />
<syntaxhighlight lang="nasm"><br />
</syntaxhighlight><br />
<br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
        mov al,93h              ; Set mode 13h (bit 7 of AL set: video memory is not cleared) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=518Game of Life 32b2020-04-30T12:20:53Z<p>HellMood: </p>
<hr />
<div>==Original version : 65 bytes==<br />
We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight><br />
==Remove key handler and RNG : 44 bytes==<br />
<syntaxhighlight lang="nasm"><br />
Start: <br />
<br />
        mov al,93h              ; Set mode 13h (bit 7 of AL set: video memory is not cleared) <br />
int 10h <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
;mov al,[si] ; Get center cell, set pixel <br />
lodsb<br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
; inc si ; Loop while not zero <br />
jmp short LifeLoop <br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=517Game of Life 32b2020-04-30T09:55:19Z<p>HellMood: Protected "Game of Life 32b" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite))</p>
<hr />
<div>We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
... more to come soon ...<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Game_of_Life_32b&diff=516Game of Life 32b2020-04-30T09:54:28Z<p>HellMood: Created page with "We'll start with the old 65 bytes version and bring it down to 32 bytes. ... more to come soon ... <syntaxhighlight lang="nasm"> ; http://read.pudn.com/downloads208/sourceco..."</p>
<hr />
<div>We'll start with the old 65 bytes version and bring it down to 32 bytes.<br />
<br />
... more to come soon ...<br />
<br />
<syntaxhighlight lang="nasm"><br />
; http://read.pudn.com/downloads208/sourcecode/asm/981812/LIFE65.ASM__.htm<br />
; Life simulator, 72 bytes - Vladislav Kaipetsky and Tenie Remmel <br />
; 65 bytes - Mark Andreas <br />
<br />
; If no args, regs on startup are: <br />
<br />
; AX = BX = 0000h <br />
; SI = IP = 0100h <br />
; DI = SP = FFFEh <br />
<br />
IDEAL <br />
MODEL TINY <br />
P386 <br />
CODESEG <br />
ORG 100h <br />
<br />
Start: int 1ah ; ah=00: cx=hours, dx=tic counter <br />
<br />
mov al,13h ; Set mode 13h <br />
int 10h <br />
<br />
xchg dx,ax <br />
<br />
push 09000h ; DS = last 64K segment <br />
pop ds <br />
push 0A000h ; ES = video memory <br />
pop es <br />
; BX is already zero <br />
RandLoop: <br />
rol ax,1 ; Generate random number <br />
adc [bx],al <br />
dec bx <br />
jnz RandLoop <br />
<br />
; BX will not be equal to 3 the first time this loop is executed, but <br />
; it will be for all other times. As SI = 0100h and DI = FFFEh on <br />
; startup, SI - DI will be equal to 258. <br />
<br />
LifeLoop: <br />
xchg cx,ax <br />
AccLoop: <br />
add cl,[di+bx-64] ; Add in this column <br />
add cl,[si+bx-2] <br />
add cl,[si+bx+318] <br />
dec bx ; Loop back <br />
jnz AccLoop <br />
<br />
mov al,[si] ; Get center cell, set pixel <br />
stosb <br />
stc ; 3 = birth, 4 = stay (tricky): <br />
rcr al,cl ; 1.00?0000x --> 0.0x100?00 (rcr 3) <br />
and al,20h ; ^carry | ^ <br />
; +---> 0.00x100?0 (rcr 4) <br />
or [si-1],al ; Add in new cell ^ <br />
shr [byte di-65],5 ; Shift previous value <br />
<br />
mov bl,3 ; 3 iterations in AccLoop <br />
inc si ; Loop while not zero <br />
jnz LifeLoop <br />
<br />
mov ah,1 ; Check for key <br />
int 16h <br />
jz LifeLoop ; Loop if no key <br />
<br />
xchg ax,bx ; Set text mode <br />
int 10h <br />
ret ; Return <br />
End Start <br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Case_Studies&diff=515Case Studies2020-04-30T09:52:39Z<p>HellMood: </p>
<hr />
<div>Sometimes it can be helpful to examine a tinyprog in detail, seeing what choices were made and why:<br />
<br />
:[[MELT.COM]]: We take a cute program from the 1980s and see how far we can crunch it down<br />
<br />
:[[m8trix 8b]]: An "Enter The Matrix" screen display in only 8 bytes<br />
<br />
:[[4is256]]: &#344;r&#345;ola's Tetris is a full tetris clone with scoring<br />
<br />
:[[paint16b]]: Hellmood's paint program in 16 bytes (really)<br />
<br />
:[[Indian Spirit]]: An American Indian tapestry with music in 32 bytes<br />
<br />
:[[Essence]]: Fakery of path tracing and lighting, with audio, in 64 bytes<br />
<br />
:[[Memories]]: Detailed writeup on [https://www.pouet.net/prod.php?which=85227 "Memories" (256 bytes)]<br />
<br />
:[[Game of Life 32b]]: Writeup for Game of Life in 32 bytes<br />
== External Case Studies ==<br />
<br />
[https://en.wikipedia.org/wiki/Brainfuck Brainfuck] is a very tiny language, having only 8 commands and an instruction pointer. Several compilers and interpreters have been made; Gynvael archived [http://gynvael.coldwind.pl/bf125/ many different versions along with their source code], the smallest of which was 125 bytes. qkumba took that as inspiration and created his own [http://pferrie.host22.com/misc/tiny/brainfck.htm brainfuck compiler in 100 bytes] (104 for one that is fully compliant). (Note: The term "compiler" is used mistakenly a lot in these descriptions; the majority of brainfuck programs are actually interpreters that load brainfuck programs and execute native code token by token. This one is an exception in that it really does compile the code entirely into native instructions first.)<br />
<br />
[http://canonical.org/~kragen/demo/fr-016.html Disassembly of Farbrausch's "fr-016: bytes"]<br />
<br />
[http://canonical.org/~kragen/demo/klappquadrat.html A disassembly of the 64-byte version of Klappquadrat]<br />
<br />
[https://trixter.oldskool.org/2012/12/17/maze-generation-in-thirteen-bytes/ Maze generation in 10 bytes]</div>HellMoodhttp://www.sizecoding.org/index.php?title=General_Coding_Tricks&diff=514General Coding Tricks2020-04-29T07:57:00Z<p>HellMood: added pjc50s explanation</p>
<hr />
<div>== Data is code, code is data ==<br />
<br />
Code is nothing more than data that the CPU interprets. For example, consider this multi-byte instruction:<br />
<br />
<syntaxhighlight lang=nasm><br />
mov ah,37h<br />
</syntaxhighlight><br />
<br />
This assembles to <code>B4 37</code>. <code>B4</code> by itself isn't interesting, but <code>37</code> happens to be the opcode for <code>AAA</code>. Let's say you had this code before a loop, and you needed to perform <code>AAA</code> at the top of the loop. Rather than put AAA at the top of the loop, you can reuse the opcode that will already be there as part of the <code>mov ah,37h</code> that comes before it. Just jump directly into the middle of the <code>mov ah,37h</code>, which will get interpreted and executed as <code>AAA</code>:<br />
<br />
<syntaxhighlight lang=nasm><br />
<br />
label:<br />
mov ah,37h<br />
;misc. stuff<br />
loop label+1<br />
</syntaxhighlight><br />
<br />
The <code>+1</code> specifies the jump should go to 1 byte past the actual location.<br />
<br />
=== Reuse ===<br />
<br />
You can use opcodes hidden in your existing data. For example, .COM files can end with <code>RET</code>, which is opcode <code>C3</code>. If you already have a <code>C3</code> somewhere else in your code, even as part of data, just <code>JMP</code> to that pre-existing <code>C3</code> instead of adding a <code>RET</code>.<br />
<br />
== If your environment holds you back, change it ==<br />
<br />
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it: while setting a new palette costs bytes, the new palette arrangement could save you headaches down the road. For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want. (This assumes you have very small ways of redefining the palette, of course.)<br />
<br />
The above is maybe not the best example. Rewrites to this section are welcome.<br />
<br />
== Need a constant? ==<br />
<br />
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.<br />
<br />
== A smaller way to point to Mode 13's screen segment ==<br />
<br />
Rather than <code>mov ah,0a0h; mov es,ax</code> or <code>push word 0a000h; pop es</code>, try this 2-byte wonder:<br />
<br />
<syntaxhighlight lang=nasm><br />
les bp,[bx]<br />
</syntaxhighlight><br />
<br />
This sets ES=9FFF, only one away from A000. You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.<br />
<br />
How does this work? At start of execution of a .COM file, BX=0, and DS=CS. The contents of the COM file get loaded to offset 0x100 in that segment, but loaded before that is the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with <code>CD 20</code> (<code>INT 20</code>, which exits the program), so that's what gets loaded into BP. The next word is the number of the last free conventional memory segment, typically 0x9fff (but can be something different if parts of the upper memory range are either not installed or allocated).<br />
<br />
'''Warning:''' This trick doesn't always work. On FreeDos, this can set ES=9FE0 and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.<br />
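<br />
A minimal sketch of the trick in use, assuming the PSP word at offset 2 really is 9FFFh (which, as the warning above says, is not guaranteed) and that AX=BX=0 at start. The pixel position is only an illustration:<br />
<br />
<syntaxhighlight lang=nasm><br />
org 100h<br />
        mov al,13h              ; AH=0 assumed: set mode 13h<br />
        int 10h<br />
        les bp,[bx]             ; C4 2F: BP=[0000h]=20CDh, ES=[0002h] (9FFFh assumed)<br />
        mov byte [es:16+100*320+160],15 ; +16 compensates for the one-paragraph offset: white pixel at (160,100)<br />
        ret<br />
</syntaxhighlight><br />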
<br />
== Accessing the timer tick for free ==<br />
<br />
If using a 386+, FS=0 at .COM start. So, <code>FS:[046C]</code> gets you the BIOS timer tick counter, which you could use for timing/pacing, or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, <code>POP DS</code> right after the start (the word on top of the stack is 0000h at .COM start) followed by accessing <code>[046C]</code> does the trick with equal size.<br />
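<br />
Both variants can be sketched as follows, assuming FS=0 (first variant) or an untouched stack at .COM start (second variant). Reading into AX is just an example:<br />
<br />
<syntaxhighlight lang=nasm><br />
; variant 1, 4 bytes: 64 A1 6C 04<br />
        mov ax,[fs:046ch]       ; low word of the tick counter at 0000:046C<br />
<br />
; variant 2, 4 bytes: 1F A1 6C 04<br />
        pop ds                  ; DS=0000h, taken from the zero word on top of the stack<br />
        mov ax,[046ch]          ; same counter, now through DS<br />
</syntaxhighlight><br />
<br />
Note that the second variant consumes the zero word that a final <code>RET</code> would otherwise use to reach the <code>INT 20</code> at the start of the PSP, and it changes DS for all following memory accesses.<br />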
<br />
== Looping twice ==<br />
<br />
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear on entry, you can run it twice (that is, loop back once) in only 3 bytes:<br />
<br />
<syntaxhighlight lang=nasm><br />
looping:<br />
;do stuff here<br />
cmc<br />
jc looping<br />
</syntaxhighlight><br />
<br />
== Looping three times ==<br />
<br />
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can run it three times (that is, loop back twice) in only 3 bytes:<br />
<br />
<syntaxhighlight lang=nasm><br />
looping:<br />
;do stuff here<br />
   inc bx       ;a dec bx counting down from -1 works with the same jpo (FEh, FDh: odd parity; FCh: even)<br />
jpo looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity<br />
</syntaxhighlight><br />
<br />
== Obtaining X and Y without DIV (The Rrrola Trick) ==<br />
<br />
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimation by multiplying the screen pointer by 0xCCCD and reading Y from DH and X (scaled to 0..255) from DL. For more precision, use DX as a 16-bit Y and DL:AH as a 16-bit X. The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], from start to end of the segment. Multiplying this number in [0,1] by 65536 / 320 = 204.8 yields the row in the integer part and, again as a kind of fixed-point number, the column in the fractional part. The constant 0xCCCD is the nearest rounding of 204.8 * 256 (= 52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.<br />
<br />
This is adapted from [http://www.pouet.net/prod.php?which=53816 "Puls" by Rrrola] where X and Y are directly modified on the stack by performing <code>add dword[di],0000CCCDh</code> on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value on said DWORD on the stack before each frame, which requires 2 additional bytes. Both approaches are too different to directly compare, but share the core idea of [http://www.pouet.net/topic.php?which=8791&page=8#c411796 multiplying with <code>0xCCCD</code>], so "Rrrolas trick" is an appropriate term to use.<br />
<br />
=== Alternative explanation by [https://news.ycombinator.com/user?id=pjc50 pjc50]===<br />
[https://gistpreview.github.io/?9b252f267cd1fdf9754059bb73a18487 Interactive snippet]<br />
More clearly: DI = (y * 320) + x<br />
<br />
Multiply by 0xCCCD => (y * 0x1000040) + (x * 0xcccd)<br />
<br />
Taking the top byte is equivalent to dividing by 0x1000000, so that gives you Y.<br />
The next lower (third) byte is then (x * 0xcccd / 0x10000) == (x * 52429 / 65536) =~ (x * 256/320).<br />
And the lower two bytes are noise.<br />
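<br />
Put into code, the multiplication could look like the following sketch. It assumes AX=0 at start, and the XOR pattern is only a placeholder effect:<br />
<br />
<syntaxhighlight lang=nasm><br />
org 100h<br />
        mov al,13h              ; AH=0 assumed: set mode 13h<br />
        int 10h<br />
        push 0a000h<br />
        pop es                  ; ES = video memory<br />
pixel:<br />
        mov ax,0cccdh<br />
        mul di                  ; DX:AX = DI * 0CCCDh<br />
        mov al,dh               ; DH = Y (0..199)<br />
        xor al,dl               ; DL = X scaled to 0..255<br />
        stosb                   ; plot the pixel and advance DI<br />
        jmp short pixel<br />
</syntaxhighlight><br />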
<br />
== Use the entire register for a smaller opcode form ==<br />
<br />
For example, <code>add cl,1</code> assembles to 3 bytes, while <code>inc cl</code> assembles to 2 bytes. If CH does not matter (or you know that it won't be affected), use <code>inc cx</code> instead, which takes only 1 byte. This is no real trick, but such things are easily overlooked, and the 2 saved bytes could be invested wisely.<br />
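<br />
For reference, these are the encodings in 16-bit mode. Keep in mind that, unlike <code>add</code>, <code>inc</code> leaves the carry flag untouched:<br />
<br />
<syntaxhighlight lang=nasm><br />
        add cl,1                ; 80 C1 01 (3 bytes)<br />
        inc cl                  ; FE C1    (2 bytes)<br />
        inc cx                  ; 41       (1 byte), also changes CH when CL wraps from FFh to 00h<br />
</syntaxhighlight><br />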
<br />
== Use the carry flag in your calculations ==<br />
<br />
Let's say you have to <code>add si,128</code>. Unfortunately this takes 1 Byte more than <code>add si,127</code>. But you can add 128 without that extra Byte. If your previous code sets the carry flag simply include it into your calculation and <code>adc si,127</code>. The same goes for <code>sub si,128</code> vs <code>sbb si,127</code>.</div>HellMoodhttp://www.sizecoding.org/index.php?title=General_Coding_Tricks&diff=513General Coding Tricks2020-04-28T14:24:13Z<p>HellMood: apparently more than a few persons reading what i wrote in a minute, tiny fixes ;)</p>
<hr />
<div>== Data is code, code is data ==<br />
<br />
Code is nothing more than data that the CPU interprets. For example, consider this multi-byte instruction:<br />
<br />
<syntaxhighlight lang=nasm><br />
mov ah,37h<br />
</syntaxhighlight><br />
<br />
This assembles to <code>B4 37</code>. <code>B4</code> by itself isn't interesting, but <code>37</code> happens to be the opcode for <code>AAA</code>. Let's say you had this code before a loop, and you needed to perform <code>AAA</code> at the top of the loop. Rather than put AAA at the top of the loop, you can reuse the opcode that will already be there as part of the <code>mov ah,37h</code> that comes before it. Just jump directly into the middle of the <code>mov ah,37h</code>, which will get interpreted and executed as <code>AAA</code>:<br />
<br />
<syntaxhighlight lang=nasm><br />
<br />
label:<br />
mov ah,37h<br />
;misc. stuff<br />
loop label+1<br />
</syntaxhighlight><br />
<br />
The <code>+1</code> specifies the jump should go to 1 byte past the actual location.<br />
<br />
=== Reuse ===<br />
<br />
You can use opcodes hidden in your existing data. For example, .COM files can end with <code>RET</code>, which is opcode <code>C3</code>. If you already have a <code>C3</code> somewhere else in your code, even as part of data, just <code>JMP</code> to that pre-existing <code>C3</code> instead of adding a <code>RET</code>.<br />
<br />
== If your environment holds you back, change it ==<br />
<br />
The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default MCGA palette] is fairly horrible, but there can be size advantages to changing it: while setting a new palette costs bytes, the new palette arrangement could save you headaches down the road. For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want. (This assumes you have very small ways of redefining the palette, of course.)<br />
<br />
The above is maybe not the best example. Rewrites to this section are welcome.<br />
<br />
== Need a constant? ==<br />
<br />
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.<br />
<br />
== A smaller way to point to Mode 13's screen segment ==<br />
<br />
Rather than <code>mov ah,0a0h; mov es,ax</code> or <code>push word 0a000h; pop es</code>, try this 2-byte wonder:<br />
<br />
<syntaxhighlight lang=nasm><br />
les bp,[bx]<br />
</syntaxhighlight><br />
<br />
This sets ES=9FFF, only one away from A000. You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.<br />
<br />
How does this work? At the start of execution of a .COM file, BX=0 and DS=CS. The contents of the .COM file are loaded at offset 0x100 in that segment, and the 256 bytes before that hold the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with <code>CD 20</code> (<code>INT 20h</code>, which exits the program), so that word (0x20CD) is what gets loaded into BP. The next word, which gets loaded into ES, is the number of the last free conventional memory segment, typically 0x9FFF (but it can be something different if parts of the upper memory range are either not installed or allocated).<br />
<br />
'''Warning:''' This trick doesn't always work. On FreeDOS, it can set ES=9FE0, and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.<br />
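<br />
For comparison, here are the three variants side by side with their sizes. This is a sketch, not a complete program: the three variants are alternatives, and the final pixel write is only an illustration of the 16-byte offset, assuming ES really ended up as 9FFF.<br />
<br />
<syntaxhighlight lang=nasm><br />
        mov  ah,0A0h        ;2 bytes \ 4 bytes in total, assumes AL=0<br />
        mov  es,ax          ;2 bytes /<br />
<br />
        push word 0A000h    ;3 bytes \ 4 bytes in total, needs a 186 or higher<br />
        pop  es             ;1 byte  /<br />
<br />
        les  bp,[bx]        ;2 bytes: BP=[0000h]=20CDh, ES=[0002h]=9FFFh (typically)<br />
<br />
        mov  byte [es:di+16],15 ;with ES=9FFFh, offset 16 is linear address A0000h<br />
</syntaxhighlight><br />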
<br />
== Accessing the timer tick for free ==<br />
<br />
If using a 386+, FS=0 at .COM start. So <code>FS:[046C]</code> gets you the BIOS timer tick variable, which you could use for timing/pacing or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, <code>POP DS</code> right after the start followed by an access to <code>[046C]</code> does the trick at equal size.<br />
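<br />
Both variants are sketched below as alternatives. The second one relies on the top of the stack holding <code>0000h</code> at .COM start:<br />
<br />
<syntaxhighlight lang=nasm><br />
        ;variant 1: 4 bytes, needs FS=0 and a 386 or higher<br />
        mov  ax,[fs:046Ch]  ;AX = low word of the tick counter (about 18.2 ticks per second)<br />
<br />
        ;variant 2: 4 bytes, works without FS<br />
        pop  ds             ;1 byte: DS=0000h, taken from the top of the stack<br />
        mov  ax,[046Ch]     ;3 bytes: same variable through DS<br />
</syntaxhighlight><br />
<br />
Keep in mind that the second variant has side effects: DS no longer equals CS, so data inside your program needs a <code>CS:</code> override, and the <code>0000h</code> that a final <code>RET</code> would use to exit is gone from the stack.<br />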
<br />
== Looping twice ==<br />
<br />
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear on entry, you can run it twice (that is, loop back once) for only 3 bytes:<br />
<br />
<syntaxhighlight lang=nasm><br />
looping:<br />
;do stuff here<br />
cmc<br />
jc looping<br />
</syntaxhighlight><br />
<br />
== Looping three times ==<br />
<br />
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can run it three times (that is, loop back twice) for only 3 bytes:<br />
<br />
<syntaxhighlight lang=nasm><br />
looping:<br />
;do stuff here<br />
inc bx ;also works as dec bx starting from -1: FEh and FDh have odd parity, FCh has even parity<br />
jpo looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity<br />
</syntaxhighlight><br />
<br />
== Obtaining X and Y without DIV (The Rrrola Trick) ==<br />
<br />
In [[Output#Outputting_in_mode_13h_.28320x200.29|320x200 mode]], instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading Y and X from the 8-bit registers DH (or DH:DL as a 16-bit value) and DL (or DL:AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point number in the range [0,1], from start to end of the segment. Multiplying this number in [0,1] by 65536 / 320 = 204.8 yields the row as the integer part and, again as a kind of fixed-point value, the column as the fractional part. The constant 0xCCCD is the nearest rounding of 204.8 * 256 (= 52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.<br />
<br />
This is adapted from [http://www.pouet.net/prod.php?which=53816 "Puls" by Rrrola] where X and Y are directly modified on the stack by performing <code>add dword[di],0000CCCDh</code> on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is solved with a good starting value on said DWORD on the stack before each frame, which requires 2 additional bytes. Both approaches are too different to directly compare, but share the core idea of [http://www.pouet.net/topic.php?which=8791&page=8#c411796 multiplying with <code>0xCCCD</code>], so "Rrrolas trick" is an appropriate term to use.<br />
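<br />
The following sketch contrasts the multiplication with a straightforward <code>DIV</code>. The register choices in the <code>DIV</code> variant are just one possible arrangement, and the worked numbers in the comments are for the pixel at column 160, row 100:<br />
<br />
<syntaxhighlight lang=nasm><br />
        ;with DIV: 9 bytes, exact X and Y<br />
        mov  ax,di          ;2 bytes<br />
        xor  dx,dx          ;2 bytes<br />
        mov  bx,320         ;3 bytes<br />
        div  bx             ;2 bytes: AX = Y (0..199), DX = X (0..319)<br />
<br />
        ;with the Rrrola trick: 5 bytes<br />
        mov  ax,0CCCDh      ;3 bytes<br />
        mul  di             ;2 bytes: DX:AX = DI * 0CCCDh<br />
                            ;DH = Y (0..199), DL = X scaled to 0..255 (X*256/320)<br />
<br />
        ;example: DI = 100*320+160 = 32160<br />
        ;32160 * 52429 = 1686116640, upper word DX = 25728 = 6480h<br />
        ;DH = 64h = 100 (the row), DL = 80h = 128 = 160*256/320 (the scaled column)<br />
</syntaxhighlight><br />
<br />
Note that X arrives scaled to the range 0..255 rather than 0..319, which is usually what an effect wants anyway because both coordinates then fit into one byte each.<br />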
<br />
== Use the entire register for a smaller opcode form ==<br />
<br />
As you know, e.g. <code>add cl,1</code> produces 3 bytes of code while <code>inc cl</code> assembles to 2 bytes. If CH does not matter (or you know that it won't be affected), use <code>inc cx</code> instead, which takes only 1 byte. This is no real trick, but such things are easily overlooked, and the 2 saved bytes could be invested wisely.<br />
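<br />
The three forms next to each other, with their encodings in 16-bit code, as a quick reference:<br />
<br />
<syntaxhighlight lang=nasm><br />
        add  cl,1           ;3 bytes (80 C1 01)<br />
        inc  cl             ;2 bytes (FE C1), leaves the carry flag untouched<br />
        inc  cx             ;1 byte  (41), but CH changes when CL wraps from FFh to 00h<br />
</syntaxhighlight><br />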
<br />
== Use the carry flag in your calculations ==<br />
<br />
Let's say you have to <code>add si,128</code>. Unfortunately this takes 1 Byte more than <code>add si,127</code>. But you can add 128 without that extra Byte. If your previous code sets the carry flag simply include it into your calculation and <code>adc si,127</code>. The same goes for <code>sub si,128</code> vs <code>sbb si,127</code>.</div>HellMoodhttp://www.sizecoding.org/index.php?title=Getting_Started&diff=512Getting Started2020-04-28T07:24:18Z<p>HellMood: replaced the memory segment trick</p>
<hr />
<div>== Words of warning ==<br />
<br />
Sizecoding assumes a basic level of assembler knowledge. You should have at least a few regular (non-optimized) assembler programs under your belt before you attempt sizecoding. Also, don't assume sizecoding is ''normal'' -- shaving bytes is a black art that should be kept far, far away from normal programming targets. People sizecode for fun, not profit!<br />
<br />
== Tools and Workflows ==<br />
<br />
Most people use NASM (it's free) and DOSBox for testing. A simple workflow:<br />
<br />
<syntaxhighlight><br />
nasm main.asm -fbin -o main.com<br />
ndisasm -o100h main.com<br />
dir main.com<br />
</syntaxhighlight><br />
<br />
Some crazy sizecoders like writing tinyprogs on very old hardware. For that, a86 is a good choice for an assembler, as it assembles source directly to a .COM file in one pass. Even on the first 8086 systems, you can use debug.com to check the assembly and size:<br />
<br />
<syntaxhighlight><br />
a86 main.asm<br />
debug main.com<br />
dir main.com<br />
</syntaxhighlight><br />
<br />
== Know your environment ==<br />
<br />
=== A simple graphics mode ===<br />
<br />
Most sizecoders choose to write to mode 13h, a chunky 320x200 graphics mode located at segment A000:0000. Each byte is a pixel, and the graphics buffer is linear, so it is extremely easy to program for. Because it is confined to a single segment, you can be sloppy: an offset that overflows or underflows simply wraps around within the segment and won't damage anything. The [https://en.wikipedia.org/wiki/Video_Graphics_Array#/media/File:VGA_palette_with_black_borders.svg default palette] leaves a little to be desired, so if you have room, you may want to change it.<br />
<br />
=== .COM file defaults ===<br />
<br />
Knowing what register values are initialized at program start can save you the trouble of having to set them in your code. On most (but not all) DOS environments, the following registers have these default values:<br />
<br />
<pre><br />
AX=0000<br />
BX=0000<br />
CX=00FF<br />
DX=CS<br />
SI=0100<br />
DI=FFFE<br />
BP=09xx<br />
SP=FFFC (DOS child process) or FFFE (the default)<br />
<br />
flags=xx02<br />
</pre><br />
<br />
Because .COM files only support 64K executables, <code>DS</code>, <code>ES</code>, and <code>SS</code> are all set to the same value as <code>CS</code>. The rest can't be counted on for any specific value, except that <code>BP</code> is mostly 09??h so you can usually count on the high byte being <code>09h</code>.<br />
<br />
Usually, the top of the stack contains <code>0000h</code>. That allows using the one-byte instruction <code>RET</code> to exit your program, since at location <code>CS:0000h</code> there is the value <code>20CDh</code> (= <code>int 20h</code>). See [https://en.wikipedia.org/wiki/Program_Segment_Prefix Program Segment Prefix].<br />
<br />
Different DOS versions may differ slightly in the above. A full chart exists at http://www.fysnet.net/yourhelp.htm to help you check if your specific target has different defaults than the standard settings above.<br />
<br />
=== Boot sector defaults ===<br />
<br />
Boot sector tinyprogs are occasionally explored, but the BIOS changes every register value as it executes before the boot sequence, so there's not much to count on other than what occurs directly before execution of the boot sector:<br />
* The boot sector is loaded at 0000:7C00<br />
* DL holds the drive number that was booted from, so if booted from a floppy disk in drive A:, it will be 00<br />
* The stack pointer is 512 bytes beyond the start of the boot sector (that is, just past its end), so SP is likely 7E00h<br />
<br />
This is why most sizecoders target .COM files, and is also why [https://github.com/nanochess/Toledo-Atomchess Toledo Atomchess] is 9 bytes larger if loaded from boot sector than from a .COM file -- it has to spend bytes setting things up like a .COM file for the rest of the code to work. <br />
<br />
If you still want to target boot sectors, your reward is 510 bytes available for program code (the last two bytes must be 0x55, 0xAA to be [http://thestarman.pcministry.com/asm/mbr/PartTables.htm recognized as a boot sector]).<br />
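<br />
A minimal boot sector skeleton could look like the sketch below. The endless loop is only a placeholder for your own code, and the file names in the build command are hypothetical:<br />
<br />
<syntaxhighlight lang=nasm><br />
org 7C00h                       ;the BIOS loads the boot sector at 0000:7C00<br />
<br />
start:<br />
        ;... up to 510 bytes of your own code go here ...<br />
        jmp  short start        ;placeholder: spin forever<br />
<br />
        times 510-($-$$) db 0   ;pad the code to exactly 510 bytes<br />
        db 55h,0AAh             ;boot signature in the last two bytes<br />
</syntaxhighlight><br />
<br />
Assembling this with <code>nasm boot.asm -fbin -o boot.img</code> should produce a file of exactly 512 bytes.<br />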
<br />
=== 1-byte opcodes ===<br />
The 80x86 family was originally a [https://en.wikipedia.org/wiki/Complex_instruction_set_computing CISC] design, a design philosophy that attempts to create instructions that perform multiple steps. As such, there are some single instructions that perform complex actions, such as the string opcodes (<code>LODS</code>, <code>MOVS</code>, <code>CMPS</code>, and <code>SCAS</code>). In sizecoding, you are trying to perform as much work in as little space as possible, so it is helpful to know (or memorize!) every 1-byte instruction in the 80x86 family. Here's a handy chart (segments and prefixes omitted):<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
! Opcode !! Mnemonic !! Arch !! Description !! Notes<br />
|-<br />
| 37 || AAA || || ASCII adjust AL (carry into AH) after addition<br />
|-<br />
|3F||AAS || ||ASCII adjust AL (borrow from AH) after subtraction <br />
|-<br />
|98||CBW || ||Convert byte into word (AH = top bit of AL) <br />
|-<br />
|99||CDQ ||80386+||Convert dword to qword (EDX = top bit of EAX) <br />
|-<br />
|F8||CLC || ||Clear carry flag <br />
|-<br />
|FC||CLD || ||Clear direction flag so SI and DI will increment <br />
|-<br />
|FA||CLI || ||Clear interrupt enable flag; interrupts disabled <br />
|-<br />
|F5||CMC || ||Complement carry flag <br />
|-<br />
|A6||CMPS mb,mb || ||Compare bytes [SI] - ES:[DI], advance SI,DI <br />
|-<br />
|A7||CMPS mv,mv || ||Compare words [SI] - ES:[DI], advance SI,DI <br />
|-<br />
|A6||CMPSB || ||Compare bytes DS:[SI] - ES:[DI], advance SI,DI <br />
|-<br />
|A7||CMPSD ||80386+||Compare dwords DS:[SI] - ES:[DI], advance SI,DI <br />
|-<br />
|A7||CMPSW || ||Compare words DS:[SI] - ES:[DI], advance SI,DI <br />
|-<br />
|99||CWD || ||Convert word to doubleword (DX = top bit of AX) <br />
|-<br />
|98||CWDE ||80386+||Sign-extend word AX to doubleword EAX <br />
|-<br />
|27||DAA || ||Decimal adjust AL after addition <br />
|-<br />
|2F||DAS || ||Decimal adjust AL after subtraction <br />
|-<br />
|F4||HLT || ||Halt||Resumes operation if an interrupt occurs; could use this for pacing effects that run too fast<br />
|-<br />
|EC||IN AL,DX || ||Input byte from port DX into AL <br />
|-<br />
|ED||IN eAX,DX || ||Input word from port DX into eAX <br />
|-<br />
|6C||INS rmb,DX ||80186+||Input byte from port DX into [DI], advance DI <br />
|-<br />
|6D||INS rmv,DX ||80186+||Input word from port DX into [DI], advance DI <br />
|-<br />
|6C||INSB ||80186+||Input byte from port DX into ES:[DI], advance DI <br />
|-<br />
|6D||INSD ||80386+||Input dword from port DX into ES:[DI], advance DI <br />
|-<br />
|6D||INSW ||80186+||Input word from port DX into ES:[DI], advance DI <br />
|-<br />
|CC||INT 3 || ||Interrupt 3 (trap to debugger)||If performing very many CALLs to a single procedure, could make it INT 3<br />
|-<br />
|CE||INTO || ||Interrupt 4 if overflow flag is 1 <br />
|-<br />
|CF||IRET || ||Interrupt return (far return and pop flags) <br />
|-<br />
|CF||IRETD ||80386+||Interrupt return (pop EIP, CS, Eflags) <br />
|-<br />
|9F||LAHF || ||Load: AH = flags SF ZF xx AF xx PF xx CF <br />
|-<br />
|C9||LEAVE ||80186+||Set SP to BP, then POP BP (reverses previous ENTER) <br />
|-<br />
|AC||LODS mb || ||Load byte [SI] into AL, advance SI <br />
|-<br />
|AD||LODS mv || ||Load word [SI] into eAX, advance SI <br />
|-<br />
|AC||LODSB || ||Load byte [SI] into AL, advance SI <br />
|-<br />
|AD||LODSD ||80386+||Load dword [SI] into EAX, advance SI <br />
|-<br />
|AD||LODSW || ||Load word [SI] into AX, advance SI <br />
|-<br />
|A4||MOVS mb,mb || ||Move byte [SI] to ES:[DI], advance SI,DI <br />
|-<br />
|A5||MOVS mv,mv || ||Move word [SI] to ES:[DI], advance SI,DI <br />
|-<br />
|A4||MOVSB || ||Move byte DS:[SI] to ES:[DI], advance SI,DI <br />
|-<br />
|A5||MOVSD ||80386+||Move dword DS:[SI] to ES:[DI], advance SI,DI <br />
|-<br />
|A5||MOVSW || ||Move word DS:[SI] to ES:[DI], advance SI,DI <br />
|-<br />
|90||NOP || ||No Operation <br />
|-<br />
|EE||OUT DX,AL || ||Output byte AL to port number DX <br />
|-<br />
|EF||OUT DX,eAX || ||Output word eAX to port number DX <br />
|-<br />
|6E||OUTS DX,rmb ||80186+||Output byte [SI] to port number DX, advance SI <br />
|-<br />
|6F||OUTS DX,rmv ||80186+||Output word [SI] to port number DX, advance SI <br />
|-<br />
|6E||OUTSB ||80186+||Output byte DS:[SI] to port number DX, advance SI <br />
|-<br />
|6F||OUTSD ||80386+||Output dword DS:[SI] to port number DX, advance SI <br />
|-<br />
|6F||OUTSW ||80186+||Output word DS:[SI] to port number DX, advance SI <br />
|-<br />
|1F||POP DS || ||Set DS to top of stack, increment SP by 2 <br />
|-<br />
|07||POP ES || ||Set ES to top of stack, increment SP by 2 <br />
|-<br />
|17||POP SS || ||Set SS to top of stack, increment SP by 2 <br />
|-<br />
|61||POPA ||80186+||Pop DI,SI,BP,x ,BX,DX,CX,AX (SP value is ignored) <br />
|-<br />
|61||POPAD ||80386+||Pop EDI,ESI,EBP,x,EBX,EDX,ECX,EAX (ESP ign.) <br />
|-<br />
|9D||POPF || ||Set flags register to top of stack, increment SP by 2 <br />
|-<br />
|9D||POPFD ||80386+||Set eflags reg to top of stack, incr SP by 4 <br />
|-<br />
|0E||PUSH CS || ||Set [SP-2] to CS, then decrement SP by 2 <br />
|-<br />
|1E||PUSH DS || ||Set [SP-2] to DS, then decrement SP by 2 <br />
|-<br />
|06||PUSH ES || ||Set [SP-2] to ES, then decrement SP by 2 <br />
|-<br />
|16||PUSH SS || ||Set [SP-2] to SS, then decrement SP by 2 <br />
|-<br />
|60||PUSHA ||80186+||Push AX,CX,DX,BX,original SP,BP,SI,DI <br />
|-<br />
|60||PUSHAD ||80386+||Push EAX,ECX,EDX,EBX,original ESP,EBP,ESI,EDI <br />
|-<br />
|9C||PUSHF || ||Set [SP-2] to flags register, then decrement SP by 2 <br />
|-<br />
|9C||PUSHFD ||80386+||Set [SP-4] to eflags reg, then decr SP by 4 <br />
|-<br />
|C3||RET || ||Return to caller (near or far, depending on PROC) <br />
|-<br />
|CB||RETF || ||Return to far caller (pop offset, then seg) <br />
|-<br />
|C3||RETN || ||Return to near caller (pop offset only) <br />
|-<br />
|9E||SAHF || ||Store AH into flags SF ZF xx AF xx PF xx CF <br />
|-<br />
|AE||SCAS mb || ||Compare bytes AL - ES:[DI], advance DI <br />
|-<br />
|AF||SCAS mv || ||Compare words eAX - ES:[DI], advance DI <br />
|-<br />
|AE||SCASB || ||Compare bytes AL - ES:[DI], advance DI <br />
|-<br />
|AF||SCASD ||80386+||Compare dwords EAX - ES:[DI], advance DI <br />
|-<br />
|AF||SCASW || ||Compare words AX - ES:[DI], advance DI <br />
|-<br />
|36||SS || ||Use SS segment for the following memory reference <br />
|-<br />
|F9||STC || ||Set carry flag <br />
|-<br />
|FD||STD || ||Set direction flag so SI and DI will decrement <br />
|-<br />
|FB||STI || ||Set interrupt enable flag, interrupts enabled <br />
|-<br />
|AA||STOS mb || ||Store AL to byte [DI], advance DI <br />
|-<br />
|AB||STOS mv || ||Store eAX to word [DI], advance DI <br />
|-<br />
|AA||STOSB || ||Store AL to byte ES:[DI], advance DI <br />
|-<br />
|AB||STOSD ||80386+||Store EAX to dword ES:[DI], advance DI <br />
|-<br />
|AB||STOSW || ||Store AX to word ES:[DI], advance DI <br />
|-<br />
|9B||WAIT || ||Wait until floating-point operation is completed <br />
|-<br />
|D7||XLAT || ||Set AL to memory byte DS:[BX + unsigned AL] <br />
|-<br />
<br />
|}<br />
<br />
Additionally, there are one-byte register forms of some opcodes: <code>INC reg</code>, <code>DEC reg</code>, <code>PUSH reg</code> and <code>POP reg</code> are one byte for every 16-bit register, and the accumulator (AX) has a one-byte <code>XCHG reg,AX</code>.<br />
<br />
== A sample framework ==<br />
<br />
Want to just dive in and see what happens? Here's a skeleton that sets up Mode 13h, loops until a keypress is detected, then exits. Although it assembles to only 19 bytes, do not consider this fully and finally optimized; your program's structure may allow some of this to be reduced further. But it's a good place to start:<br />
<br />
<syntaxhighlight lang=nasm><br />
org 100h ;specify .COM file<br />
<br />
start:<br />
mov al,13h ;AX=0000 at program start<br />
int 10h ;init mode 13h<br />
push word 0A000h ;Requires 80186 or higher to PUSH IMMED<br />
pop es ;ES now points to mode 13h screen segment<br />
<br />
mainloop:<br />
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />
;This is where you do your mega-amazing tiny program.<br />
;Write 8-bit values to A000:0000 to draw some pixels.<br />
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />
<br />
;Check for user wanting to leave by pressing ESC<br />
in al,60h ;read whatever is at keyboard port; looking for ESC which is #1<br />
dec al ;if ESC, AL now 0<br />
jnz mainloop ;fall through if 0, jump otherwise<br />
mov al,03 ;AX=0000 due to mainloop exit condition<br />
int 10h ;Switch back to text mode as a convenience<br />
ret ;.COM files can exit with RET<br />
</syntaxhighlight><br />
<br />
=== How to think like a sizecoder ===<br />
<br />
This example framework can be shrunk! Think a little bit about how you might do it, then check out the example below:<br />
<br />
<syntaxhighlight lang=nasm><br />
org 100h ;specify .COM file<br />
<br />
start:<br />
mov al,13h ;AX=0000 at program start<br />
int 10h ;init mode 13h<br />
les bx,[bx] ;contains 0x9FFF at program start; close enough to 0xA000 ;-)<br />
<br />
mainloop:<br />
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />
;This is where you do your mega-amazing tiny program.<br />
;Write 8-bit values to ES:xxxx to draw some pixels.<br />
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />
<br />
;Check for user wanting to leave by pressing ESC<br />
in al,60h ;read whatever is at keyboard port; looking for ESC which is #1<br />
dec al ;if ESC, AL now 0<br />
jnz mainloop ;fall through if 0, jump otherwise<br />
ret ;.COM files can exit with RET<br />
;Don't care if we set text mode, user can just MODE CO80<br />
</syntaxhighlight><br />
<br />
By casually disregarding the user experience :-) and fudging the start of video memory, our basic framework is now 13 bytes.<br />
<br />
== Where to go from here? ==<br />
<br />
[[Output#Outputting_in_Textmode_.2880x25.29| Hello World! console output]] shows how to output text.<br />
<br />
[[Techniques|Tips, Tricks, and Techniques]] can help you with ideas on optimizing your next production, or help you design while you're writing it.<br />
<br />
Some [[Case Studies]] are provided that illustrate and explain some of the choices made when sizecoding.<br />
<br />
Can't find what you need? Check our list of [[Resources|external resources]].</div>HellMoodhttp://www.sizecoding.org/index.php?title=Memories&diff=511Memories2020-04-22T07:38:50Z<p>HellMood: /* External Links */</p>
<hr />
<div>==Introduction==<br />
<br />
Hello! My name is "HellMood" and this is not a usual wiki style article. It will be written from my point of view, in first person. This article is all about a tiny MS DOS program called "Memories". This program has a size of 256 bytes and won the "PC 256 byte" competition of the [https://en.wikipedia.org/wiki/Demoscene demoscene] event [https://2020.revision-party.net/ "Revision"] in 2020, as well as the public choice award. You can watch a video of the output of this program here (https://www.youtube.com/watch?v=Imquk_3oFf4) and a video including live reactions of online audience and moderators here (https://www.twitch.tv/videos/589179638?t=7h02m40s). You can download the release and comment here (https://www.pouet.net/prod.php?which=85227). This wiki article will give you an in-depth analysis of the program and point out historical references as well as development steps. In case you did not notice, you are now on the "sizecoding wiki". This will help you not only to understand the interior of "Memories", but also enable you to produce something like this yourself. Feel free to look around! It is recommended that you start with the [http://www.sizecoding.org/wiki/Getting_Started basics of this wiki] first if you're new to sizecoding and/or x86 assembler. While the principles might be easy to grasp, details of the implementation might not.<br />
<br />
==Disclaimer==<br />
This article discusses the submitted DosBox version (256 bytes). The available archive contains versions for FreeDos and Windows XP DOS which, at the time of this writing, don't work on all computers. These alternative versions were included as proof of concept, to show that this program does not only run on the emulator "DosBox". For the "PC 256 bytes" competition of the event "Revision" in 2020, it was allowed to specify either "FreeDos" or "DosBox" as platform, the latter with a specific configuration. It is in fact possible to modify the DosBox version to work on FreeDos, MS DOS, WinXP and Win98, as the released alternative versions showcase, but this is not the focus of this article. Anyway, safe versions for all platforms and all computers are in the works. Regarding possible further optimizations, I will discuss/explain only the submitted version, although I already found several spots in the code that can be optimized. As the history of tiny effects shows, almost no code is perfect, and it can usually be brought down further in size. To avoid confusion between different versions, only the submitted version will be explained.<br />
<br />
==History of tiny effects==<br />
[[File:Pouet categories.png|thumb|Size categories on http://www.pouet.net]]<br />
Sizecoders think in size categories. For MS DOS, these categories are 256b, 128b, 64b and 32b. These are the standards of one of the biggest demoscene archives, http://www.pouet.net. There is no 16b category, although many tiny effects can be implemented in 16 bytes. Almost all of the effects of "Memories" have been coded and optimized by me before, and mostly the implementations were attempts to reduce an already existing effect in size, or do something similar in less size. Reducing the size in this case means reducing it to the next lower 2^N category (or one of the categories below that). For example, I might have seen an effect done in 33 to 64 bytes, and then reduced the size to 32 bytes or less. Almost every time I moved an effect to a lower category, I submitted the resulting tiny program to a demoscene party which allowed remote entries in the 256b category and/or published the result to http://www.pouet.net . In this section I will introduce the effects and mention the origins and authors.<br />
===Array of chessboards===<br />
[[File:Kasparov.gif|frame|kasparov, 16 bytes]]<br />
The origin of this effect is my own "Kasparov 16b" from 2018 (https://www.pouet.net/prod.php?which=75912). Apparently I "had this piece lying around, stuck at 17 bytes and not that beautiful, until i found a trick ;)" (quote from the release notes). There was a similar effect in 32 bytes before: "ew" by "headcrash" from 2003 (https://www.pouet.net/prod.php?which=17567). In this case, my design goal was to show "real" chessboards of 8x8 fields with recognizable dark and light squares, as well as the correct orientation of the single chessboards, meaning that the bottom right corner (h1) has to be a [https://en.wikibooks.org/wiki/Chess/Arranging_The_Board white square]. For "Memories", the effect had to be reimplemented with another approach of writing to the screen to work with the framework. The scrolling direction was also changed to be different from the "scrolling tilted plane" effect.<br />
<br />
===Zooming circles===<br />
The zooming circles were supposed to be entered to a demoparty as a 32 bytes intro, but I never actually did it. There is no real predecessor in 64 bytes, because in the 64b category much more complex effects are possible. The zooming circles were the result of desperately trying to reach the 32b category for a circular "tunnel" effect, for which my own record is still 52 bytes ("Neontube", 2016, https://www.pouet.net/prod.php?which=66808), which in turn was the optimization of the all-time-classic 64 byte effect "constant evolution" from ryg/Farbrausch (2003, https://www.pouet.net/prod.php?which=8697). In the zooming circles routine, the distance and the angle are removed/ignored, so that the 32b category could be reached.<br />
<br />
===Scrolling tilted plane===<br />
[[File:Floorcast variation.png|thumb|Floorcast, 32 bytes version, variation]]<br />
The scrolling tilted plane is one of my own releases, "floorcast 32b" from 2018. Floorcasts have their own history in sizecoding: the effect was brought down stepwise from 256 bytes to 32 bytes. The versions differ in the number of planes; some show two planes, others only one. For the "floorcast 32b" release I specifically decided against the "XOR" texture, whereas in "Memories" I used it again, but disguised with another final touch of "AND".<br />
* "rain_storm" version 2008 - 256 bytes - https://www.pouet.net/prod.php?which=50479<br />
* "org_100h" version 2008 - 128 bytes - https://www.pouet.net/prod.php?which=63271<br />
* "Baudsurfer" version 2013 - 86 bytes - https://www.pouet.net/prod.php?which=61070<br />
* "Baudsurfer" version 2014 - 64 bytes - https://www.pouet.net/prod.php?which=63305<br />
* "HellMood" version 2018 - 32 bytes - https://www.pouet.net/prod.php?which=77774<br />
<br />
===Parallax checkerboards===<br />
[[File:Projektbeschreibung.gif|frame|Projektbeschreibung, 32 bytes]]<br />
The parallax checkerboards have been released by me as 32 bytes effect [https://www.pouet.net/prod.php?which=76788 "Projektbeschreibung"] in 2018. It was the direct attempt to bring [https://www.pouet.net/prod.php?which=28543 "Follow the light"] from "Digimind" (2006) or my own [https://www.pouet.net/prod.php?which=63293 "Lucy"] (2014) down to 32 bytes. Very helpful inspirations came from [https://www.pouet.net/prod.php?which=32281 "Rrrolas" "Paralaxa"] (32 bytes, 2007) and [https://www.pouet.net/prod.php?which=61075 "Sensenstahl"s "Byteropolis"] (32 bytes, 2013). In fact Rrrolas rendering approach was already close to the final solution, the code was modified to correct the planes arrangement, change the shape from triangles to checkerboards and improve on the colors. In "Memories", the coloring of "Digimind"s version was used. Also, the effect has been modified to reduce shearing as much as possible.<br />
<br />
===Sierpinski rotozoomer===<br />
[[File:Colpinski.png|thumb|colpinski, 16 bytes]]<br />
[[File:Rotastic.png|thumb|left|rotastic, 32 bytes]]<br />
This consists of two effects, the rotozoomer and the sierpinski effect as a texture. The sierpinski effect is based on my own "Colpinski 16b" from 2013 (https://www.pouet.net/prod.php?which=62079) where "frag" of "fsqrt" helped pushing the limits. There is no real predecessor to that, because it is known that this effect can be done directly by combining X and Y, instead of an Iterated Function System or Cellular Automaton approach. The rotozoomer was released as 32b intro "rotastic" by me in 2017 (https://www.pouet.net/prod.php?which=70112), it is based on the ideas of "ryg" of "Farbrausch" (51 bytes, 2002, https://www.pouet.net/prod.php?which=5015) and "Gargaj" of "Conspiracy" (49 bytes, 2002, https://www.pouet.net/prod.php?which=6088).<br />
<br />
===Raycast bent tunnel===<br />
[[File:Intoanew.gif|frame|Into a new era, 64 bytes version]]<br />
The raycast bent tunnel is a tweaked version of my own 64 bytes "Into a new era" release (2018, https://www.pouet.net/prod.php?which=78044). The custom colors were exchanged for colors of the standard palette, the geometry was tweaked a bit and changed accordingly, so that no depth effect was used for the calculation of texture values. The standalone version of this effect has a size of 50 bytes. The two main inspirations for the 64 bytes version were two 128 byte intros, "Spongy" by "TBC" (2009, https://www.pouet.net/prod.php?which=53871) and "Wolf128" from "Baudsurfer" (2014, https://www.pouet.net/prod.php?which=63518), while the particular algorithm was developed independently.<br />
<br />
===Ocean night to day===<br />
[[File:Ocean.png|thumb|Ocean, 64b version]]<br />
The ocean effect is based on my own 64 bytes release "Ocean" from 2016 (https://www.pouet.net/prod.php?which=66857). The custom color generation and music generation have been stripped from the original, both were not compatible with the main framework of "Memories" without consuming a LOT of extra space. The special effect of "dawn" happens as a byproduct of the general framework approach. This will be explained in the next chapter.<br />
<br />
===Fading effect===<br />
The transition between two effects is an effect itself, which has no real predecessor. It is rather an idea that evolved over the years, and might have been done by many others in similar ways. In a nutshell, while calculating a frame, each pixel position is randomized and the time - which determines the effect to use - is offset by this randomization value which is downscaled before. This allows the usage of the standard VGA palette (picture, source) rather than creating custom colors for real crossfading and thus saves space.<br />
<br />
==Tiny megademo framework==<br />
The tiny megademo framework is based on ideas of [https://www.pouet.net/prod.php?which=68020 "Quatro"], done by "Sensenstahl" and me. To put many tiny effects together into a "megademo", they all have to use the same approach and they have to be mostly free of assumptions (about the contents of memory and registers). They also have to use the same timing values and be tuned to a shared timing in general. Preparing the single effects to be includable in a framework like this took quite some time and required a lot of extra space initially. It has to be noted that, sadly, some of my most impressive effects (judged by audience reactions and views on social media) could not be included at all, because the approaches were not combinable without a HUGE overhead. Once all effects were streamlined, I could think about "factoring out" common calculations which occur repeatedly, resulting in a few saved bytes. The framework does the following:<br />
* Setting up screen mode to 320 x 200 pixels in 256 colors<br />
* Initializing a pointer to the screen<br />
* Installing a callback for timing and music<br />
* Mainloop<br />
** Calculation of X and Y from the current screen location<br />
*** with the [http://www.sizecoding.org/wiki/General_Coding_Tricks#Obtaining_X_and_Y_without_DIV_.28The_Rrrola_Trick.29 "Rrrola Trick"]<br />
** P = Pseudorandom value from screen location<br />
** Offset current time T by a scaled value P'<br />
** Selection of effect number N according to T'<br />
** Execution of effect N (X,Y) for current pixel<br />
*** On entrance, XY is in register DX (DL,DH)<br />
*** On entrance, the current time is in register BP<br />
*** On exit, the pixel color is expected in AL<br />
** Advance to next pixel<br />
** Triple diagonal interlacing for smoothing<br />
** Repeat until frame is complete<br />
* Set timer to a tempo of ~35 FPS<br />
* Check keyboard for ESC<br />
** Quit on ESC, otherwise continue<br />
=== Code of framework===<br />
<syntaxhighlight lang="nasm"><br />
org 100h<br />
s:<br />
mov al,0x13 ; set AL to mode 320*200 in 256 colors<br />
int 0x10 ; call BIOS to set mode<br />
xchg bp,ax ; set timing value to 0x13 <br />
push 0xa000-10 ; write the screen adress to register ES<br />
pop es ; works in conjunction with Rrrola trick<br />
mov ax,0x251c ; parameter for changing timer interrupt<br />
mov dl,timer ; address of timer routine, assume DH=1<br />
int 0x21 ; install timer routine<br />
top:<br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
call bx ; call the effect<br />
stosb ; write the return value and advance<br />
inc di ; triple interlace trick for after<br />
inc di ; effect and smoothing the animation<br />
jnz top ; repeat until the frame is complete<br />
mov al,tempo ; set AL to divider for timer<br />
out 40h,al ; set timing (dual pass)<br />
in al,0x60 ; read keyboard<br />
dec al ; quit on ESC<br />
jnz top ; otherwise repeat loop<br />
sounds: db 0xc3, 11, 0x93; 0xc3 is MIDI/RET; fx2-s is used as volume<br />
table: db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
<br />
</syntaxhighlight><br />
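The "triple interlace" in the loop above follows from simple arithmetic: STOSB plus two INC DI advance the pointer by 3, and 3 shares no factor with 65536. The following Python sketch (not part of the intro, and assuming DI starts a frame at 0) lists the order in which one frame visits the pointer values.<br />

```python
def frame_order():
    """Yield the DI values of one frame in the order the main loop visits them."""
    di = 0                        # assumption: the frame starts with DI = 0
    while True:
        yield di
        di = (di + 3) & 0xFFFF    # stosb adds 1, the two 'inc di' add 2 more
        if di == 0:               # 'jnz top' falls through once DI wraps to 0
            break


if __name__ == "__main__":
    order = list(frame_order())
    print(len(order), order[:4], order[21844:21848])
```

One frame therefore consists of three passes (offsets 0, 2 and 1 modulo 3) that together touch every one of the 65536 pointer values exactly once, and the timer and keyboard code is only reached after the third pass.<br />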
<br />
==Explanation of effects==<br />
===Array of chessboards===<br />
[[File:Array of chessboards.png|thumb|array of chessboards]]<br />
The simplest effect and a good one to start with. After offsetting the row by the time, the classic XOR pattern is applied. To achieve the impression of a grid of chessboards, all bits but two are set in the color. The real trick lies in shifting to a "good" palette spot. The portion of the frame which is perceived as black isn't actually black, it's the dark part of the [https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_palette standard VGA palette]. Shifting that way also makes the dark and light squares look like they are from an ancient chessboard.<br />
<syntaxhighlight lang="nasm"> <br />
xchg dx,ax ; get XY into AX<br />
sub ax,bp ; subtract time from row<br />
xor al,ah ; XOR pattern (x xor y)<br />
or al,0xDB ; pattern for array of boards<br />
add al,13h ; shift to good palette spot<br />
</syntaxhighlight><br />
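To see which colors the five instructions above can produce, the register arithmetic can be replayed in Python. This is a sketch for illustration only; the function name is made up, and DL/DH are passed in as plain integers.<br />

```python
def chessboard_pixel(dl, dh, t):
    """Model of the chessboard effect: DL = X, DH = Y, t = timing value in BP."""
    ax = (((dh & 0xFF) << 8 | (dl & 0xFF)) - t) & 0xFFFF   # xchg dx,ax / sub ax,bp
    al = (ax & 0xFF) ^ (ax >> 8)                           # xor al,ah
    al |= 0xDB                                             # only bits 2 and 5 survive
    return (al + 0x13) & 0xFF                              # shift into the palette


if __name__ == "__main__":
    colors = {chessboard_pixel(x, y, 0) for x in range(256) for y in range(200)}
    print(sorted(hex(c) for c in colors))
```

Only bit 2 and bit 5 of the XOR value remain free after the OR, so exactly four palette indices are used: bit 2 alternates the squares and bit 5 alternates after eight squares, which gives the impression of separate boards.<br />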
<br />
===Zooming circles===<br />
[[File:Zooming circles.png|thumb|zooming circles]]<br />
The distance D of a point (X,Y) to the center (0,0) is sqrt(X²+Y²). The framework already makes sure that DL contains a centered X coordinate, but DH=Y still has to be centered. Performing a square root on x86 requires quite a bit of code, but it can actually be omitted. With a clever selection of colors, the impression of zooming circles without a square root is quite decent.<br />
<syntaxhighlight lang="nasm"> <br />
mov al,dh ; get Y in AL<br />
sub al,100 ; align Y vertically<br />
imul al ; AX = Y², so AH = Y²/256<br />
xchg dx,ax ; Y²/256 in DH, X in AL<br />
imul al ; AX = X², so AH = X²/256<br />
add dh,ah ; DH = (X² + Y²)/256<br />
mov al,dh ; AL = (X² + Y²)/256<br />
add ax,bp ; offset color by time<br />
and al,8+16 ; select special rings<br />
</syntaxhighlight><br />
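The square-root-free distance can be checked with a small Python model. It is a sketch that mirrors the byte arithmetic of the routine above; the helper names are made up for this illustration.<br />

```python
def signed8(value):
    """Interpret the low 8 bits of value as a signed byte."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def circles_pixel(dl, dh, t):
    """Model of the zooming circles: DL = centered X, DH = row, t = time."""
    y = signed8(dh - 100)                  # sub al,100
    x = signed8(dl)
    d = ((y * y >> 8) + (x * x >> 8)) & 0xFF   # add dh,ah: (X² + Y²)/256
    return (d + t) & (8 + 16)              # add ax,bp / and al,8+16


if __name__ == "__main__":
    print([circles_pixel(x, 100, 0) for x in range(0, 128, 16)])
```

Because only the squared distance is used, the rings get narrower towards the outside; adding the time before masking makes them appear to move.<br />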
<br />
===Scrolling tilted plane===<br />
[[File:Tiltplane.png|thumb|scrolling tilted plane]]<br />
This effect is achieved as follows. First, a large constant is divided by the row number Y to simulate distance. The resulting value is then used twice: a) scaled by a centered X value and b) offset by the current time. These two results are then combined with XOR, and a special pattern is selected from the result with AND.<br />
<syntaxhighlight lang="nasm"> <br />
mov ax,0x1329 ; initialize with constant<br />
add dh,al ; preventing divide overflow<br />
div dh ; reverse divide AL = C/Y'<br />
xchg dx,ax ; DL = C/Y', AL = X<br />
imul dl ; AH = CX/Y'<br />
sub dx,bp ; DL = C/Y'-T <br />
xor ah,dl ; AH = (CX/Y') ^ (C/Y'-T)<br />
mov al,ah ; move to AL<br />
and al,4+8+16 ; select special pattern<br />
</syntaxhighlight><br />
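The perspective divide above can be replayed in Python as follows. This is a sketch for illustration, not code from the intro; the function names are made up, and it assumes rows 0 to 199 so that the divisor never wraps around.<br />

```python
def signed8(value):
    """Interpret the low 8 bits of value as a signed byte."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def tilted_plane_pixel(dl, dh, t):
    """Model of the tilted plane: DL = centered X, DH = row, t = time."""
    y = (dh + 0x29) & 0xFF                     # add dh,al (AL = 0x29)
    depth = 0x1329 // y                        # div dh: quotient lands in AL
    ah = ((signed8(dl) * signed8(depth)) & 0xFFFF) >> 8   # imul dl: AH = C*X/Y'/256
    dl_after = (depth - t) & 0xFF              # sub dx,bp (low byte only)
    return (ah ^ dl_after) & (4 + 8 + 16)      # xor ah,dl / and al,4+8+16


if __name__ == "__main__":
    print([tilted_plane_pixel(0, y, 0) for y in range(0, 200, 25)])
```

Adding 0x29 to the row keeps the quotient below 256 (0x1329 divided by 41 is 119), which is what the "preventing divide overflow" comment refers to.<br />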
<br />
===Parallax checkerboards===<br />
[[File:Paralax.png|thumb|parallax checker boards]]<br />
This is a kind of raycasting with dynamic geometry. Each object plane is split horizontally because of the signed 16 bit multiplication and vertically because of an implicit logic column number operation. Additionally, implicit shearing is applied, so that the edges of the resulting grid (4 "solid" patches, 4 "transparent" patches, alternating) connect. If the ray hits one of the solid patches, the color becomes the iteration number (plus a palette offset into the grayscale); if not, the plane is shifted by the screen pointer and the process is repeated until the maximum iteration number is reached.<br />
<syntaxhighlight lang="nasm"><br />
mov cx,bp ; set initial point to time<br />
mov bx,-16 ; limit to 16 iterations<br />
fx3L:<br />
add cx,di ; offset point by screenpointer<br />
mov ax,819 ; magic, related to Rrrola constant<br />
imul cx ; get X',Y' in DX<br />
ror dx,1 ; set carry flag on "hit"<br />
inc bx ; increment iteration count<br />
ja fx3L ; loop until "hit" or "iter=max"<br />
lea ax,[bx+31] ; map value to standard gray scale<br />
</syntaxhighlight><br />
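The loop above can be modelled in Python to see how the iteration count turns into a gray value. This is a sketch under the assumption that DI is the raw screen pointer and BP the timing value; the function names are made up for this illustration.<br />

```python
def signed16(value):
    """Interpret the low 16 bits of value as a signed word."""
    value &= 0xFFFF
    return value - 65536 if value & 0x8000 else value


def parallax_pixel(di, t):
    """Model of the parallax checkerboards for screen pointer di and time t."""
    cx = t & 0xFFFF                          # mov cx,bp
    bx = -16                                 # mov bx,-16
    while True:
        cx = (cx + di) & 0xFFFF              # add cx,di
        dx = ((signed16(cx) * 819) >> 16) & 0xFFFF   # imul cx: high word in DX
        hit = dx & 1                         # ror dx,1 moves bit 0 into carry
        bx += 1                              # inc bx leaves the carry flag alone
        if hit or bx == 0:                   # 'ja' loops only while CF=0 and ZF=0
            break
    return (bx + 31) & 0xFF                  # lea ax,[bx+31]


if __name__ == "__main__":
    print([parallax_pixel(di, 0) for di in (0, 80, 160, 240, 320)])
```

The result always lies between 16 and 31, which is the gray ramp of the standard VGA palette: planes that are hit early come out dark, and rays that never hit end at the brightest value.<br />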
<br />
===Sierpinski rotozoomer===<br />
[[File:Sierpinski rotozoomer.png|thumb|sierpinski rotozoomer]]<br />
[[File:1 div cos(atan(x)).png|thumb|1/cos(atan(x)), made with www.google.com]]<br />
Rotozooming normally requires trigonometric functions, or good estimations of them. Let's look at the [https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions usual 2D rotation equation] and interpret it in a special way to get rid of the trigonometry in the calculation:<br />
<syntaxhighlight lang="nasm"><br />
x' = x*cos(a) - y*sin(a)<br />
y' = x*sin(a) + y*cos(a)<br />
</syntaxhighlight><br />
If we add zooming to it, it looks like this:<br />
<syntaxhighlight lang="nasm"><br />
x' = z * (x*cos(a) - y*sin(a))<br />
y' = z * (x*sin(a) + y*cos(a))<br />
</syntaxhighlight><br />
Now let's assume we don't specify z ourselves, and factor out cos(a):<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(a) * (x - y*tan(a))<br />
y' = cos(a) * (x*tan(a) + y)<br />
</syntaxhighlight><br />
Finally, let's substitute tan(a) with our time variable, since the tangent function rises to infinity the closer we get to +-90°:<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(atan(T)) * (x - y*T)<br />
y' = cos(atan(T)) * (x*T + y)<br />
</syntaxhighlight><br />
As long as we don't care about specifying the zoom factor and don't directly control the angle, we can now rotate between -90° and +90° without using trigonometric functions. As a byproduct, the zoom factor is now tied to the time T. The function for the zoom factor is shown in the image; it results in zooming from infinitely small to one (original) and back to infinitely small. Quite a few bytes have been spent to beautify the effect (correcting the timing offset, speeding up the animation, enlarging the Sierpinski pixels and getting decent colors), but I thought it was worth the space.<br />
<syntaxhighlight lang="nasm"><br />
lea cx,[bp-2048]; center time to pass zero<br />
sal cx,3 ; speed up by factor 8!<br />
movzx ax,dh ; get X into AL<br />
movsx dx,dl ; get Y into DL<br />
mov bx,ax ; save X in BX<br />
imul bx,cx ; BX = X*T<br />
add bh,dl ; BH = X*T/256+Y<br />
imul dx,cx ; DX = Y*T<br />
sub al,dh ; AL = X-Y*T/256<br />
and al,bh ; AL = (X-Y*T/256)&(X*T/256+Y)<br />
and al,252 ; thicker sierpinski<br />
salc ; set pixel value to black<br />
jnz fx4q ; leave black if not sierpinski<br />
mov al,0x2A ; otherwise: a nice orange<br />
fx4q:<br />
</syntaxhighlight><br />
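Both halves of this effect can be checked in Python. The first two functions compare the trigonometry-free form (x - y*T, x*T + y) with a real rotation by atan(T) that is scaled by 1/cos(atan(T)); the third function replays the integer code above. This is a sketch with made-up names, not code from the intro, and it keeps the X/Y naming of the assembler comments.<br />

```python
import math


def rotozoom_no_trig(x, y, T):
    """The form used by the effect: no sine or cosine needed."""
    return x - y * T, x * T + y


def rotozoom_reference(x, y, T):
    """Rotation by a = atan(T), scaled by the zoom factor 1/cos(a)."""
    a = math.atan(T)
    z = 1.0 / math.cos(a)
    return (z * (x * math.cos(a) - y * math.sin(a)),
            z * (x * math.sin(a) + y * math.cos(a)))


def sierpinski_pixel(dl, dh, t):
    """Integer model of the routine: returns 0x2A (orange) or 0 (black)."""
    cx = ((t - 2048) << 3) & 0xFFFF             # lea cx,[bp-2048] / sal cx,3
    x = dh & 0xFF                               # movzx ax,dh
    y = (dl & 0xFF) - 256 if dl & 0x80 else dl & 0xFF   # movsx dx,dl
    bh = ((((x * cx) & 0xFFFF) >> 8) + dl) & 0xFF       # X*T/256 + Y
    al = (x - (((y * cx) & 0xFFFF) >> 8)) & 0xFF        # X - Y*T/256
    return 0x2A if (al & bh & 252) == 0 else 0  # the AND of both gives Sierpinski


if __name__ == "__main__":
    print(rotozoom_no_trig(3.0, 4.0, 0.5), rotozoom_reference(3.0, 4.0, 0.5))
    print(sierpinski_pixel(4, 8, 2048), sierpinski_pixel(4, 4, 2048))
```

At t = 2048 the scaled time is zero, so the picture is the unrotated Sierpinski pattern (X AND Y, with the lowest two bits masked out for thicker pixels).<br />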
<br />
===Raycast bent tunnel===<br />
[[File:Raycastbenttunnel.png|thumb|raycast bent tunnel]]<br />
This is a variation of the [https://www.pouet.net/prod.php?which=78044 raycast tunnel "into a new era"] (see also above). A [https://www.reddit.com/r/tinycode/comments/de7dj9/moving_cubes_with_pathtracing_lighting_and/ detailed explanation] has been posted to reddit for the [https://www.pouet.net/prod.php?which=83204 similar production "Essence"]. The custom colors have been removed, the bend direction is different, and the geometry is a bit more closed, to increase the performance on slower computers and DosBox.<br />
<syntaxhighlight lang="nasm"><br />
mov cl,-9 ; start with depth 9 (moves backwards)<br />
fx5L: <br />
push dx ; save DX, destroyed inside the loop<br />
mov al,dh ; Get Y into AL<br />
sub al,100 ; Centering Y has to be done "manually".<br />
imul cl ; Multiply AL=Y by the current distance, to get a projection(1)<br />
xchg ax,dx ; Get X into AL, while saving the result in DX (DH)<br />
add al,cl ; add distance to projection, (bend to the right)<br />
imul cl ; Multiply AL=X by the current distance, to get a projection(2)<br />
mov al,dh ; Get projection(1) in AL<br />
xor al,ah ; combine with projection(2)<br />
add al,4 ; center the walls around 0<br />
test al,-8 ; check if the wall is hit<br />
pop dx ; restore DX<br />
loopz fx5L ; repeat until "hit" or "iter=max"<br />
sub cx,bp ; offset depth by time<br />
xor al,cl ; XOR pattern for texture <br />
aam 6 ; irregular pattern with MOD 6<br />
add al,20 ; offset into grayscale palette<br />
</syntaxhighlight><br />
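The ray marching loop above can be replayed in Python. This is only a sketch: the routine sets CL but not CH, so the model takes the initial CH as a parameter and assumes 0 by default, which is an assumption of this illustration and not something stated in the article. The function names are made up as well.<br />

```python
def signed8(value):
    """Interpret the low 8 bits of value as a signed byte."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def tunnel_pixel(dl, dh, t, ch=0):
    """Model of the bent tunnel: DL = centered X, DH = row, t = time."""
    cx = ((ch & 0xFF) << 8) | 0xF7              # mov cl,-9 (CH is an assumption)
    while True:
        depth = signed8(cx)
        p1 = (signed8(dh - 100) * depth) & 0xFFFF   # projection(1), kept in DX
        p2 = (signed8(dl + depth) * depth) & 0xFFFF # projection(2), with the bend
        al = (((p1 >> 8) ^ (p2 >> 8)) + 4) & 0xFF   # combine, center around 0
        inside = (al & 0xF8) == 0               # test al,-8 sets ZF while no wall
        cx = (cx - 1) & 0xFFFF                  # loopz decrements CX first
        if cx == 0 or not inside:
            break
    cx = (cx - t) & 0xFFFF                      # sub cx,bp
    al ^= cx & 0xFF                             # xor al,cl
    return al % 6 + 20                          # aam 6 / add al,20


if __name__ == "__main__":
    print([tunnel_pixel(x, 100, 0) for x in range(0, 256, 32)])
```

Whatever the ray does, the final color stays within the six palette entries 20 to 25, because AAM 6 leaves the remainder of a division by 6 in AL.<br />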
<br />
===Ocean night to day===<br />
[[File:Oceannight.png|thumb|left|oceannight]]<br />
[[File:Oceanday.png|thumb|right|oceanday]]<br />
The ocean effect is a wonderful example of a [https://en.wikipedia.org/wiki/Bob_Ross#Style "happy accident"]. If you load a value into the FPU as an integer, store it as a floating point value, and interpret it as an integer again, you get a cool pattern. Combine that with a reverse divide and you get a neat wavy effect. This works in tight conjunction with the register DX, which has a sign flip at the position we want, so we can easily separate sky and sea. The final touch is the coloring, though. By design of the global framework, the value of AL is determined on function entrance: it contains the effect address. With a bit of code shuffling, the color of the sky came "for free", without using instructions, as did the "fading to day" color, which is also an effect address. And that is no happy accident. For that reason, the sky color can vary in versions other than the originally submitted DosBox version.<br />
<syntaxhighlight lang="nasm"><br />
sub dh,120 ; check if pixel is in the sky<br />
js fx6q ; quit if that's the case<br />
mov [bx+si],dx ; move XY to a memory location<br />
fild word [bx+si] ; read memory location as integer<br />
fidivr dword [bx+si]; reverse divide by constant<br />
fstp dword [bx+si-1]; store result as floating point<br />
mov ax,[bx+si] ; get the result into AX<br />
add ax,bp ; modify color by time<br />
and al,128 ; threshold into two bands<br />
dec ax ; beautify colors to blue/black<br />
</syntaxhighlight><br />
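The reinterpretation trick can be tried out in Python with the <code>struct</code> module. This sketch only covers the last steps of the routine above (storing the quotient as a 32 bit float one byte lower and reading a word back); the quotient itself is passed in, because the numerator of the reverse divide depends on memory contents that are not shown here. The function names are made up for this illustration.<br />

```python
import struct


def float32_bits(value):
    """Return the 32 bit pattern of value stored as an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def ocean_color(quotient, t):
    """Model of the steps after the divide: quotient is the FPU result."""
    ax = (float32_bits(quotient) >> 8) & 0xFFFF   # fstp at [mem-1], mov ax,[mem]
    ax = (ax + t) & 0xFFFF                        # add ax,bp
    al = ax & 128                                 # and al,128: two bands
    return (al - 1) & 0xFF                        # dec ax: 0 -> 0xFF, 128 -> 0x7F


if __name__ == "__main__":
    for q in (1.0, 1.5, 2.0, 3.0, 100.0):
        print(q, hex(float32_bits(q)), hex(ocean_color(q, 0)))
```

The word that is read back consists of mantissa bits plus the lowest exponent bit, so it changes quickly and non-linearly with the quotient, which is where the wavy pattern comes from.<br />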
<br />
===Fading effect===<br />
For convenience, here is the fading effect from the global framework in isolation. Basically, it generates a pseudorandom value from the screen pointer, offsets the time by a scaled version of that value, and then calls the selected effect.<br />
<syntaxhighlight lang="nasm"><br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
</syntaxhighlight><br />
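The selection logic can be replayed in Python to see why two effects dissolve into each other. This is a sketch for illustration; the function name is made up, and the list of labels is copied from the table line of the framework code.<br />

```python
TABLE = ["fx2", "fx1", "fx0", "fx3", "fx4", "fx5", "fx6", "sounds", "stop"]


def effect_index(di, t):
    """Return the table index chosen for screen pointer di at time t."""
    ax = ((di & 0xFFFF) * 0xCCCD) & 0xFFFF   # low word of the Rrrola product
    p = ((ax & 0xFF) + (ax >> 8)) & 0xFF     # add al,ah / xor ah,ah: 0..255
    ax = (p + t) & 0xFFFF                    # add ax,bp
    return (ax >> 9) & 15                    # shr ax,9 / and al,15


if __name__ == "__main__":
    for t in (0, 300, 400, 500, 512):
        share = sum(effect_index(di, t) for di in range(65536)) / 65536
        print(t, round(share, 2))
```

Each effect lasts 512 timer ticks. Since the pseudorandom value is at most 255, the second half of every 512 tick window is a dissolve in which more and more pixels already belong to the next effect. Index 7 points at the RET byte of the sound data, and index 8 ends the program.<br />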
<br />
==MIDI music part==<br />
This code section increments the timing value and produces sound. By choosing channel 3 we can reuse the "change channel instrument" instruction as "RET". By shifting around the code of the effects, a reasonable value for the volume could be created, which saves another byte. Be aware that this code sequence only works if the MIDI device is already in UART mode, otherwise three more bytes have to be spent. Quite a few people from the audience and sceners told me afterwards that the melody sounds a bit like [https://en.wikipedia.org/wiki/Incantations_(album) Incantations from Mike Oldfield], but it is actually created by a simple approach. Starting with [https://www.pouet.net/prod.php?which=65604 "Hypnoteye" in 2015], I experimented with procedural MIDI and eventually released a [https://www.pouet.net/prod.php?which=66313 small MIDI framework (64 bytes)]. The basic principle is to jump through tonal space with fixed steps, and map high values back to lower ones (by modulo). With simple combinations of stepwidth and modvalue, interesting effects can be produced. For example stepwidth=3 for reduced minor chords which are never resolved, likewise stepwidth=4, or stepwidth=6 for tritonal effects. With a well chosen modvalue, these patterns can create progressions. I have not done an appropriate theoretical analysis yet; rather, I explored the tonal space and noted when something sounded interesting.<br />
<syntaxhighlight lang="nasm"><br />
sounds: db 0xc3, 11, 0x93, fx2-s<br />
...<br />
inc bp ; increment timing value<br />
test bp, 7 ; play a note every 8th step<br />
jnz nomuse ; quit if in between<br />
mov dx,0x330 ; port number for MIDI<br />
mov si,sounds ; address for sound data<br />
outsb ; change instrument of channel 3<br />
outsb ; to vibraphone<br />
outsb ; play a note on channel 3<br />
imul ax,bp,-19*32*4 ; the magic melody constant<br />
shr ax,10 ; scale down and implicit "and 63"<br />
add al,22 ; pitch base is 22<br />
out dx,al ; play THIS note on channel 3<br />
outsb ; play it with THIS volume<br />
</syntaxhighlight><br />
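The melody generator can be modelled in a few lines of Python. This is a sketch, not code from the intro; the function names are made up, and the volume is passed in as a parameter because its real value is the byte fx2-s, which depends on the code layout.<br />

```python
SCALE_MOD = -19 * 32 * 4          # the magic melody constant from the code


def note_for_tick(bp):
    """Return the MIDI note number computed for timing value bp."""
    ax = (bp * SCALE_MOD) & 0xFFFF    # imul ax,bp,-19*32*4 (low 16 bits)
    return (ax >> 10) + 22            # shr ax,10 gives 0..63, pitch base is 22


def midi_bytes(bp, volume):
    """Return the bytes sent to port 0x330 at tick bp (empty if no note)."""
    if bp & 7:                        # test bp,7: only every 8th tick plays
        return []
    return [0xC3, 11, 0x93, note_for_tick(bp), volume]


if __name__ == "__main__":
    print([note_for_tick(8 * k) for k in range(16)])
```

For the ticks that actually play (multiples of 8), the formula reduces to 22 + (-19*k mod 64) for the k-th note. In other words, the melody steps down by 19 semitones per note and is wrapped back into a range of 64 notes, which is the "fixed steps plus modulo" principle described above.<br />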
==Further tweaks==<br />
To be done ;)<br />
==Differences between dos versions==<br />
To be done ;)<br />
==External Links==<br />
*[https://en.wikipedia.org/wiki/Demoscene Demoscene]<br />
*[https://www.youtube.com/watch?v=Imquk_3oFf4 Youtube Capture]<br />
*[https://www.twitch.tv/videos/589179638?t=7h02m40s Twitch Live Reactions]<br />
*[https://www.pouet.net/prod.php?which=85227 Production Page on Pouet]<br />
*[https://www.reddit.com/r/programming/comments/g1qhf8/memories_256_byte_msdos_intro/ Reddit/Programming Thread I]<br />
*[https://www.reddit.com/r/programming/comments/g4ewh7/how_its_made_memories_a_256_byte_dos_intro_with_8/ Reddit/Programming Thread II]<br />
*[https://github.com/cesarmiquel/memories-256b-msdos-intro Analysis by "cesarmiquel"]<br />
*[https://hackaday.com/2020/04/21/a-jaw-dropping-demo-in-only-256-bytes/ Hackaday Article]<br />
<br />
==Original release code (complete)==<br />
<syntaxhighlight lang="nasm"> <br />
; "memories" by HellMood/DESiRE<br />
; the tiny megademo, 256 byte msdos intro<br />
; shown in April 2020 @ REVISION<br />
;<br />
; (= WILL BE COMMENTED IN DETAIL LATER =)<br />
;<br />
; create : nasm.exe memories.asm -fbin -o memories.com<br />
; CHOOSE YOUR TARGET PLATFORM (compo version is dosbox)<br />
; be sure to use the dosbox.conf from this archive!<br />
; only ONE of the defines should be active!<br />
%define dosbox ; size : 256 bytes<br />
;%define freedos ; size : 230 bytes<br />
;%define winxpdos ; size : 263 bytes<br />
<br />
; DON'T TOUCH THESE UNLESS YOU KNOW WHAT YOU'RE DOING<br />
%ifdef winxpdos<br />
%define music<br />
%define switch_uart<br />
%define safe_dx<br />
%define safe_segment<br />
%endif<br />
%ifdef freedos<br />
%define safe_dx<br />
%endif<br />
%ifdef dosbox<br />
%define music<br />
;%define safe_dx ; sometimes needed<br />
%endif<br />
<br />
; GLOBAL PARAMETERS, TUNE WITH CARE!<br />
%define volume 127 ; not used on dosbox (optimization)<br />
%define instrument 11<br />
%define scale_mod -19*32*4; <br />
%define time_mask 7<br />
%define targetFPS 35<br />
%define tempo 1193182/256/targetFPS <br />
%define sierp_color 0x2A<br />
%define tunnel_base_color 20<br />
%define tunnel_pattern 6<br />
%define tilt_plate_pattern 4+8+16<br />
%define circles_pattern 8+16<br />
<br />
org 100h<br />
s:<br />
%ifdef freedos<br />
mov fs,ax<br />
mov [fs:0x46c],ax<br />
%endif<br />
mov al,0x13<br />
int 0x10 <br />
xchg bp,ax<br />
push 0xa000-10<br />
pop es<br />
%ifndef freedos<br />
mov ax,0x251c<br />
%ifdef safe_dx <br />
mov dx,timer <br />
%else ; assume DH=1, mostly true on DosBox<br />
mov dl,timer<br />
%endif<br />
int 0x21<br />
%endif<br />
top:<br />
%ifdef freedos<br />
mov bp,[fs:0x46c]<br />
%endif <br />
mov ax,0xcccd<br />
mul di<br />
add al,ah<br />
xor ah,ah<br />
add ax,bp<br />
shr ax,9<br />
and al,15<br />
xchg bx,ax<br />
mov bh,1<br />
mov bl,[byte bx+table]<br />
call bx<br />
stosb<br />
inc di<br />
inc di<br />
jnz top<br />
mov al,tempo<br />
out 40h,al<br />
in al,0x60<br />
dec al<br />
jnz top<br />
sounds:<br />
db 0xc3 ; is MIDI/RET<br />
%ifdef music<br />
db instrument,0x93<br />
%ifdef switch_uart<br />
db volume ; without switch, volume is in table<br />
db 0x3f <br />
%endif<br />
%endif<br />
table: ; first index is volume, change order with care! <br />
db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
stop:<br />
pop ax<br />
ret<br />
timer:<br />
%ifndef freedos<br />
%ifdef safe_segment<br />
push cs<br />
pop ds<br />
%endif<br />
inc bp<br />
%ifdef music <br />
test bp, time_mask<br />
jnz nomuse<br />
mov dx,0x330<br />
mov si,sounds<br />
outsb<br />
outsb<br />
outsb<br />
imul ax,bp,scale_mod<br />
shr ax,10<br />
add al,22<br />
out dx,al<br />
outsb<br />
%ifdef switch_uart<br />
inc dx<br />
outsb<br />
%endif<br />
%endif<br />
nomuse:<br />
iret<br />
%endif <br />
fx0: ; tilted plane, scrolling<br />
mov ax,0x1329<br />
add dh,al<br />
div dh<br />
xchg dx,ax<br />
imul dl<br />
sub dx,bp<br />
xor ah,dl<br />
mov al,ah<br />
and al,tilt_plate_pattern<br />
ret<br />
fx2: ; board of chessboards<br />
xchg dx,ax<br />
sub ax,bp<br />
xor al,ah<br />
or al,0xDB<br />
add al,13h<br />
ret<br />
fx1: ; circles, zooming<br />
mov al,dh<br />
sub al,100<br />
imul al<br />
xchg dx,ax<br />
imul al<br />
add dh,ah<br />
mov al,dh<br />
add ax,bp<br />
and al,circles_pattern<br />
ret<br />
fx3: ; parallax checkerboards<br />
mov cx,bp<br />
mov bx,-16<br />
fx3L:<br />
add cx,di<br />
mov ax,819<br />
imul cx <br />
ror dx,1 <br />
inc bx <br />
ja fx3L<br />
lea ax,[bx+31] <br />
ret<br />
fx4: ; sierpinski rotozoomer <br />
lea cx,[bp-2048]<br />
sal cx,3<br />
movzx ax,dh<br />
movsx dx,dl<br />
mov bx,ax<br />
imul bx,cx<br />
add bh,dl<br />
imul dx,cx<br />
sub al,dh<br />
and al,bh<br />
and al,0b11111100<br />
salc ; VERY slow on dosbox, but ok<br />
jnz fx4q<br />
mov al,sierp_color<br />
fx4q:<br />
ret<br />
fx5: ; raycast bent tunnel<br />
mov cl,-9<br />
fx5L: <br />
push dx<br />
mov al,dh<br />
sub al,100<br />
imul cl<br />
xchg ax,dx <br />
add al,cl<br />
imul cl<br />
mov al,dh<br />
xor al,ah<br />
add al,4<br />
test al,-8<br />
pop dx<br />
loopz fx5L<br />
sub cx,bp<br />
xor al,cl<br />
aam tunnel_pattern; VERY slow on dosbox, but ok<br />
add al,tunnel_base_color<br />
ret<br />
fx6: ; ocean night / to day sky<br />
sub dh,120<br />
js fx6q<br />
mov [bx+si],dx<br />
fild word [bx+si]<br />
fidivr dword [bx+si]<br />
fstp dword [bx+si-1]<br />
mov ax,[bx+si]<br />
add ax,bp<br />
and al,128<br />
dec ax<br />
fx6q:<br />
ret<br />
</syntaxhighlight><br />
<br />
==Bonus - NFO/ASCII==<br />
<br />
<syntaxhighlight lang="nasm"> <br />
art : hammerfist<br />
∂#MW%e _d$Ng,<br />
'B, ∂b _jM@$QZb,cQ" )@<br />
,edRB$b,l@ Wk,yGR$KM&$b, ,dP" Wl ]bsd%UR8BG6&$@DSyG#ZKM&$b,<br />
,dP "T%L 'MGF "*∂R_ Tg "*4Zk,#I YP W" 7P "*∂R<br />
4M gd@ ^ ∂@ d@b dQ$#@Z@R3L_ "*GMj 'W ,gd$ d@b 9Q$#%b<br />
W#, `M Wb `* _4P `Qk *#N8L `H5 @b 'QR7YK `* _4F" Qk<br />
`6@L dML '@ ,BK 'M ∂B *b, '#L<br />
^QBb,_ _,4&M∞∂@=,_ _dGL _gQKM GL @k 'Mg,_ _dG,<br />
"*BN5W$2#MNP" "*G3WRM8&B5P"`Y@QNW3Z5P" ∂#$W8BRM3XZN87 "*GW38M%EBDW5P"`<br />
<br />
<br />
p r e s e n t s<br />
<br />
4<br />
d@,<br />
_& `Wl<br />
_,aP "#baedM$#@@K JP*"?ML<br />
,ad@$#P" ,d@NEWVB" X,aQPYb,_<br />
V@Mm,_ ,d@MW#BW' EMP" '¶R ,ngBP^fML<br />
¶M@N@y Y#BNW#M" J9" `MQ9" "MgRBq ,QBMg,<br />
VN#P` ,d@@ `WM@^ 7f ¶F` 7kY" ^G _.eQNE1.<br />
]B _G@MWN$, `P ' 4b QP ¶w@F*^ ^Qb<br />
]O@NRM#W@MNB, ; ^` j JP^ Yl<br />
J#NRNWM@#BcT"^ ,A _J _q@ `X<br />
'¶WM#B@WdY`,7 _G#YN#PM _,gG" M,<br />
*BN#WP" dK ,Q@NRMB"]9 , _,M@Q* #A<br />
"U^ V@h, iNBW#NT J' J9 s@QN" _; 'D,<br />
¶RMBv&NMQR@9 .W .K' "9` ,6BA _JL ]l<br />
Y#NE@W#NRP #[ `¶8 _d@MW#B_jW#W BN<br />
"GQ@MR#W QL_ *B _,p#NBW#NQMG@WY 3Q<br />
"Y@F ,XW@M%im,_Yb_ _,g5@#MW@QMNE@E@NRMB ,WM<br />
` _,gP*"#REM#GB@N#MQbnd@N#M@MW#R8QSB^'WQERM@ ;4NB,<br />
,GYKL ¶E#B8R8QSB@M@#BM#W@MNB"`_ , "^` N ,dW@Ql<br />
_Q`'W`*t '¶@GS#MBQ#E@W#NQBW[ 'LvQ_ ,K _dNABGM#N<br />
,F ' `^WAB@QGE9*"9^*@L jP7FY, ¶h,_.jWM#BR#GBM,<br />
J; , _ ' ' "LL YxE#B8R8QSBNW@W;<br />
AP _,Ag6^ _ J ¶A `"Q#M@MW#R8E#P<br />
j@ `"XQW[ 'LvK,_ 'L_,/ @t Y#NE@WNR"<br />
:M/ 9^*@L jP7F" _PYKL _,A; ¶RSNQ"<br />
dKL ' ` ' "L "`'W`*t `"XQb `W^<br />
Q`8t 'L_,/ , _ ' 9^Q<br />
,W _PYKL _,Ag6^ ' W, _ ,#N&<br />
!N _ J "`'W`*t `"XQW[ _ J N!_JG9^RwQ' *t<br />
`W, 'LvK,_ ' _gGB8@L _ 'LvK,_ ,WgB' V 7L<br />
_.,gm&@B&wBZF" j@'` "WL _gML jZd7Yb lN" dBWl<br />
,g&QB*"^` `"*G@g, .gR&k,_ ,N" '@QF ¶k;gMF *QvQ jQ, ,@N@B#,<br />
.eQF*` `Yb@" "*6Qg,gF , 7 XMN" 'MNB, ^¶QWSER@N;<br />
,gP" qy, W' ^Q' &L ,g@W' `QMEL `"WBNWP<br />
g7 ¶9 ,X M? 9" _q8MSK ¶EMt *@K<br />
Vh _,m#L _AH le ,GBDNE9^A, *@F NMg<br />
¶L,qQ@ND _.m@Bl We ,gM@B8#Q' ¶h_ lWE,<br />
W9NHW@` JWM#B@] @e 4WR@NGF^ 'QL dRWl<br />
VMd* "@BE@PM 'N *UP" VW, JRSB;<br />
,@F j `¶WK W, ¶t XNt _A@E#N<br />
_JP ,6& "GLdM XD, _.g8NMA@k,_ _,gG#NMGR;<br />
"Z .JRER 'VMi _jNB#W&_ _,j@E@W#Nl ¶MBGMNQGNQMG@QBW9<br />
¶h ,G@NRMBl `" ,d#R@M$F ¶Mg,_.gp&@@NEWVBWBMG *QMN8R8SBN$E@WF<br />
Vb dW#R8QSRb, *YM@EQ,_ 'MENBW#NQMG#B@R@MW#l "BM@QNENRQG'<br />
*WGS#MBMNEYL `^"*8M@Q@NRM#W@BWSNW@QBF"` `^*@QBF^ [HFT]<br />
^M@MW#Q9 ^Wt `^¶RQ@W8NQGP*`<br />
¶Q#@P Vk lA `"^`<br />
Y" `MA J#,<br />
*R@, ,MQl<br />
Y#Wk, GWM8L<br />
W8RQSt,_ AQ@MR#,<br />
`@M@#SB@Mbm.,_ QNBW#NW<br />
¶QB8R8SBN$WNRM@#GNtwg@NMQR@B'<br />
*MBQ#8R8QS@NE@WNBW#NQMG@NR;<br />
`WGS#MBQ#R8QSB@NE@W#NQBW9<br />
*OMW@QMNE@E@NRMW@QMB@*<br />
`^"YQW@Q#SB#NE@EGP<br />
`^"*8R@GBQF`<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Memories&diff=510Memories2020-04-21T15:57:50Z<p>HellMood: /* Sierpinski rotozoomer */</p>
<hr />
<div>==Introduction==<br />
<br />
Hello! My name is "HellMood" and this is not a usual wiki style article. It will be written from my point of view, in first person. This article is all about a tiny MS DOS program called "Memories". This program has a size of 256 bytes and won the "PC 256 byte" competition of the [https://en.wikipedia.org/wiki/Demoscene demoscene] event [https://2020.revision-party.net/ "Revision"] in 2020, as well as the public choice award. You can watch a video of the output of this program here (https://www.youtube.com/watch?v=Imquk_3oFf4) and a video including live reactions of online audience and moderators here (https://www.twitch.tv/videos/589179638?t=7h02m40s). You can download the release and comment here (https://www.pouet.net/prod.php?which=85227). This wiki article will give you an indepth analysis of the program and point out historical references as well as development steps. In case you did not notice, you are now on the "sizecoding wiki". This will help you not only to understand the interior of "Memories", but also enable you to produce something like this yourself. Feel free to look around! It is recommended that you start with the [http://www.sizecoding.org/wiki/Getting_Started basics of this wiki] first if you're new to sizecoding and/or x86 assembler. While the principles might be easy to grasp, details of the implementation might not.<br />
<br />
==Disclaimer==<br />
This article discusses the submitted DosBox version (256 bytes). The available archive contains versions for FreeDos and Windows XP DOS which, at the time of this writing, don't work on all computers. These alternative versions were included as proof of concept, to show that this program does not only run on the emulator "DosBox". For the "PC 256 bytes" competition of the event "Revision" in 2020, it was allowed to specify either "FreeDos" or "DosBox" as platform, the latter with a specific configuration. It is in fact possible to modify the DosBox version to work on FreeDos, MS DOS, WinXP and Win98, as the released alternative versions showcase, but this is not the focus of this article. Anyway, safe versions for all platforms and all computers are in the works. Regarding possible further optimizations, I will discuss and explain only the submitted version, although I have already found several spots in the code that can be optimized. As the history of tiny effects shows, almost no code is perfect, and most of it can be brought down further in size. To avoid confusion between different versions, only the submitted version will be explained.<br />
<br />
==History of tiny effects==<br />
[[File:Pouet categories.png|thumb|Size categories on http://www.pouet.net]]<br />
Sizecoders think in size categories. For MS DOS, these categories are 256b, 128b, 64b and 32b. These are the standards of one of the biggest demoscene archives, http://www.pouet.net. There is no 16b category, although many tiny effects can be implemented in 16 bytes. Almost all of the effects of "Memories" have been coded and optimized by me before, and mostly the implementations were attempts to reduce an already existing effect in size, or to do something similar in less size. Reducing the size in this case means reducing it to the next lower (or an even lower) 2^N category. For example, I might have seen an effect done in 33 to 64 bytes, and then reduced the size to 32 bytes or less. Almost every time I moved an effect to a lower category, I submitted the resulting tiny program to a demoscene party which allowed remote entries in the 256b category and/or published the result to http://www.pouet.net . In this section I will introduce the effects and mention the origins and authors.<br />
===Array of chessboards===<br />
[[File:Kasparov.gif|frame|kasparov, 16 bytes]]<br />
The origin of this effect is my own "Kasparov 16b" from 2018 (https://www.pouet.net/prod.php?which=75912). Apparently I "had this piece lying around, stuck at 17 bytes and not that beautiful, until i found a trick ;)" (quote from the release notes). There was a similar effect in 32 bytes before: "ew" by "headcrash" from 2003 (https://www.pouet.net/prod.php?which=17567). In this case, my design goal was to show "real" chessboards of 8x8 fields with recognizable dark and light squares, as well as the correct orientation of the single chessboards, meaning that the bottom right corner (h1) has to be a [https://en.wikibooks.org/wiki/Chess/Arranging_The_Board white square]. For "Memories", the effect had to be reimplemented with another approach of writing to the screen to work with the framework. The scrolling direction was also changed to be different from the "scrolling tilted plane" effect.<br />
<br />
===Zooming circles===<br />
The zooming circles were supposed to be entered to a demoparty as a 32 bytes intro, but I never actually did it. There is no real predecessor in 64 bytes, because in the 64b category much more complex effects are possible. The zooming circles were the result of desperately trying to reach the 32b category for a circular "tunnel" effect, for which my own record is still 52 bytes ("Neontube", 2016, https://www.pouet.net/prod.php?which=66808), which in turn was an optimization of the all-time-classic 64 byte effect "constant evolution" from ryg/Farbrausch (2003, https://www.pouet.net/prod.php?which=8697). In the zooming circles routine, the distance and the angle are removed/ignored, so that the 32b category could be reached.<br />
<br />
===Scrolling tilted plane===<br />
[[File:Floorcast variation.png|thumb|Floorcast, 32 bytes version, variation]]<br />
The scrolling tilted plane is one of my own releases, "floorcast 32b" from 2018. Floorcasts have their own history in sizecoding; the effect was brought down stepwise from 256 bytes to 32 bytes. The versions differ in the number of planes: some show two planes, others only one. For the "floorcast 32b" release I specifically decided against the "XOR" texture, whereas in "Memories" I used it again, but disguised with another final touch of "AND".<br />
* "rain_storm" version 2008 - 256 bytes - https://www.pouet.net/prod.php?which=50479<br />
* "org_100h" version 2008 - 128 bytes - https://www.pouet.net/prod.php?which=63271<br />
* "Baudsurfer" version 2013 - 86 bytes - https://www.pouet.net/prod.php?which=61070<br />
* "Baudsurfer" version 2014 - 64 bytes - https://www.pouet.net/prod.php?which=63305<br />
* "HellMood" version 2018 - 32 bytes - https://www.pouet.net/prod.php?which=77774<br />
<br />
===Parallax checkerboards===<br />
[[File:Projektbeschreibung.gif|frame|Projektbeschreibung, 32 bytes]]<br />
The parallax checkerboards have been released by me as the 32 bytes effect [https://www.pouet.net/prod.php?which=76788 "Projektbeschreibung"] in 2018. It was the direct attempt to bring [https://www.pouet.net/prod.php?which=28543 "Follow the light"] from "Digimind" (2006) or my own [https://www.pouet.net/prod.php?which=63293 "Lucy"] (2014) down to 32 bytes. Very helpful inspirations came from [https://www.pouet.net/prod.php?which=32281 "Rrrola"s "Paralaxa"] (32 bytes, 2007) and [https://www.pouet.net/prod.php?which=61075 "Sensenstahl"s "Byteropolis"] (32 bytes, 2013). In fact, Rrrola's rendering approach was already close to the final solution; the code was modified to correct the arrangement of the planes, change the shape from triangles to checkerboards and improve on the colors. In "Memories", the coloring of "Digimind"s version was used. Also, the effect has been modified to reduce shearing as much as possible.<br />
<br />
===Sierpinski rotozoomer===<br />
[[File:Colpinski.png|thumb|colpinski, 16 bytes]]<br />
[[File:Rotastic.png|thumb|left|rotastic, 32 bytes]]<br />
This consists of two effects, the rotozoomer and the sierpinski effect as a texture. The sierpinski effect is based on my own "Colpinski 16b" from 2013 (https://www.pouet.net/prod.php?which=62079) where "frag" of "fsqrt" helped pushing the limits. There is no real predecessor to that, because it is known that this effect can be done directly by combining X and Y, instead of an Iterated Function System or Cellular Automaton approach. The rotozoomer was released as 32b intro "rotastic" by me in 2017 (https://www.pouet.net/prod.php?which=70112), it is based on the ideas of "ryg" of "Farbrausch" (51 bytes, 2002, https://www.pouet.net/prod.php?which=5015) and "Gargaj" of "Conspiracy" (49 bytes, 2002, https://www.pouet.net/prod.php?which=6088).<br />
<br />
===Raycast bent tunnel===<br />
[[File:Intoanew.gif|frame|Into a new era, 64 bytes version]]<br />
The raycast bent tunnel is a tweaked version of my own 64 bytes "Into a new era" release (2018, https://www.pouet.net/prod.php?which=78044). The custom colors were exchanged for colors of the standard palette, the geometry was tweaked a bit and changed accordingly, so that no depth effect was used for the calculation of texture values. The standalone version of this effect has a size of 50 bytes. The two main inspirations for the 64 bytes version were two 128 byte intros, "Spongy" by "TBC" (2009, https://www.pouet.net/prod.php?which=53871) and "Wolf128" from "Baudsurfer" (2014, https://www.pouet.net/prod.php?which=63518), while the particular algorithm was developed independently.<br />
<br />
===Ocean night to day===<br />
[[File:Ocean.png|thumb|Ocean, 64b version]]<br />
The ocean effect is based on my own 64 bytes release "Ocean" from 2016 (https://www.pouet.net/prod.php?which=66857). The custom color generation and music generation have been stripped from the original, both were not compatible with the main framework of "Memories" without consuming a LOT of extra space. The special effect of "dawn" happens as a byproduct of the general framework approach. This will be explained in the next chapter.<br />
<br />
===Fading effect===<br />
The transition between two effects is an effect itself, and it has no real predecessor. It is rather an idea that evolved over the years, and might have been done by many others in similar ways. In a nutshell, while a frame is calculated, each pixel position is turned into a pseudorandom value. This value is scaled down and then used to offset the time, which determines the effect to use. This allows the usage of the standard VGA palette (picture, source) rather than creating custom colors for real crossfading, and thus saves space.<br />
<br />
==Tiny megademo framework==<br />
The tiny megademo framework is based on ideas of [https://www.pouet.net/prod.php?which=68020 "Quatro"], done by "Sensenstahl" and me. To put many tiny effects together into a "megademo", they all have to use the same approach and have to be mostly free of assumptions (about the contents of memory and registers). They also have to use the same timing values and be tuned to a shared timing in general. Preparing the single effects for inclusion in a framework like this took quite some time and required a lot of extra space initially. Sadly, some of my most impressive effects (judged by audience reactions and views on social media) could not be included at all, because the approaches could not be combined without a HUGE overhead. Once all effects were streamlined, I could think about "factoring out" common calculations which occur repeatedly, resulting in a few saved bytes. The framework does the following:<br />
* Setting up screen mode to 320 x 200 pixels in 256 colors<br />
* Initializing a pointer to the screen<br />
* Installing a callback for timing and music<br />
* Mainloop<br />
** Calculation of X and Y from the current screen location<br />
*** with the [http://www.sizecoding.org/wiki/General_Coding_Tricks#Obtaining_X_and_Y_without_DIV_.28The_Rrrola_Trick.29 "Rrrola Trick"]<br />
** P = Pseudorandom value from screen location<br />
** Offset current time T by a scaled value P'<br />
** Selection of effect number N according to T'<br />
** Execution of effect N (X,Y) for current pixel<br />
*** On entrance, XY is in register DX (DL,DH)<br />
*** On entrance, the current time is in register BP<br />
*** On exit, the pixel color is expected in AL<br />
** Advance to next pixel<br />
** Triple diagonal interlacing for smoothing<br />
** Repeat until frame is complete<br />
* Set timer to a tempo of ~35 FPS<br />
* Check keyboard for ESC<br />
** Quit on ESC, otherwise continue<br />
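As a rough plausibility check of the "~35 FPS" item, the "tempo" constant from the complete source can be evaluated outside of the assembler. The Python sketch below is not part of the intro. It assumes that timer channel 0 expects a low byte followed by a high byte, so that the single byte written to port 40h once per frame alternately fills both halves of the 16 bit divisor. This reading of the setup is an assumption, and the names PIT_HZ and timer_rate are only illustrative.<br />
<syntaxhighlight lang="python">
PIT_HZ = 1193182      # input clock of the PC timer chip, as used in the "tempo" define
TARGET_FPS = 35

# NASM evaluates 1193182/256/targetFPS with integer division, left to right
tempo = PIT_HZ // 256 // TARGET_FPS

def timer_rate(byte_value):
    """Interrupt rate if the same byte serves as low and as high divisor byte."""
    divisor = byte_value * 256 + byte_value
    return PIT_HZ / divisor

print(tempo, round(timer_rate(tempo), 2))
</syntaxhighlight><br />
Under that assumption the integer division yields a byte value of 133, and the resulting interrupt rate lands just below 35 per second.<br />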
===Code of framework===<br />
<syntaxhighlight lang="nasm"><br />
org 100h<br />
s:<br />
mov al,0x13 ; set AL to mode 320*200 in 256 colors<br />
int 0x10 ; call BIOS to set mode<br />
xchg bp,ax ; set timing value to 0x13 <br />
push 0xa000-10 ; write the screen address to register ES<br />
pop es ; works in conjunction with Rrrola trick<br />
mov ax,0x251c ; parameter for changing timer interrupt<br />
mov dl,timer ; address of timer routine, assume DH=1<br />
int 0x21 ; install timer routine<br />
top:<br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
call bx ; call the effect<br />
stosb ; write the return value and advance<br />
inc di ; triple interlace trick for after<br />
inc di ; effect and smoothing the animation<br />
jnz top ; repeat until the frame is complete<br />
mov al,tempo ; set AL to divider for timer<br />
out 40h,al ; set timing (dual pass)<br />
in al,0x60 ; read keyboard<br />
dec al ; quit on ESC<br />
jnz top ; otherwise repeat loop<br />
sounds: db 0xc3, 11, 0x93; 0xc3 is MIDI/RET; fx2-s is used as volume<br />
table: db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
<br />
</syntaxhighlight><br />
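To see what "mul di" with the Rrrola constant yields, the multiplication can be replayed outside of DOS. The following Python sketch is only a model under assumptions: it takes for granted that the visible pixel (x, y) is reached at DI = y*320 + x + 160, which follows from the segment 0xa000-10 starting 160 bytes below the video memory. All function names are made up for this illustration.<br />
<syntaxhighlight lang="python">
RRROLA = 0xCCCD            # 52429, roughly 65536 * 256 / 320
SEGMENT_SHIFT = 10 * 16    # ES = 0xa000-10 moves the origin back by 160 bytes

def di_for_pixel(x, y):
    """Screen pointer DI that addresses pixel (x, y) through the shifted segment."""
    return (y * 320 + x + SEGMENT_SHIFT) & 0xFFFF

def rrrola_xy(di):
    """Model of 'mov ax,0xcccd / mul di': return (DH, DL) of the 32 bit product."""
    product = (di & 0xFFFF) * RRROLA
    dx = (product >> 16) & 0xFFFF
    return dx >> 8, dx & 0xFF

def signed8(value):
    """Interpret a byte as a signed number, as IMUL or MOVSX would."""
    value &= 0xFF
    return value - 256 if value >= 128 else value

for x in (0, 160, 319):
    dh, dl = rrrola_xy(di_for_pixel(x, 100))
    print(x, dh, signed8(dl))
</syntaxhighlight><br />
In this model DL is the column scaled to a signed byte with 0 in the middle of the screen, and DH is the row (one higher in the right half of the screen, a side effect of the 160 byte shift).<br />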
<br />
==Explanation of effects==<br />
===Array of chessboards===<br />
[[File:Array of chessboards.png|thumb|array of chessboards]]<br />
The simplest effect and a good one to start with. After offsetting the row by the time, the classic XOR pattern is applied. To achieve the impression of a grid of chessboards, all bits but two are set in the color. The real trick lies in shifting to a "good" palette spot. The portion of the frame which is perceived as black isn't actually black; it's the dark part of the [https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_palette standard VGA palette]. Shifting that way also makes the dark and light squares look like they are from an ancient chessboard.<br />
<syntaxhighlight lang="nasm"> <br />
xchg dx,ax ; get XY into AX<br />
sub ax,bp ; subtract time from row<br />
xor al,ah ; XOR pattern (x xor y)<br />
or al,0xDB ; pattern for array of boards<br />
add al,13h ; shift to good palette spot<br />
</syntaxhighlight><br />
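The five instructions can be replayed with plain integer arithmetic. The Python sketch below is a model and not the original code; the function name and the split of DX into the arguments "dl" and "dh" are choices made for this illustration.<br />
<syntaxhighlight lang="python">
def chessboards(dl, dh, bp):
    """Model of the chessboard routine: DL/DH = X/Y bytes, BP = time."""
    ax = (((dh << 8) | dl) - bp) & 0xFFFF   # xchg dx,ax / sub ax,bp
    al = (ax & 0xFF) ^ (ax >> 8)            # xor al,ah
    al |= 0xDB                              # or al,0xDB
    return (al + 0x13) & 0xFF               # add al,13h

# collect every color that can appear in the first row at time 0
colors = sorted({chessboards(x, 0, 0) for x in range(256)})
print([hex(c) for c in colors])
</syntaxhighlight><br />
Because "or al,0xDB" leaves only two bits free, only four values can come out, and adding 13h moves them to the palette entries 0x0E, 0x12, 0xEE and 0xF2.<br />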
<br />
===Zooming circles===<br />
[[File:Zooming circles.png|thumb|zooming circles]]<br />
The distance D of a point (X,Y) to the center (0,0) is sqrt(X²+Y²). The framework already makes sure that DL contains a centered X coordinate, but DH=Y still has to be centered. Performing a square root on x86 requires quite a bit of code, but it can actually be omitted. With a clever selection of colors, the impression of zooming circles without a square root is quite decent.<br />
<syntaxhighlight lang="nasm"> <br />
mov al,dh ; get Y in AL<br />
sub al,100 ; align Y vertically<br />
imul al ; AX = Y²<br />
xchg dx,ax ; Y²/256 in DH, X in AL<br />
imul al ; AX = X²<br />
add dh,ah ; DH = (X² + Y²)/256<br />
mov al,dh ; AL = (X² + Y²)/256<br />
add ax,bp ; offset color by time<br />
and al,8+16 ; select special rings<br />
</syntaxhighlight><br />
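The same computation, written as a small Python model for experimenting. This is an illustrative sketch only; the function names are not taken from the intro, and the arguments stand for the register bytes DL, DH and the time BP.<br />
<syntaxhighlight lang="python">
def signed8(value):
    """Interpret a byte as a signed number, as IMUL does."""
    value &= 0xFF
    return value - 256 if value >= 128 else value

def zooming_circles(dl, dh, bp):
    """Model of the circles routine: squared distance instead of a square root."""
    y = signed8(dh - 100)                    # sub al,100
    x = signed8(dl)                          # DL is already centered
    d = ((y * y) >> 8) + ((x * x) >> 8)      # add dh,ah: sum of the high bytes
    return ((d + bp) & 0xFF) & 0x18          # add ax,bp / and al,8+16

print(zooming_circles(64, 100, 0), zooming_circles(192, 100, 0))
</syntaxhighlight><br />
The two calls use X = 64 and X = -64 on the center row and return the same ring value, which is what makes the pattern circular.<br />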
<br />
===Scrolling tilted plane===<br />
[[File:Tiltplane.png|thumb|scrolling tilted plane]]<br />
This effect is achieved as follows. First, a large constant is divided by the row number Y to simulate distance. The resulting value is then used twice: a) scaled by a centered X value and b) offset by the current time. These two results are combined with XOR, and a special pattern is selected from the result.<br />
<syntaxhighlight lang="nasm"> <br />
mov ax,0x1329 ; initialize with constant<br />
add dh,al ; preventing divide overflow<br />
div dh ; reverse divide AL = C/Y'<br />
xchg dx,ax ; DL = C/Y', AL = X<br />
imul dl ; AH = CX/Y'<br />
sub dx,bp ; DL = C/Y'-T <br />
xor ah,dl ; AH = (CX/Y') ^ (C/Y'-T)<br />
mov al,ah ; move to AL<br />
and al,4+8+16 ; select special pattern<br />
</syntaxhighlight><br />
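A Python model of these instructions makes it easier to follow where quotient and remainder end up. It is a sketch for illustration, with invented function names, and it expects "dh" to be a screen row in the range 0 to 199.<br />
<syntaxhighlight lang="python">
def signed8(value):
    """Interpret a byte as a signed number, as IMUL does."""
    value &= 0xFF
    return value - 256 if value >= 128 else value

def tilted_plane(dl, dh, bp):
    """Model of the tilted plane routine (DL = centered X, DH = row, BP = time)."""
    c = 0x1329
    y = (dh + (c & 0xFF)) & 0xFF                # add dh,al: row + 0x29
    q, r = divmod(c, y)                         # div dh: AL = quotient, AH = remainder
    ax = (signed8(dl) * signed8(q)) & 0xFFFF    # xchg dx,ax / imul dl
    dx = (((r << 8) | q) - bp) & 0xFFFF         # sub dx,bp
    return ((ax >> 8) ^ (dx & 0xFF)) & 28       # xor ah,dl / and al,4+8+16

print(0x1329 // 0x29, tilted_plane(0, 0, 0), tilted_plane(64, 59, 1))
</syntaxhighlight><br />
The low byte 0x29 of the constant doubles as the row offset, so the smallest divisor is 41 and the quotient never exceeds 119, which fits into a byte.<br />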
<br />
===Parallax checkerboards===<br />
[[File:Paralax.png|thumb|parallax checker boards]]<br />
This is kind of raycasting with dynamic geometry. Each object plane is split horizontally because of the signed 16 bit multiplication and vertically because of an implicit logic column number operation. Additionally, implicit shearing is applied, so that the edges of the resulting grid (4 "solid" patches, 4 "transparent" patches, alternating) connect. If the ray hits one of the solid patches, the color becomes the iteration number (+palette offset into grayscale), if not, the plane is shifted by the screenpointer and the process is repeated until the maximum iteration number is reached.<br />
<syntaxhighlight lang="nasm"><br />
mov cx,bp ; set initial point to time<br />
mov bx,-16 ; limit to 16 iterations<br />
fx3L:<br />
add cx,di ; offset point by screenpointer<br />
mov ax,819 ; magic, related to Rrrola constant<br />
imul cx ; get X',Y' in DX<br />
ror dx,1 ; set carry flag on "hit"<br />
inc bx ; increment iteration count<br />
ja fx3L ; loop until "hit" or "iter=max"<br />
lea ax,[bx+31] ; map value to standard gray scale<br />
</syntaxhighlight><br />
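The loop can be modeled in Python to see how the iteration count turns into a gray value. The sketch below is an illustration with invented names; "di" is the screen pointer and "bp" the time, both treated as 16 bit values.<br />
<syntaxhighlight lang="python">
def signed16(value):
    """Interpret a word as a signed number, as the one-operand IMUL does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def parallax(di, bp, max_iter=16):
    """Model of the parallax loop; returns the gray palette index in AL."""
    cx = bp & 0xFFFF
    bx = -max_iter
    while True:
        cx = (cx + di) & 0xFFFF              # add cx,di
        product = signed16(cx) * 819         # imul cx: DX:AX = CX * 819
        dx = (product >> 16) & 0xFFFF
        hit = dx & 1                         # ror dx,1 moves bit 0 into the carry flag
        bx += 1                              # inc bx keeps the carry, sets zero at the end
        if hit or bx == 0:                   # ja loops only while carry=0 and zero=0
            break
    return (bx + 31) & 0xFF                  # lea ax,[bx+31]

print(parallax(81, 0), parallax(0, 0))
</syntaxhighlight><br />
A hit in the first iteration gives 16, the darkest entry of the gray ramp, and a ray that never hits ends at 31, the brightest one.<br />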
<br />
===Sierpinski rotozoomer===<br />
[[File:Sierpinski rotozoomer.png|thumb|sierpinski rotozoomer]]<br />
[[File:1 div cos(atan(x)).png|thumb|1/cos(atan(x)), made with www.google.com]]<br />
Rotozooming normally requires trigonometric functions, or good estimations of them. Let's look at the [https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions usual 2D rotation equation] and interpret it in a special way to get rid of the trigonometry in the calculation:<br />
<syntaxhighlight lang="nasm"><br />
x' = x*cos(a) - y*sin(a)<br />
y' = x*sin(a) + y*cos(a)<br />
</syntaxhighlight><br />
If we add zooming to it, it looks like this:<br />
<syntaxhighlight lang="nasm"><br />
x' = z * (x*cos(a) - y*sin(a))<br />
y' = z * (x*sin(a) + y*cos(a))<br />
</syntaxhighlight><br />
Now let's assume we don't specify z ourselves, and factor out cos(a):<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(a) * (x - y*tan(a))<br />
y' = cos(a) * (x*tan(a) + y)<br />
</syntaxhighlight><br />
Finally, let's substitute tan(a) with our time variable, since the tangent function rises to infinity the closer we get to ±90°:<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(atan(T)) * (x - y*T)<br />
y' = cos(atan(T)) * (x*T + y)<br />
</syntaxhighlight><br />
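The substitution can be checked numerically. The following Python sketch (an illustration, with function names chosen here) compares the usual rotation by an angle a with the tangent form for T = tan(a). It also shows that leaving out the cos(atan(T)) factor scales the result by sqrt(1+T²), which is the same as 1/cos(atan(T)).<br />
<syntaxhighlight lang="python">
import math

def rotate(x, y, a):
    """Usual 2D rotation by the angle a."""
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

def rotate_by_tangent(x, y, t):
    """Same rotation, expressed through t = tan(a) only."""
    zoom = math.cos(math.atan(t))
    return (zoom * (x - y * t), zoom * (x * t + y))

def unscaled(x, y, t):
    """Tangent form without the cos(atan(t)) factor."""
    return (x - y * t, x * t + y)

a = 0.5
t = math.tan(a)
print(rotate(3, 4, a), rotate_by_tangent(3, 4, t))
print(math.hypot(*unscaled(3, 4, t)) / math.hypot(3, 4), math.sqrt(1 + t * t))
</syntaxhighlight><br />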
As long as we don't care about specifying the zoom factor and don't directly control the angle, we can now rotate between -90° and +90° without using trigonometric functions. As a byproduct, the zoom factor is now tied to the time T. The function for the zoom factor is shown in the image; it results in zooming from infinitely small to one (original size) and back to infinitely small. Quite a few bytes have been spent to beautify the effect: correcting the timing offset, speeding up the animation, enlarging the sierpinski pixels and getting decent colors. I thought it was worth the space.<br />
<syntaxhighlight lang="nasm"><br />
lea cx,[bp-2048]; center time to pass zero<br />
sal cx,3 ; speed up by factor 8!<br />
movzx ax,dh ; get X into AL<br />
movsx dx,dl ; get Y into DL<br />
mov bx,ax ; save X in BX<br />
imul bx,cx ; BX = X*T<br />
add bh,dl ; BH = X*T/256+Y<br />
imul dx,cx ; DX = Y*T<br />
sub al,dh ; AL = X-Y*T/256<br />
and al,bh ; AL = (X-Y*T/256)&(X*T/256+Y)<br />
and al,252 ; thicker sierpinski<br />
salc ; set pixel value to black<br />
jnz fx4q ; leave black if not sierpinski<br />
mov al,0x2A ; otherwise: a nice orange<br />
fx4q:<br />
</syntaxhighlight><br />
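For experiments outside of DOS, the routine can be modeled in Python. This is a sketch under the usual caveats: the function names are invented, and "x" and "y" follow the naming in the comments of the listing (x is taken from DH, y from DL).<br />
<syntaxhighlight lang="python">
def signed8(value):
    """Interpret a byte as a signed number, as MOVSX does."""
    value &= 0xFF
    return value - 256 if value >= 128 else value

def sierpinski_rotozoom(dl, dh, bp):
    """Model of the rotozoomer; returns 0x2A on the sierpinski pattern, else 0."""
    t = ((bp - 2048) << 3) & 0xFFFF      # lea cx,[bp-2048] / sal cx,3
    x = dh                               # movzx ax,dh
    y = signed8(dl) & 0xFFFF             # movsx dx,dl
    bx = (x * t) & 0xFFFF                # imul bx,cx keeps the low 16 bits
    bh = ((bx >> 8) + y) & 0xFF          # add bh,dl
    dx = (y * t) & 0xFFFF                # imul dx,cx
    al = (x - (dx >> 8)) & 0xFF          # sub al,dh
    return 0x2A if (al & bh & 252) == 0 else 0

print(sierpinski_rotozoom(4, 8, 2048), sierpinski_rotozoom(4, 4, 2048))
</syntaxhighlight><br />
At time 2048 the factor T is zero, so the model reduces to the plain "x AND y" test of the unrotated sierpinski texture.<br />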
<br />
===Raycast bent tunnel===<br />
[[File:Raycastbenttunnel.png|thumb|raycast bent tunnel]]<br />
This is a variation of the [https://www.pouet.net/prod.php?which=78044 raycast tunnel "into a new era"] (see also above). A [https://www.reddit.com/r/tinycode/comments/de7dj9/moving_cubes_with_pathtracing_lighting_and/ detailed explanation] has been posted to reddit for the [https://www.pouet.net/prod.php?which=83204 similar production "Essence"]. The custom colors have been removed, the bend direction is different, and the geometry is a bit more closed, to increase the performance on slower computers and DosBox.<br />
<syntaxhighlight lang="nasm"><br />
mov cl,-9 ; start with depth 9 (moves backwards)<br />
fx5L: <br />
push dx ; save DX, destroyed inside the loop<br />
mov al,dh ; Get Y into AL<br />
sub al,100 ; Centering Y has to be done "manually".<br />
imul cl ; Multiply AL=Y by the current distance, to get a projection(1)<br />
xchg ax,dx ; Get X into AL, while saving the result in DX (DH)<br />
add al,cl ; add distance to projection, (bend to the right)<br />
imul cl ; Multiply AL=X by the current distance, to get a projection(2)<br />
mov al,dh ; Get projection(1) in AL<br />
xor al,ah ; combine with projection(2)<br />
add al,4 ; center the walls around 0<br />
test al,-8 ; check if the wall is hit<br />
pop dx ; restore DX<br />
loopz fx5L ; repeat until "hit" or "iter=max"<br />
sub cx,bp ; offset depth by time<br />
xor al,cl ; XOR pattern for texture <br />
aam 6 ; irregular pattern with MOD 6<br />
add al,20 ; offset into grayscale palette<br />
</syntaxhighlight><br />
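The ray loop can also be followed in a Python model. The sketch below is only an approximation of the environment: the routine sets CL but not CH, so the model assumes CH = 0xFF on entry, which the original does not guarantee. Function and argument names are invented for this illustration.<br />
<syntaxhighlight lang="python">
def signed8(value):
    """Interpret a byte as a signed number, as IMUL does."""
    value &= 0xFF
    return value - 256 if value >= 128 else value

def bent_tunnel(dl, dh, bp, ch=0xFF):
    """Model of the tunnel routine; 'ch' is an assumed entry value for CH."""
    cx = (ch << 8) | 0xF7                            # mov cl,-9
    while True:
        cl = signed8(cx)
        p1 = (signed8(dh - 100) * cl) & 0xFFFF       # centered Y times depth
        p2 = (signed8(dl + cl) * cl) & 0xFFFF        # bent X times depth
        al = ((p1 >> 8) ^ (p2 >> 8)) & 0xFF          # mov al,dh / xor al,ah
        al = (al + 4) & 0xFF                         # add al,4
        inside = (al & 0xF8) == 0                    # test al,-8 sets the zero flag
        cx = (cx - 1) & 0xFFFF                       # loopz decrements CX first
        if not inside or cx == 0:
            break
    cx = (cx - bp) & 0xFFFF                          # sub cx,bp
    al ^= cx & 0xFF                                  # xor al,cl
    return al % 6 + 20                               # aam 6 / add al,20

print(bent_tunnel(0, 100, 0))
</syntaxhighlight><br />
Whatever the ray does, the result of "aam 6" is a value from 0 to 5, so the returned color always lies in the range 20 to 25 of the gray ramp.<br />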
<br />
===Ocean night to day===<br />
[[File:Oceannight.png|thumb|left|oceannight]]<br />
[[File:Oceanday.png|thumb|right|oceanday]]<br />
The ocean effect is a wonderful example of a [https://en.wikipedia.org/wiki/Bob_Ross#Style "happy accident"]. If you load a value into the FPU as an integer, store it as a floating point value, and interpret it as an integer again, you get a cool pattern. Combine that with a reverse divide and you get a neat wavy effect. This works in tight conjunction with the register DX, which has a sign flip at the position we want, so we can easily separate sky and sea. The final touch is the coloring though. By design of the global framework, the value of AL is determined on function entrance: it contains the effect address. With a bit of code shuffling, the color of the sky came "for free", without using instructions, as well as the "fading to day" color, which is also an effect address. And that is no happy accident. In versions other than the originally submitted DosBox version, the sky color can vary for that reason.<br />
<syntaxhighlight lang="nasm"><br />
sub dh,120 ; check if pixel is in the sky<br />
js fx6q ; quit if that's the case<br />
mov [bx+si],dx ; move XY to a memory location<br />
fild word [bx+si] ; read memory location as integer<br />
fidivr dword [bx+si]; reverse divide by constant<br />
fstp dword [bx+si-1]; store result as floating point<br />
mov ax,[bx+si] ; get the result into AX<br />
add ax,bp ; modify color by time<br />
and al,128 ; threshold into two bands<br />
dec ax ; beautify colors to blue/black<br />
</syntaxhighlight><br />
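The "store as float, read back as integer" part can be tried without an FPU listing. The Python sketch below is a simplified illustration, not a full model: the real routine divides a memory dword whose upper half is not controlled, so the constant 1000 used here is purely hypothetical, as are the function names.<br />
<syntaxhighlight lang="python">
import struct

def float_bits(value):
    """Bit pattern of 'value' stored as a 32 bit float, as fstp dword writes it."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def middle_word(value):
    """Bytes 1 and 2 of the stored float: fstp dword [p-1], then mov ax,[p]."""
    return (float_bits(value) >> 8) & 0xFFFF

def band(ax, bp):
    """add ax,bp / and al,128 / dec ax, looking at AL only."""
    al = (ax + bp) & 0x80
    return (al - 1) & 0xFF

# hypothetical constant 1000 divided by a few row values
print([hex(middle_word(1000 / row)) for row in (1, 2, 3, 4, 5)])
</syntaxhighlight><br />
The two possible results of "band" are 255 and 127, the two colors between which the sea switches.<br />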
<br />
===Fading effect===<br />
For convenience, here is the fading effect from the global framework in isolation. Basically, it generates a pseudorandom value from the screen pointer, offsets the time by a scaled value of it, and then calls the selected effect.<br />
<syntaxhighlight lang="nasm"><br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
</syntaxhighlight><br />
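How the noise turns a hard cut into a dissolve can be seen in a small Python model. It is a sketch for illustration; the names "pixel_noise" and "effect_number" do not appear in the intro.<br />
<syntaxhighlight lang="python">
def pixel_noise(di):
    """Pseudorandom byte from the screen pointer: mul di / add al,ah / xor ah,ah."""
    ax = (di * 0xCCCD) & 0xFFFF                 # low word of the product
    return ((ax & 0xFF) + (ax >> 8)) & 0xFF

def effect_number(di, bp):
    """Index into the effect table for pixel DI at time BP."""
    ax = (pixel_noise(di) + bp) & 0xFFFF        # add ax,bp
    return (ax >> 9) & 15                       # shr ax,9 / and al,15

# at time 400 some pixels still show effect 0, others already effect 1
print(effect_number(0, 400), effect_number(1, 400))
</syntaxhighlight><br />
Every effect owns a slot of 512 time steps. Since the noise is below 256, all pixels agree during the first half of a slot, and during the second half more and more pixels switch to the next effect.<br />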
<br />
==MIDI music part==<br />
This code section increments the timing value and produces sound. By choosing channel 3, we can reuse the "change channel instrument" byte (0xC3) as "RET". By shifting around the code of the effects, a reasonable value for the volume could be created, which saves another byte. Be aware that this code sequence only works if the MIDI device is already in UART mode; otherwise three more bytes have to be spent. Quite a few people from the audience and sceners told me afterwards that the melody sounds a bit like [https://en.wikipedia.org/wiki/Incantations_(album) Incantations by Mike Oldfield], but it is rather created by a simple approach. Starting with [https://www.pouet.net/prod.php?which=65604 "Hypnoteye" in 2015], I experimented with procedural MIDI and eventually released a [https://www.pouet.net/prod.php?which=66313 small MIDI framework (64 bytes)]. The basic principle is to jump through tonal space with fixed steps, and to map high values back to lower ones (by modulo). With simple combinations of stepwidth and modvalue, interesting effects can be produced: for example stepwidth=3 for diminished chords (stacked minor thirds) which are never resolved, likewise stepwidth=4, or stepwidth=6 for tritone effects. With a well chosen modvalue, these patterns can create progressions. I did not do an appropriate theoretical analysis yet, but rather explored the tonal space and noted when something sounded interesting.<br />
<syntaxhighlight lang="nasm"><br />
sounds: db 0xc3, 11, 0x93, fx2-s<br />
...<br />
inc bp ; increment timing value<br />
test bp, 7 ; play a note every 8th step<br />
jnz nomuse ; quit if in between<br />
mov dx,0x330 ; port number for MIDI<br />
mov si,sounds ; address for sound data<br />
outsb ; change instrument of channel 3<br />
outsb ; to vibraphone<br />
outsb ; play a note on channel 3<br />
imul ax,bp,-19*32*4 ; the magic melody constant<br />
shr ax,10 ; scale down and implicit "and 63"<br />
add al,22 ; pitch base is 22<br />
out dx,al ; play THIS note on channel 3<br />
outsb ; play it with THIS volume<br />
</syntaxhighlight><br />
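The note formula can be evaluated in Python to list the first notes of the melody. This is a sketch for illustration; the function name "note" is invented, and the list only covers the time steps at which the routine actually plays (every 8th value of the timing counter).<br />
<syntaxhighlight lang="python">
SCALE_MOD = -19 * 32 * 4     # the "magic melody constant"

def note(bp):
    """MIDI note number for the timing value BP."""
    ax = (bp * SCALE_MOD) & 0xFFFF    # imul ax,bp,imm16 keeps the low 16 bits
    return (ax >> 10) + 22            # shr ax,10 / add al,22

melody = [note(t) for t in range(0, 8 * 8, 8)]
print(melody)
</syntaxhighlight><br />
For time steps that are multiples of 8, the constant reduces to a step of -19 note numbers per played note, wrapped into a range of 64 values above the base pitch 22.<br />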
==Further tweaks==<br />
To be done ;)<br />
==Differences between dos versions==<br />
To be done ;)<br />
==External Links==<br />
*[https://en.wikipedia.org/wiki/Demoscene Demoscene]<br />
*[https://www.youtube.com/watch?v=Imquk_3oFf4 Youtube Capture]<br />
*[https://www.twitch.tv/videos/589179638?t=7h02m40s Twitch Live Reactions]<br />
*[https://www.pouet.net/prod.php?which=85227 Production Page on Pouet]<br />
*[https://www.reddit.com/r/programming/comments/g1qhf8/memories_256_byte_msdos_intro/ Reddit/Programming Thread I]<br />
*[https://www.reddit.com/r/programming/comments/g4ewh7/how_its_made_memories_a_256_byte_dos_intro_with_8/ Reddit/Programming Thread II]<br />
*[https://github.com/cesarmiquel/memories-256b-msdos-intro Analysis by "cesarmiquel"]<br />
<br />
==Original release code (complete)==<br />
<syntaxhighlight lang="nasm"> <br />
; "memories" by HellMood/DESiRE<br />
; the tiny megademo, 256 byte msdos intro<br />
; shown in April 2020 @ REVISION<br />
;<br />
; (= WILL BE COMMENTED IN DETAIL LATER =)<br />
;<br />
; create : nasm.exe memories.asm -fbin -o memories.com<br />
; CHOOSE YOUR TARGET PLATFORM (compo version is dosbox)<br />
; be sure to use the dosbox.conf from this archive!<br />
; only ONE of the defines should be active!<br />
%define dosbox ; size : 256 bytes<br />
;%define freedos ; size : 230 bytes<br />
;%define winxpdos ; size : 263 bytes<br />
<br />
; DON'T TOUCH THESE UNLESS YOU KNOW WHAT YOU'RE DOING<br />
%ifdef winxpdos<br />
%define music<br />
%define switch_uart<br />
%define safe_dx<br />
%define safe_segment<br />
%endif<br />
%ifdef freedos<br />
%define safe_dx<br />
%endif<br />
%ifdef dosbox<br />
%define music<br />
;%define safe_dx ; sometimes needed<br />
%endif<br />
<br />
; GLOBAL PARAMETERS, TUNE WITH CARE!<br />
%define volume 127 ; not used on dosbox (optimization)<br />
%define instrument 11<br />
%define scale_mod -19*32*4; <br />
%define time_mask 7<br />
%define targetFPS 35<br />
%define tempo 1193182/256/targetFPS <br />
%define sierp_color 0x2A<br />
%define tunnel_base_color 20<br />
%define tunnel_pattern 6<br />
%define tilt_plate_pattern 4+8+16<br />
%define circles_pattern 8+16<br />
<br />
org 100h<br />
s:<br />
%ifdef freedos<br />
mov fs,ax<br />
mov [fs:0x46c],ax<br />
%endif<br />
mov al,0x13<br />
int 0x10 <br />
xchg bp,ax<br />
push 0xa000-10<br />
pop es<br />
%ifndef freedos<br />
mov ax,0x251c<br />
%ifdef safe_dx <br />
mov dx,timer <br />
%else ; assume DH=1, mostly true on DosBox<br />
mov dl,timer<br />
%endif<br />
int 0x21<br />
%endif<br />
top:<br />
%ifdef freedos<br />
mov bp,[fs:0x46c]<br />
%endif <br />
mov ax,0xcccd<br />
mul di<br />
add al,ah<br />
xor ah,ah<br />
add ax,bp<br />
shr ax,9<br />
and al,15<br />
xchg bx,ax<br />
mov bh,1<br />
mov bl,[byte bx+table]<br />
call bx<br />
stosb<br />
inc di<br />
inc di<br />
jnz top<br />
mov al,tempo<br />
out 40h,al<br />
in al,0x60<br />
dec al<br />
jnz top<br />
sounds:<br />
db 0xc3 ; is MIDI/RET<br />
%ifdef music<br />
db instrument,0x93<br />
%ifdef switch_uart<br />
db volume ; without switch, volume is in table<br />
db 0x3f <br />
%endif<br />
%endif<br />
table: ; first index is volume, change order with care! <br />
db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
stop:<br />
pop ax<br />
ret<br />
timer:<br />
%ifndef freedos<br />
%ifdef safe_segment<br />
push cs<br />
pop ds<br />
%endif<br />
inc bp<br />
%ifdef music <br />
test bp, time_mask<br />
jnz nomuse<br />
mov dx,0x330<br />
mov si,sounds<br />
outsb<br />
outsb<br />
outsb<br />
imul ax,bp,scale_mod<br />
shr ax,10<br />
add al,22<br />
out dx,al<br />
outsb<br />
%ifdef switch_uart<br />
inc dx<br />
outsb<br />
%endif<br />
%endif<br />
nomuse:<br />
iret<br />
%endif <br />
fx0: ; tilted plane, scrolling<br />
mov ax,0x1329<br />
add dh,al<br />
div dh<br />
xchg dx,ax<br />
imul dl<br />
sub dx,bp<br />
xor ah,dl<br />
mov al,ah<br />
and al,tilt_plate_pattern<br />
ret<br />
fx2: ; board of chessboards<br />
xchg dx,ax<br />
sub ax,bp<br />
xor al,ah<br />
or al,0xDB<br />
add al,13h<br />
ret<br />
fx1: ; circles, zooming<br />
mov al,dh<br />
sub al,100<br />
imul al<br />
xchg dx,ax<br />
imul al<br />
add dh,ah<br />
mov al,dh<br />
add ax,bp<br />
and al,circles_pattern<br />
ret<br />
fx3: ; parallax checkerboards<br />
mov cx,bp<br />
mov bx,-16<br />
fx3L:<br />
add cx,di<br />
mov ax,819<br />
imul cx <br />
ror dx,1 <br />
inc bx <br />
ja fx3L<br />
lea ax,[bx+31] <br />
ret<br />
fx4: ; sierpinski rotozoomer <br />
lea cx,[bp-2048]<br />
sal cx,3<br />
movzx ax,dh<br />
movsx dx,dl<br />
mov bx,ax<br />
imul bx,cx<br />
add bh,dl<br />
imul dx,cx<br />
sub al,dh<br />
and al,bh<br />
and al,0b11111100<br />
salc ; VERY slow on dosbox, but ok<br />
jnz fx4q<br />
mov al,sierp_color<br />
fx4q:<br />
ret<br />
fx5: ; raycast bent tunnel<br />
mov cl,-9<br />
fx5L: <br />
push dx<br />
mov al,dh<br />
sub al,100<br />
imul cl<br />
xchg ax,dx <br />
add al,cl<br />
imul cl<br />
mov al,dh<br />
xor al,ah<br />
add al,4<br />
test al,-8<br />
pop dx<br />
loopz fx5L<br />
sub cx,bp<br />
xor al,cl<br />
aam tunnel_pattern; VERY slow on dosbox, but ok<br />
add al,tunnel_base_color<br />
ret<br />
fx6: ; ocean night / to day sky<br />
sub dh,120<br />
js fx6q<br />
mov [bx+si],dx<br />
fild word [bx+si]<br />
fidivr dword [bx+si]<br />
fstp dword [bx+si-1]<br />
mov ax,[bx+si]<br />
add ax,bp<br />
and al,128<br />
dec ax<br />
fx6q:<br />
ret<br />
</syntaxhighlight><br />
<br />
==Bonus - NFO/ASCII==<br />
<br />
<syntaxhighlight lang="nasm"> <br />
art : hammerfist<br />
∂#MW%e _d$Ng,<br />
'B, ∂b _jM@$QZb,cQ" )@<br />
,edRB$b,l@ Wk,yGR$KM&$b, ,dP" Wl ]bsd%UR8BG6&$@DSyG#ZKM&$b,<br />
,dP "T%L 'MGF "*∂R_ Tg "*4Zk,#I YP W" 7P "*∂R<br />
4M gd@ ^ ∂@ d@b dQ$#@Z@R3L_ "*GMj 'W ,gd$ d@b 9Q$#%b<br />
W#, `M Wb `* _4P `Qk *#N8L `H5 @b 'QR7YK `* _4F" Qk<br />
`6@L dML '@ ,BK 'M ∂B *b, '#L<br />
^QBb,_ _,4&M∞∂@=,_ _dGL _gQKM GL @k 'Mg,_ _dG,<br />
"*BN5W$2#MNP" "*G3WRM8&B5P"`Y@QNW3Z5P" ∂#$W8BRM3XZN87 "*GW38M%EBDW5P"`<br />
<br />
<br />
p r e s e n t s<br />
<br />
4<br />
d@,<br />
_& `Wl<br />
_,aP "#baedM$#@@K JP*"?ML<br />
,ad@$#P" ,d@NEWVB" X,aQPYb,_<br />
V@Mm,_ ,d@MW#BW' EMP" '¶R ,ngBP^fML<br />
¶M@N@y Y#BNW#M" J9" `MQ9" "MgRBq ,QBMg,<br />
VN#P` ,d@@ `WM@^ 7f ¶F` 7kY" ^G _.eQNE1.<br />
]B _G@MWN$, `P ' 4b QP ¶w@F*^ ^Qb<br />
]O@NRM#W@MNB, ; ^` j JP^ Yl<br />
J#NRNWM@#BcT"^ ,A _J _q@ `X<br />
'¶WM#B@WdY`,7 _G#YN#PM _,gG" M,<br />
*BN#WP" dK ,Q@NRMB"]9 , _,M@Q* #A<br />
"U^ V@h, iNBW#NT J' J9 s@QN" _; 'D,<br />
¶RMBv&NMQR@9 .W .K' "9` ,6BA _JL ]l<br />
Y#NE@W#NRP #[ `¶8 _d@MW#B_jW#W BN<br />
"GQ@MR#W QL_ *B _,p#NBW#NQMG@WY 3Q<br />
"Y@F ,XW@M%im,_Yb_ _,g5@#MW@QMNE@E@NRMB ,WM<br />
` _,gP*"#REM#GB@N#MQbnd@N#M@MW#R8QSB^'WQERM@ ;4NB,<br />
,GYKL ¶E#B8R8QSB@M@#BM#W@MNB"`_ , "^` N ,dW@Ql<br />
_Q`'W`*t '¶@GS#MBQ#E@W#NQBW[ 'LvQ_ ,K _dNABGM#N<br />
,F ' `^WAB@QGE9*"9^*@L jP7FY, ¶h,_.jWM#BR#GBM,<br />
J; , _ ' ' "LL YxE#B8R8QSBNW@W;<br />
AP _,Ag6^ _ J ¶A `"Q#M@MW#R8E#P<br />
j@ `"XQW[ 'LvK,_ 'L_,/ @t Y#NE@WNR"<br />
:M/ 9^*@L jP7F" _PYKL _,A; ¶RSNQ"<br />
dKL ' ` ' "L "`'W`*t `"XQb `W^<br />
Q`8t 'L_,/ , _ ' 9^Q<br />
,W _PYKL _,Ag6^ ' W, _ ,#N&<br />
!N _ J "`'W`*t `"XQW[ _ J N!_JG9^RwQ' *t<br />
`W, 'LvK,_ ' _gGB8@L _ 'LvK,_ ,WgB' V 7L<br />
_.,gm&@B&wBZF" j@'` "WL _gML jZd7Yb lN" dBWl<br />
,g&QB*"^` `"*G@g, .gR&k,_ ,N" '@QF ¶k;gMF *QvQ jQ, ,@N@B#,<br />
.eQF*` `Yb@" "*6Qg,gF , 7 XMN" 'MNB, ^¶QWSER@N;<br />
,gP" qy, W' ^Q' &L ,g@W' `QMEL `"WBNWP<br />
g7 ¶9 ,X M? 9" _q8MSK ¶EMt *@K<br />
Vh _,m#L _AH le ,GBDNE9^A, *@F NMg<br />
¶L,qQ@ND _.m@Bl We ,gM@B8#Q' ¶h_ lWE,<br />
W9NHW@` JWM#B@] @e 4WR@NGF^ 'QL dRWl<br />
VMd* "@BE@PM 'N *UP" VW, JRSB;<br />
,@F j `¶WK W, ¶t XNt _A@E#N<br />
_JP ,6& "GLdM XD, _.g8NMA@k,_ _,gG#NMGR;<br />
"Z .JRER 'VMi _jNB#W&_ _,j@E@W#Nl ¶MBGMNQGNQMG@QBW9<br />
¶h ,G@NRMBl `" ,d#R@M$F ¶Mg,_.gp&@@NEWVBWBMG *QMN8R8SBN$E@WF<br />
Vb dW#R8QSRb, *YM@EQ,_ 'MENBW#NQMG#B@R@MW#l "BM@QNENRQG'<br />
*WGS#MBMNEYL `^"*8M@Q@NRM#W@BWSNW@QBF"` `^*@QBF^ [HFT]<br />
^M@MW#Q9 ^Wt `^¶RQ@W8NQGP*`<br />
¶Q#@P Vk lA `"^`<br />
Y" `MA J#,<br />
*R@, ,MQl<br />
Y#Wk, GWM8L<br />
W8RQSt,_ AQ@MR#,<br />
`@M@#SB@Mbm.,_ QNBW#NW<br />
¶QB8R8SBN$WNRM@#GNtwg@NMQR@B'<br />
*MBQ#8R8QS@NE@WNBW#NQMG@NR;<br />
`WGS#MBQ#R8QSB@NE@W#NQBW9<br />
*OMW@QMNE@E@NRMW@QMB@*<br />
`^"YQW@Q#SB#NE@EGP<br />
`^"*8R@GBQF`<br />
</syntaxhighlight></div>HellMoodhttp://www.sizecoding.org/index.php?title=Memories&diff=509Memories2020-04-21T15:40:28Z<p>HellMood: /* External Links */</p>
<hr />
<div>==Introduction==<br />
<br />
Hello! My name is "HellMood" and this is not a usual wiki style article. It will be written from my point of view, in first person. This article is all about a tiny MS DOS program called "Memories". This program has a size of 256 bytes and won the "PC 256 byte" competition of the [https://en.wikipedia.org/wiki/Demoscene demoscene] event [https://2020.revision-party.net/ "Revision"] in 2020, as well as the public choice award. You can watch a video of the output of this program here (https://www.youtube.com/watch?v=Imquk_3oFf4) and a video including live reactions of online audience and moderators here (https://www.twitch.tv/videos/589179638?t=7h02m40s). You can download the release and comment here (https://www.pouet.net/prod.php?which=85227). This wiki article will give you an in-depth analysis of the program and point out historical references as well as development steps. In case you did not notice, you are now on the "sizecoding wiki". This will help you not only to understand the inner workings of "Memories", but also enable you to produce something like this yourself. Feel free to look around! It is recommended that you start with the [http://www.sizecoding.org/wiki/Getting_Started basics of this wiki] first if you're new to sizecoding and/or x86 assembler. While the principles might be easy to grasp, details of the implementation might not.<br />
<br />
==Disclaimer==<br />
This article discusses the submitted DosBox version (256 bytes). The available archive contains versions for FreeDos and Windows XP DOS which, at the time of this writing, don't work on all computers. These alternative versions were included as a proof of concept, to show that this program does not only run on the emulator "DosBox". For the "PC 256 bytes" competition of the event "Revision" in 2020, it was allowed to specify either "FreeDos" or "DosBox" as platform, the latter with a specific configuration. It is in fact possible to modify the DosBox version to work on FreeDos, MS DOS, WinXP and Win98, as the released alternative versions showcase, but this is not the focus of this article. Anyway, safe versions for all platforms and all computers are in the works. Regarding possible further optimizations, I will discuss and explain only the submitted version, although I already found several spots in the code that can be optimized. As the history of tiny effects shows, almost no code is perfect, and most of it can be brought down further in size. To avoid confusion between different versions, only the submitted version will be explained.<br />
<br />
==History of tiny effects==<br />
[[File:Pouet categories.png|thumb|Size categories on http://www.pouet.net]]<br />
Sizecoders think in size categories. For MS DOS, these categories are 256b, 128b, 64b and 32b. These are the standards of one of the biggest demoscene archives, http://www.pouet.net. There is no 16b category, although many tiny effects can be implemented in 16 bytes. Almost all of the effects of "Memories" have been coded and optimized by me before, and mostly the implementations were attempts to reduce an already existing effect in size, or to do something similar in less size. Reducing the size in this case means reducing it to one of the next lower 2^N categories. For example, I might have seen an effect done in 33 to 64 bytes, and then reduced the size to 32 bytes or less. Almost every time I moved an effect to a lower category, I submitted the resulting tiny program to a demoscene party which allowed remote entries in the 256b category and/or published the result to http://www.pouet.net. In this section I will introduce the effects and mention the origins and authors.<br />
===Array of chessboards===<br />
[[File:Kasparov.gif|frame|kasparov, 16 bytes]]<br />
The origin of this effect is my own "Kasparov 16b" from 2018 (https://www.pouet.net/prod.php?which=75912). Apparently I "had this piece lying around, stuck at 17 bytes and not that beautiful, until i found a trick ;)" (quote from the release notes). There was a similar effect in 32 bytes before: "ew" by "headcrash" from 2003 (https://www.pouet.net/prod.php?which=17567). In this case, my design goal was to show "real" chessboards of 8x8 fields with recognizable dark and light squares, as well as the correct orientation of the single chessboards, meaning that the bottom right corner (h1) has to be a [https://en.wikibooks.org/wiki/Chess/Arranging_The_Board white square]. For "Memories", the effect had to be reimplemented with another approach of writing to the screen to work with the framework. The scrolling direction was also changed to be different from the "scrolling tilted plane" effect.<br />
<br />
===Zooming circles===<br />
The zooming circles were supposed to be entered to a demoparty as a 32 bytes intro, but I never actually did it. There is no real predecessor in 64 bytes, because in the 64b category much more complex effects are possible. The zooming circles were the result of desperately trying to reach the 32b category for a circular "tunnel" effect, for which my own record is still 52 bytes ("Neontube", 2016, https://www.pouet.net/prod.php?which=66808), which in turn was the optimization of the all-time-classic 64 byte effect "constant evolution" from ryg/Farbrausch (2003, https://www.pouet.net/prod.php?which=8697). In the zooming circles routine, the distance and the angle are removed/ignored, so that the 32b category could be reached.<br />
<br />
===Scrolling tilted plane===<br />
[[File:Floorcast variation.png|thumb|Floorcast, 32 bytes version, variation]]<br />
The scrolling tilted plane is one of my own releases, "floorcast 32b" from 2018. Floorcasts have their own history in sizecoding; the effect was brought down stepwise from 256 bytes to 32 bytes. The versions differ in the number of planes, some show two planes, others only one. For the "floorcast 32b" release I specifically decided against the "XOR" texture, whereas in "Memories" I used it again, but disguised with another final touch of "AND".<br />
* "rain_storm" version 2008 - 256 bytes - https://www.pouet.net/prod.php?which=50479<br />
* "org_100h" version 2008 - 128 bytes - https://www.pouet.net/prod.php?which=63271<br />
* "Baudsurfer" version 2013 - 86 bytes - https://www.pouet.net/prod.php?which=61070<br />
* "Baudsurfer" version 2014 - 64 bytes - https://www.pouet.net/prod.php?which=63305<br />
* "HellMood" version 2018 - 32 bytes - https://www.pouet.net/prod.php?which=77774<br />
<br />
===Parallax checkerboards===<br />
[[File:Projektbeschreibung.gif|frame|Projektbeschreibung, 32 bytes]]<br />
The parallax checkerboards have been released by me as 32 bytes effect [https://www.pouet.net/prod.php?which=76788 "Projektbeschreibung"] in 2018. It was the direct attempt to bring [https://www.pouet.net/prod.php?which=28543 "Follow the light"] from "Digimind" (2006) or my own [https://www.pouet.net/prod.php?which=63293 "Lucy"] (2014) down to 32 bytes. Very helpful inspirations came from [https://www.pouet.net/prod.php?which=32281 "Rrrola"s "Paralaxa"] (32 bytes, 2007) and [https://www.pouet.net/prod.php?which=61075 "Sensenstahl"s "Byteropolis"] (32 bytes, 2013). In fact, Rrrola's rendering approach was already close to the final solution; the code was modified to correct the arrangement of the planes, change the shape from triangles to checkerboards and improve on the colors. In "Memories", the coloring of "Digimind"s version was used. Also, the effect has been modified to reduce shearing as much as possible.<br />
<br />
===Sierpinski rotozoomer===<br />
[[File:Colpinski.png|thumb|colpinski, 16 bytes]]<br />
[[File:Rotastic.png|thumb|left|rotastic, 32 bytes]]<br />
This consists of two effects, the rotozoomer and the Sierpinski effect used as a texture. The Sierpinski effect is based on my own "Colpinski 16b" from 2013 (https://www.pouet.net/prod.php?which=62079), where "frag" of "fsqrt" helped push the limits. There is no real predecessor to that, because it is known that this effect can be done directly by combining X and Y, instead of using an Iterated Function System or Cellular Automaton approach. I released the rotozoomer as the 32b intro "rotastic" in 2017 (https://www.pouet.net/prod.php?which=70112). It is based on the ideas of "ryg" of "Farbrausch" (51 bytes, 2002, https://www.pouet.net/prod.php?which=5015) and "Gargaj" of "Conspiracy" (49 bytes, 2002, https://www.pouet.net/prod.php?which=6088).<br />
<br />
===Raycast bent tunnel===<br />
[[File:Intoanew.gif|frame|Into a new era, 64 bytes version]]<br />
The raycast bent tunnel is a tweaked version of my own 64 bytes "Into a new era" release (2018, https://www.pouet.net/prod.php?which=78044). The custom colors were exchanged for colors of the standard palette, the geometry was tweaked a bit and changed accordingly, so that no depth effect was used for the calculation of texture values. The standalone version of this effect has a size of 50 bytes. The two main inspirations for the 64 bytes version were two 128 byte intros, "Spongy" by "TBC" (2009, https://www.pouet.net/prod.php?which=53871) and "Wolf128" from "Baudsurfer" (2014, https://www.pouet.net/prod.php?which=63518), while the particular algorithm was developed independently.<br />
<br />
===Ocean night to day===<br />
[[File:Ocean.png|thumb|Ocean, 64b version]]<br />
The ocean effect is based on my own 64 bytes release "Ocean" from 2016 (https://www.pouet.net/prod.php?which=66857). The custom color generation and the music generation have been stripped from the original, since neither was compatible with the main framework of "Memories" without consuming a LOT of extra space. The special effect of "dawn" happens as a byproduct of the general framework approach. This will be explained in the next chapter.<br />
<br />
===Fading effect===<br />
The transition between two effects is an effect itself, which has no real predecessor. It is rather an idea that evolved over the years, and might have been done by many others in similar ways. In a nutshell, while calculating a frame, a pseudorandom value is derived from each pixel position, scaled down, and used to offset the time, which determines the effect to use. This allows the usage of the standard VGA palette (picture, source) rather than creating custom colors for real crossfading and thus saves space.<br />
<br />
==Tiny megademo framework==<br />
The tiny megademo framework is based on ideas of [https://www.pouet.net/prod.php?which=68020 "Quatro"], done by "Sensenstahl" and me. To put many tiny effects together into a "megademo", they all have to use the same approach and they have to be mostly free of assumptions (contents of memory and registers). They also have to use the same timing values and be tuned to a shared timing in general. Preparing the single effects to be includable in a framework like this took quite some time and required a lot of extra space initially. It has to be noted that, sadly, some of my most impressive effects (judged by audience reactions and views on social media) could not be included at all, because the approaches were not combinable without a HUGE overhead. Once all effects were streamlined, I could think about "factoring out" common calculations which occur repeatedly, resulting in a few saved bytes. The framework does the following:<br />
* Setting up screen mode to 320 x 200 pixels in 256 colors<br />
* Initializing a pointer to the screen<br />
* Installing a callback for timing and music<br />
* Mainloop<br />
** Calculation of X and Y from the current screen location<br />
*** with the [http://www.sizecoding.org/wiki/General_Coding_Tricks#Obtaining_X_and_Y_without_DIV_.28The_Rrrola_Trick.29 "Rrrola Trick"]<br />
** P = Pseudorandom value from screen location<br />
** Offset current time T by a scaled value P'<br />
** Selection of effect number N according to T'<br />
** Execution of effect N (X,Y) for current pixel<br />
*** On entrance, XY is in register DX (DL,DH)<br />
*** On entrance, the current time is in register BP<br />
*** On exit, the pixel color is expected in AL<br />
** Advance to next pixel<br />
** Triple diagonal interlacing for smoothing<br />
** Repeat until frame is complete<br />
* Set timer to a tempo of ~35 FPS<br />
* Check keyboard for ESC<br />
** Quit on ESC, otherwise continue<br />
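The "~35 FPS" figure in the list above can be checked with a little arithmetic. The following Python sketch is not part of the intro. It assumes that the PC timer runs at its usual input clock of 1193182 Hz and that it is in its default low byte / high byte access mode, so that two consecutive writes of the same byte to port 40h form one 16 bit divisor. The name <code>timer_rate_hz</code> is made up for this illustration.<br />

```python
PIT_CLOCK_HZ = 1193182  # input clock of the PC interval timer (assumed standard value)
TARGET_FPS = 35

# Same integer arithmetic as the "tempo" define in the release source:
# 1193182/256/targetFPS, evaluated with truncating division.
tempo = PIT_CLOCK_HZ // 256 // TARGET_FPS


def timer_rate_hz(byte_value):
    """Timer interrupt rate when the same byte is written twice
    (low byte first, then high byte) as the divisor."""
    divisor = byte_value * 256 + byte_value
    return PIT_CLOCK_HZ / divisor


if __name__ == "__main__":
    print(tempo, round(timer_rate_hz(tempo), 2))
```

Under these assumptions the divider byte is 133 and the timer ticks a little below 35 times per second, which matches the tempo named in the list.<br />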
===Code of framework===<br />
<syntaxhighlight lang="nasm"><br />
org 100h<br />
s:<br />
mov al,0x13 ; set AL to mode 320*200 in 256 colors<br />
int 0x10 ; call BIOS to set mode<br />
xchg bp,ax ; set timing value to 0x13 <br />
push 0xa000-10 ; write the screen address to register ES<br />
pop es ; works in conjunction with Rrrola trick<br />
mov ax,0x251c ; parameter for changing timer interrupt<br />
mov dl,timer ; address of timer routine, assume DH=1<br />
int 0x21 ; install timer routine<br />
top:<br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
call bx ; call the effect<br />
stosb ; write the return value and advance<br />
inc di ; triple interlace trick for an after-effect<br />
inc di ; and for smoothing the animation<br />
jnz top ; repeat until the frame is complete<br />
mov al,tempo ; set AL to divider for timer<br />
out 40h,al ; set timing (dual pass)<br />
in al,0x60 ; read keyboard<br />
dec al ; quit on ESC<br />
jnz top ; otherwise repeat loop<br />
sounds: db 0xc3, 11, 0x93; 0xc3 is MIDI/RET; fx2-s is used as volume<br />
table: db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
<br />
</syntaxhighlight><br />
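To see what <code>mul di</code> with the constant 0xcccd produces, here is a small Python model of the first two instructions of the main loop. It is an illustration only, not code from the intro, and the function names are hypothetical. It assumes the usual 16 bit <code>mul</code> semantics (DX:AX = AX * DI) and shows why the segment 0xa000-10 matters: moving the segment back by 10 paragraphs (160 bytes) shifts the picture by half a row, so the X value in DL is centered when read as a signed byte.<br />

```python
def rrrola_xy(di):
    """Model of 'mov ax,0xcccd / mul di': returns (DL, DH, AX)."""
    product = (0xCCCD * di) & 0xFFFFFFFF   # 32-bit result in DX:AX
    dx, ax = product >> 16, product & 0xFFFF
    return dx & 0xFF, dx >> 8, ax          # DL = scaled X, DH = Y, AX = leftover bits


def signed8(value):
    """Interpret a byte as a signed 8-bit number."""
    return value - 256 if value >= 128 else value


def screen_position(di):
    """Pixel (column, row) addressed by ES:DI when ES = 0xa000-10."""
    linear = di - 160                       # 10 paragraphs = 160 bytes before the screen
    return linear % 320, linear // 320


if __name__ == "__main__":
    for di in (160, 320, 480, 32160):
        dl, dh, _ = rrrola_xy(di)
        print(di, screen_position(di), signed8(dl), dh)
```

In this model the leftmost screen column maps to X = -128 and the middle column to X = 0, while DH follows DI divided by 320. The leftover bits in AX are what the framework reuses as its pseudorandom value.<br />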
<br />
==Explanation of effects==<br />
===Array of chessboards===<br />
[[File:Array of chessboards.png|thumb|array of chessboards]]<br />
The simplest effect and a good one to start with. After offsetting the row by the time, the classic XOR pattern is applied. To achieve the impression of a grid of chessboards, all bits but two are set in the color. The real trick lies in shifting to a "good" palette spot. The portion of the frame which is perceived as black isn't actually black; it's the dark part of the [https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_palette standard VGA palette]. Shifting that way also makes the dark and light squares look like they are from an ancient chessboard.<br />
<syntaxhighlight lang="nasm"> <br />
xchg dx,ax ; get XY into AX<br />
sub ax,bp ; subtract time from row<br />
xor al,ah ; XOR pattern (x xor y)<br />
or al,0xDB ; pattern for array of boards<br />
add al,13h ; shift to good palette spot<br />
</syntaxhighlight><br />
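Because the <code>or</code> forces six of the eight bits to one, only four palette entries can ever appear. The following Python model mirrors the five instructions above with 8 bit and 16 bit wrap-around and lists those entries. It is an illustration and not part of the intro; <code>chessboards_color</code> is a made-up name.<br />

```python
def chessboards_color(dx, bp):
    """dx holds X in the low byte and Y in the high byte, bp is the time value."""
    ax = (dx - bp) & 0xFFFF           # sub ax,bp (16-bit, wraps around)
    al, ah = ax & 0xFF, ax >> 8
    al ^= ah                          # xor al,ah
    al |= 0xDB                        # or al,0xDB: only bits 2 and 5 stay variable
    return (al + 0x13) & 0xFF         # add al,13h (8-bit, wraps around)


if __name__ == "__main__":
    colors = sorted({chessboards_color(dx, 0) for dx in range(0x10000)})
    print([hex(c) for c in colors])
```

The four results are 0x0e, 0x12, 0xee and 0xf2: two of them wrap around past 0xff, which is the "shift to a good palette spot".<br />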
<br />
===Zooming circles===<br />
[[File:Zooming circles.png|thumb|zooming circles]]<br />
The distance D of a point (X,Y) to the center (0,0) is sqrt(X²+Y²). The framework already makes sure that DL contains a centered X coordinate; DH=Y has to be centered though. Performing a square root on x86 requires quite a bit of code, but it can actually be omitted. With a clever selection of colors, the impression of zooming circles without a square root is quite decent.<br />
<syntaxhighlight lang="nasm"> <br />
mov al,dh ; get Y in AL<br />
sub al,100 ; align Y vertically<br />
imul al ; AX = Y² (AH = Y²/256)<br />
xchg dx,ax ; Y²/256 in DH, X in AL<br />
imul al ; AX = X² (AH = X²/256)<br />
add dh,ah ; DH = (X² + Y²)/256<br />
mov al,dh ; AL = (X² + Y²)/256<br />
add ax,bp ; offset color by time<br />
and al,8+16 ; select special rings<br />
</syntaxhighlight><br />
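The rings only need a value that grows with the distance from the center, and X²+Y² does that just as well as its square root; the rings are then simply not evenly spaced. A minimal Python model of the listing above, assuming signed 8 bit multiplication as done by <code>imul al</code> (the function names are hypothetical and not part of the intro):<br />

```python
def signed8(value):
    """Interpret a byte as a signed 8-bit number."""
    return value - 256 if value >= 128 else value


def circles_color(dl, dh, bp):
    """dl = centered X byte, dh = row Y (0..204), bp = time value."""
    y = signed8((dh - 100) & 0xFF)        # sub al,100
    y_sq_hi = (y * y) >> 8                # imul al, high byte ends up in DH
    x = signed8(dl)
    x_sq_hi = (x * x) >> 8                # second imul al, high byte in AH
    dist = (y_sq_hi + x_sq_hi) & 0xFF     # add dh,ah: roughly (X*X + Y*Y)/256
    return (dist + bp) & 0xFF & (8 + 16)  # add ax,bp / and al,8+16


if __name__ == "__main__":
    # One row through the vertical center, every 16th column.
    print([circles_color(dl, 100, 0) for dl in range(0, 256, 16)])
```

The result depends only on the squares, so the picture is mirror symmetric, and the final mask leaves just the four values 0, 8, 16 and 24.<br />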
<br />
===Scrolling tilted plane===<br />
[[File:Tiltplane.png|thumb|scrolling tilted plane]]<br />
This effect is achieved as follows. First, a large constant is divided by the row number Y to simulate distance. Then the resulting value is used twice: a) scaled by a centered X value and b) offset by the current time. These results are then combined with the XOR pattern, of which a special pattern is selected.<br />
<syntaxhighlight lang="nasm"> <br />
mov ax,0x1329 ; initialize with constant<br />
add dh,al ; preventing divide overflow<br />
div dh ; reverse divide AL = C/Y'<br />
xchg dx,ax ; DL = C/Y', AL = X<br />
imul dl ; AH = CX/Y'<br />
sub dx,bp ; DL = C/Y'-T <br />
xor ah,dl ; AH = (CX/Y') ^ (C/Y'-T)<br />
mov al,ah ; move to AL<br />
and al,4+8+16 ; select special pattern<br />
</syntaxhighlight><br />
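The comment "preventing divide overflow" can be made concrete: an 8 bit <code>div</code> faults when the quotient does not fit into a byte, and adding 0x29 to the row keeps the divisor large enough. The following Python model of the listing above is a sketch under the assumption that the rows run from 0 to 204 (the largest value DI/320 can reach); the function names are hypothetical.<br />

```python
def signed8(value):
    """Interpret a byte as a signed 8-bit number."""
    return value - 256 if value >= 128 else value


def plane_depth(dh):
    """Quotient of 'div dh' after 'add dh,al'; valid for rows 0..204."""
    row = (dh + 0x29) & 0xFF              # add dh,al (AL = 0x29)
    return 0x1329 // row                  # div dh: AL = 0x1329 / row


def plane_color(dl, dh, bp):
    """dl = centered X byte, dh = row Y, bp = time value."""
    depth = plane_depth(dh)
    scaled_x = ((signed8(dl) * depth) >> 8) & 0xFF   # imul dl, high byte of the product
    scroll = (depth - bp) & 0xFF                     # sub dx,bp, low byte only
    return (scaled_x ^ scroll) & (4 + 8 + 16)        # xor ah,dl / and al,4+8+16


if __name__ == "__main__":
    print(plane_depth(0), plane_depth(204))
```

For the assumed row range the quotient stays between 20 and 119, so it always fits into a byte and is never negative when used as a signed factor.<br />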
<br />
===Parallax checkerboards===<br />
[[File:Paralax.png|thumb|parallax checker boards]]<br />
This is a kind of raycasting with dynamic geometry. Each object plane is split horizontally because of the signed 16 bit multiplication and vertically because of an implicit logic column number operation. Additionally, implicit shearing is applied, so that the edges of the resulting grid (4 "solid" patches, 4 "transparent" patches, alternating) connect. If the ray hits one of the solid patches, the color becomes the iteration number (plus a palette offset into the grayscale range); if not, the plane is shifted by the screen pointer and the process is repeated until the maximum iteration number is reached.<br />
<syntaxhighlight lang="nasm"><br />
mov cx,bp ; set initial point to time<br />
mov bx,-16 ; limit to 16 iterations<br />
fx3L:<br />
add cx,di ; offset point by screenpointer<br />
mov ax,819 ; magic, related to Rrrola constant<br />
imul cx ; get X',Y' in DX<br />
ror dx,1 ; set carry flag on "hit"<br />
inc bx ; increment iteration count<br />
ja fx3L ; loop until "hit" or "iter=max"<br />
lea ax,[bx+31] ; map value to standard gray scale<br />
</syntaxhighlight><br />
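The loop condition is compact because <code>ja</code> tests two flags at once: the carry flag set by <code>ror</code> (the "hit") and the zero flag set by <code>inc</code> (the iteration limit). A Python model of the listing above, written as an illustration with a hypothetical function name and assuming signed 16 bit <code>imul</code> semantics:<br />

```python
def signed16(value):
    """Interpret a 16-bit word as a signed number."""
    return value - 0x10000 if value >= 0x8000 else value


def parallax_color(di, bp):
    """di = screen pointer, bp = time value; returns the color left in AL."""
    cx = bp & 0xFFFF                       # mov cx,bp
    bx = -16                               # mov bx,-16
    while True:
        cx = (cx + di) & 0xFFFF            # add cx,di
        product = 819 * signed16(cx)       # imul cx: signed 16x16 -> DX:AX
        dx = (product >> 16) & 0xFFFF      # the high word holds X',Y'
        hit = dx & 1                       # ror dx,1 moves bit 0 into the carry flag
        bx += 1                            # inc bx leaves the carry flag alone
        if hit or bx == 0:                 # ja loops only while carry=0 and BX != 0
            break
    return (bx + 31) & 0xFF                # lea ax,[bx+31]


if __name__ == "__main__":
    print(parallax_color(81, 0), parallax_color(0, 0))
```

A hit in the first iteration gives color 16 and a ray that never hits gives color 31, so all results fall into the range 16 to 31.<br />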
<br />
===Sierpinski rotozoomer===<br />
[[File:Sierpinski rotozoomer.png|thumb|sierpinski rotozoomer]]<br />
[[File:1 div cos(atan(x)).png|thumb|1/cos(atan(x)), made with www.google.com]]<br />
Rotozooming normally requires trigonometric functions, or good estimations of them. Let's look at the [https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions usual 2D rotation equation] and interpret it in a special way to get rid of the trigonometry in the calculation:<br />
<syntaxhighlight lang="nasm"><br />
x' = x*cos(a) - y*sin(a)<br />
y' = x*sin(a) + y*cos(a)<br />
</syntaxhighlight><br />
If we add zooming to it, it looks like this:<br />
<syntaxhighlight lang="nasm"><br />
x' = z * (x*cos(a) - y*sin(a))<br />
y' = z * (x*sin(a) + y*cos(a))<br />
</syntaxhighlight><br />
Now let's assume, we don't specify z ourself, and factor out cos(a):<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(a) * (x - y*tan(a))<br />
y' = cos(a) * (x*tan(a) + y)<br />
</syntaxhighlight><br />
Finally, let's substitute tan(a) with our time variable T, since the tangent function rises to infinity the closer we get to 90°:<br />
<syntaxhighlight lang="nasm"><br />
x' = cos(atan(T)) * (x - y*T)<br />
y' = cos(atan(T)) * (x*T + y)<br />
</syntaxhighlight><br />
As long as we don't care about specifying the zoom factor and don't control the angle directly, we can now rotate between -90° and +90° without using trigonometric functions. As a byproduct, the zoom factor is now tied to the time T. The function for the zoom factor is shown in the image; it results in zooming from infinitely small to one (original size) and back to infinitely small. Quite a few bytes have been spent to beautify the effect: correcting the timing offset, speeding up the animation, enlarging the Sierpinski pixels and choosing decent colors, but I thought it was worth the space.<br />
<syntaxhighlight lang="nasm"><br />
lea cx,[bp-2048]; center time to pass zero<br />
sal cx,3 ; speed up by factor 8!<br />
movzx ax,dh ; get X into AX (zero-extended)<br />
movsx dx,dl ; get Y into DX (sign-extended)<br />
mov bx,ax ; save X in BX<br />
imul bx,cx ; BX = X*T<br />
add bh,dl ; BH = X*T/256+Y<br />
imul dx,cx ; DX = Y*T<br />
sub al,dh ; AL = X-Y*T/256<br />
and al,bh ; AL = (X-Y*T/256)&(X*T/256+Y)<br />
and al,252 ; thicker sierpinski<br />
salc ; set pixel value to black<br />
jnz fx4q ; leave black if not sierpinski<br />
mov al,0x2A ; otherwise: a nice orange<br />
fx4q:<br />
</syntaxhighlight><br />
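The claim that the trigonometry can be dropped is easy to check numerically. The Python sketch below is not part of the intro and ignores the fixed point scaling by 256 used in the listing. It compares the shortcut (x - y*T, x*T + y) with a textbook rotation by a = atan(T) that is zoomed by 1/cos(a), which equals sqrt(1 + T²). The function names are made up for this illustration.<br />

```python
import math


def shortcut(x, y, t):
    """What the listing computes, without any trigonometry."""
    return x - y * t, x * t + y


def rotate_and_zoom(x, y, t):
    """Rotation by a = atan(t), zoomed by z = 1/cos(a) = sqrt(1 + t*t)."""
    a = math.atan(t)
    z = math.sqrt(1.0 + t * t)
    return (z * (x * math.cos(a) - y * math.sin(a)),
            z * (x * math.sin(a) + y * math.cos(a)))


if __name__ == "__main__":
    print(shortcut(3.0, -2.0, 0.5))
    print(rotate_and_zoom(3.0, -2.0, 0.5))
```

Both functions agree up to floating point rounding, so the texture coordinates are a rotated copy of (x, y) whose scale depends only on T.<br />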
<br />
===Raycast bent tunnel===<br />
[[File:Raycastbenttunnel.png|thumb|raycast bent tunnel]]<br />
This is a variation of the [https://www.pouet.net/prod.php?which=78044 raycast tunnel "into a new era"] (see also above). A [https://www.reddit.com/r/tinycode/comments/de7dj9/moving_cubes_with_pathtracing_lighting_and/ detailed explanation] has been posted to reddit for the [https://www.pouet.net/prod.php?which=83204 similar production "Essence"]. The custom colors have been removed, the bend direction is different, and the geometry is a bit more closed to increase the performance on slower computers and DosBox.<br />
<syntaxhighlight lang="nasm"><br />
mov cl,-9 ; start with depth 9 (moves backwards)<br />
fx5L: <br />
push dx ; save DX, destroyed inside the loop<br />
mov al,dh ; Get Y into AL<br />
sub al,100 ; Centering Y has to be done "manually".<br />
imul cl ; Multiply AL=Y by the current distance, to get a projection(1)<br />
xchg ax,dx ; Get X into AL, while saving the result in DX (DH)<br />
add al,cl ; add distance to projection, (bend to the right)<br />
imul cl ; Multiply AL=X by the current distance, to get a projection(2)<br />
mov al,dh ; Get projection(1) in AL<br />
xor al,ah ; combine with projection(2)<br />
add al,4 ; center the walls around 0<br />
test al,-8 ; check if the wall is hit<br />
pop dx ; restore DX<br />
loopz fx5L ; repeat until "hit" or "iter=max"<br />
sub cx,bp ; offset depth by time<br />
xor al,cl ; XOR pattern for texture <br />
aam 6 ; irregular pattern with MOD 6<br />
add al,20 ; offset into grayscale palette<br />
</syntaxhighlight><br />
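For readers who prefer to step through the ray march outside of DOS, the listing above can be modeled in Python as follows. This is a sketch, not the author's code. The function name and the default <code>ch=0xFF</code> are assumptions: the listing only sets CL, so the upper half of the loop counter CX is whatever the previous code left behind.<br />

```python
def signed8(value):
    """Interpret a byte as a signed 8-bit number."""
    return value - 256 if value >= 128 else value


def tunnel_color(dl, dh, bp, ch=0xFF):
    """dl = centered X byte, dh = row Y, bp = time value, ch = assumed content of CH."""
    cx = (ch << 8) | 0xF7                                  # mov cl,-9
    while True:
        cl = signed8(cx & 0xFF)
        proj_y = (signed8((dh - 100) & 0xFF) * cl) & 0xFFFF    # first imul cl
        proj_x = (signed8((dl + cl) & 0xFF) * cl) & 0xFFFF     # add al,cl / second imul cl
        al = ((proj_y >> 8) ^ (proj_x >> 8)) & 0xFF            # mov al,dh / xor al,ah
        al = (al + 4) & 0xFF                                   # add al,4
        inside = (al & 0xF8) == 0                              # test al,-8 sets the zero flag
        cx = (cx - 1) & 0xFFFF                                 # loopz decrements CX first
        if not (inside and cx != 0):                           # then loops while ZF=1 and CX != 0
            break
    cx = (cx - bp) & 0xFFFF                                    # sub cx,bp
    al ^= cx & 0xFF                                            # xor al,cl
    return al % 6 + 20                                         # aam 6 / add al,20


if __name__ == "__main__":
    print([tunnel_color(dl, 100, 19) for dl in range(0, 256, 32)])
```

Whatever the ray hits, the final <code>aam 6</code> and <code>add al,20</code> confine the color to the six palette entries 20 to 25.<br />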
<br />
===Ocean night to day===<br />
[[File:Oceannight.png|thumb|left|oceannight]]<br />
[[File:Oceanday.png|thumb|right|oceanday]]<br />
The ocean effect is a wonderful example of a [https://en.wikipedia.org/wiki/Bob_Ross#Style "happy accident"]. If you load a value into the FPU as an integer, store it as a floating point value, and interpret it as an integer again, you get a cool pattern. Combine that with a reverse divide and you get a neat wavy effect. This works in tight conjunction with the register DX, which has a sign flip at the position we want, so we can easily separate sky and sea. The final touch is the coloring though. By design of the global framework, the value of AL is determined on function entrance: it contains the effect address. With a bit of code shuffling, the color of the sky came "for free", without using instructions, as well as the "fading to day" color, which is also an effect address. And that is no happy accident. For that reason, the sky color can vary in versions other than the originally submitted DosBox version.<br />
<syntaxhighlight lang="nasm"><br />
sub dh,120 ; check if pixel is in the sky<br />
js fx6q ; quit if that's the case<br />
mov [bx+si],dx ; move XY to a memory location<br />
fild word [bx+si] ; read memory location as integer<br />
fidivr dword [bx+si]; reverse divide by constant<br />
fstp dword [bx+si-1]; store result as floating point<br />
mov ax,[bx+si] ; get the result into AX<br />
add ax,bp ; modify color by time<br />
and al,128 ; threshold into two bands<br />
dec ax ; beautify colors to blue/black<br />
</syntaxhighlight><br />
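The "store as float, read back as integer" step can be tried out in Python with the <code>struct</code> module. The sketch below is an illustration with hypothetical names. In the listing the divisor is the 32 bit value found at the memory location itself, whose upper half is not set by the effect, so <code>constant</code> is only a stand-in here. Rounding may also differ slightly from a real FPU.<br />

```python
import struct


def float_word(value):
    """Bytes 1 and 2 of 'value' stored as a 32-bit float, read back as a
    16-bit integer (the store goes to [bx+si-1], the load reads [bx+si])."""
    raw = struct.pack("<f", value)
    return raw[1] | (raw[2] << 8)


def ocean_color(depth, bp, constant=1_000_000):
    """depth = nonzero value loaded by fild, bp = time value,
    constant = hypothetical stand-in for the 32-bit divisor in memory."""
    ax = float_word(constant / depth)      # fidivr computes constant / depth
    al = (ax + bp) & 0x80                  # add ax,bp / and al,128
    return (al - 1) & 0xFF                 # dec ax, as seen in AL


if __name__ == "__main__":
    print(hex(float_word(1.0)), hex(float_word(1.5)), hex(float_word(2.0)))
```

The word that is read back consists of mantissa bits plus the lowest exponent bit, which is why it changes in irregular bands. The threshold and the decrement then leave only the two colors 0x7f and 0xff.<br />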
<br />
===Fading effect===<br />
For convenience, the fading effect from the global framework, isolated. Basically it is generating a pseudorandom value from the screenpointer, then offsetting the time by a scaled value of it, then calling the destined effect.<br />
<syntaxhighlight lang="nasm"><br />
mov ax,0xcccd ; load magic Rrrola constant<br />
mul di ; transform screen pointer to X, Y<br />
add al,ah ; use transformation garbage as<br />
xor ah,ah ; pseudorandom value and clear AH<br />
add ax,bp ; add time value to random value<br />
shr ax,9 ; divide by 512 (basically the speed)<br />
and al,15 ; filter effect number<br />
xchg bx,ax ; move effect number to BX<br />
mov bh,1 ; reset BH to align with start of code<br />
mov bl,[byte bx+table] ; read the effect address from the table<br />
</syntaxhighlight><br />
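How the dissolve progresses over time can be seen in a small Python model of these instructions. It is an illustration only, with made-up function names. The list <code>EFFECTS</code> repeats the order of the jump table from the framework listing.<br />

```python
EFFECTS = ["fx2", "fx1", "fx0", "fx3", "fx4", "fx5", "fx6", "sounds", "stop"]


def effect_index(di, bp):
    """di = screen pointer, bp = time value; returns the index into the table."""
    ax = (0xCCCD * di) & 0xFFFF             # low word of the product ("garbage")
    p = ((ax & 0xFF) + (ax >> 8)) & 0xFF    # add al,ah / xor ah,ah
    t = (p + bp) & 0xFFFF                   # add ax,bp
    return (t >> 9) & 15                    # shr ax,9 / and al,15


def share_of_next_effect(bp):
    """Fraction of all 65536 screen pointer values that already show the next effect."""
    current = (bp >> 9) & 15
    changed = sum(effect_index(di, bp) != current for di in range(0x10000))
    return changed / 0x10000


if __name__ == "__main__":
    for bp in (0, 256, 384, 448, 511, 512):
        print(bp, share_of_next_effect(bp))
```

In this model the pseudorandom value is below 256, so within each window of 512 time steps nothing changes during the first half, and the next effect then takes over pixel by pixel during the second half.<br />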
<br />
==MIDI music part==<br />
This code section increments the timing value and produces sound. By choosing channel 3 we can reuse the "change channel instrument" instruction (0xC3) as "RET". By shifting around the code of the effects, a reasonable value for the volume could be created, which saves another byte. Be aware that this code sequence only works if the MIDI device is already in UART mode, otherwise three more bytes have to be spent. Quite a few people from the audience and sceners told me afterwards that the melody sounds a bit like [https://en.wikipedia.org/wiki/Incantations_(album) Incantations by Mike Oldfield], but it is rather created by a simple approach. Starting with [https://www.pouet.net/prod.php?which=65604 "Hypnoteye" in 2015], I experimented with procedural MIDI and eventually released a [https://www.pouet.net/prod.php?which=66313 small MIDI framework (64 bytes)]. The basic principle is to jump through tonal space with fixed steps, and map high values back to lower ones (by modulo). With simple combinations of stepwidth and modvalue, interesting effects can be produced. For example, stepwidth=3 produces diminished chords (stacked minor thirds) that are never resolved, likewise stepwidth=4, and stepwidth=6 produces tritone effects. With a well chosen modvalue, these patterns can create progressions. I have not done a proper theoretical analysis yet; I rather explored the tonal space and noted when something sounded interesting.<br />
<syntaxhighlight lang="nasm"><br />
sounds: db 0xc3, 11, 0x93, fx2-s<br />
...<br />
inc bp ; increment timing value<br />
test bp, 7 ; play a note every 8th step<br />
jnz nomuse ; quit if in between<br />
mov dx,0x330 ; port number for MIDI<br />
mov si,sounds ; address for sound data<br />
outsb ; change instrument of channel 3<br />
outsb ; to vibraphone<br />
outsb ; play a note on channel 3<br />
imul ax,bp,-19*32*4 ; the magic melody constant<br />
shr ax,10 ; scale down and implicit "and 63"<br />
add al,22 ; pitch base is 22<br />
out dx,al ; play THIS note on channel 3<br />
outsb ; play it with THIS volume<br />
</syntaxhighlight><br />
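The "fixed step plus modulo" principle can be read directly from the constants. A note is played when the timing value is a multiple of 8, so with BP = 8*k the product BP * (-19*32*4) equals -19*k*1024, and shifting the 16 bit result right by 10 leaves (-19*k) modulo 64. The following Python sketch (an illustration with a made-up function name) reproduces the pitch calculation of the listing above.<br />

```python
SCALE_MOD = -19 * 32 * 4   # the "magic melody constant" from the listing


def note_for(bp):
    """Pitch sent to the MIDI port for timing value bp (a multiple of 8)."""
    ax = (bp * SCALE_MOD) & 0xFFFF        # imul ax,bp,imm16 keeps the low 16 bits
    return (ax >> 10) + 22                # shr ax,10 / add al,22


if __name__ == "__main__":
    print([note_for(8 * k) for k in range(8)])
```

The melody therefore starts with the pitches 22, 67, 48, 29, 74: each note is 19 semitones below the previous one, wrapped into a window of 64 semitones above the base pitch 22.<br />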
==Further tweaks==<br />
To be done ;)<br />
==Differences between dos versions==<br />
To be done ;)<br />
==External Links==<br />
*[https://en.wikipedia.org/wiki/Demoscene Demoscene]<br />
*[https://www.youtube.com/watch?v=Imquk_3oFf4 Youtube Capture]<br />
*[https://www.twitch.tv/videos/589179638?t=7h02m40s Twitch Live Reactions]<br />
*[https://www.pouet.net/prod.php?which=85227 Production Page on Pouet]<br />
*[https://www.reddit.com/r/programming/comments/g1qhf8/memories_256_byte_msdos_intro/ Reddit/Programming Thread I]<br />
*[https://www.reddit.com/r/programming/comments/g4ewh7/how_its_made_memories_a_256_byte_dos_intro_with_8/ Reddit/Programming Thread II]<br />
*[https://github.com/cesarmiquel/memories-256b-msdos-intro Analysis by "cesarmiquel"]<br />
<br />
==Original release code (complete)==<br />
<syntaxhighlight lang="nasm"> <br />
; "memories" by HellMood/DESiRE<br />
; the tiny megademo, 256 byte msdos intro<br />
; shown in April 2020 @ REVISION<br />
;<br />
; (= WILL BE COMMENTED IN DETAIL LATER =)<br />
;<br />
; create : nasm.exe memories.asm -fbin -o memories.com<br />
; CHOOSE YOUR TARGET PLATFORM (compo version is dosbox)<br />
; be sure to use the dosbox.conf from this archive!<br />
; only ONE of the defines should be active!<br />
%define dosbox ; size : 256 bytes<br />
;%define freedos ; size : 230 bytes<br />
;%define winxpdos ; size : 263 bytes<br />
<br />
; DON'T TOUCH THESE UNLESS YOU KNOW WHAT YOU'RE DOING<br />
%ifdef winxpdos<br />
%define music<br />
%define switch_uart<br />
%define safe_dx<br />
%define safe_segment<br />
%endif<br />
%ifdef freedos<br />
%define safe_dx<br />
%endif<br />
%ifdef dosbox<br />
%define music<br />
;%define safe_dx ; sometimes needed<br />
%endif<br />
<br />
; GLOBAL PARAMETERS, TUNE WITH CARE!<br />
%define volume 127 ; not used on dosbox (optimization)<br />
%define instrument 11<br />
%define scale_mod -19*32*4; <br />
%define time_mask 7<br />
%define targetFPS 35<br />
%define tempo 1193182/256/targetFPS <br />
%define sierp_color 0x2A<br />
%define tunnel_base_color 20<br />
%define tunnel_pattern 6<br />
%define tilt_plate_pattern 4+8+16<br />
%define circles_pattern 8+16<br />
<br />
org 100h<br />
s:<br />
%ifdef freedos<br />
mov fs,ax<br />
mov [fs:0x46c],ax<br />
%endif<br />
mov al,0x13<br />
int 0x10 <br />
xchg bp,ax<br />
push 0xa000-10<br />
pop es<br />
%ifndef freedos<br />
mov ax,0x251c<br />
%ifdef safe_dx <br />
mov dx,timer <br />
%else ; assume DH=1, mostly true on DosBox<br />
mov dl,timer<br />
%endif<br />
int 0x21<br />
%endif<br />
top:<br />
%ifdef freedos<br />
mov bp,[fs:0x46c]<br />
%endif <br />
mov ax,0xcccd<br />
mul di<br />
add al,ah<br />
xor ah,ah<br />
add ax,bp<br />
shr ax,9<br />
and al,15<br />
xchg bx,ax<br />
mov bh,1<br />
mov bl,[byte bx+table]<br />
call bx<br />
stosb<br />
inc di<br />
inc di<br />
jnz top<br />
mov al,tempo<br />
out 40h,al<br />
in al,0x60<br />
dec al<br />
jnz top<br />
sounds:<br />
db 0xc3 ; is MIDI/RET<br />
%ifdef music<br />
db instrument,0x93<br />
%ifdef switch_uart<br />
db volume ; without switch, volume is in table<br />
db 0x3f <br />
%endif<br />
%endif<br />
table: ; first index is volume, change order with care! <br />
db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s<br />
stop:<br />
pop ax<br />
ret<br />
timer:<br />
%ifndef freedos<br />
%ifdef safe_segment<br />
push cs<br />
pop ds<br />
%endif<br />
inc bp<br />
%ifdef music <br />
test bp, time_mask<br />
jnz nomuse<br />
mov dx,0x330<br />
mov si,sounds<br />
outsb<br />
outsb<br />
outsb<br />
imul ax,bp,scale_mod<br />
shr ax,10<br />
add al,22<br />
out dx,al<br />
outsb<br />
%ifdef switch_uart<br />
inc dx<br />
outsb<br />
%endif<br />
%endif<br />
nomuse:<br />
iret<br />
%endif <br />
fx0: ; tilted plane, scrolling<br />
mov ax,0x1329<br />
add dh,al<br />
div dh<br />
xchg dx,ax<br />
imul dl<br />
sub dx,bp<br />
xor ah,dl<br />
mov al,ah<br />
and al,tilt_plate_pattern<br />
ret<br />
fx2: ; board of chessboards<br />
xchg dx,ax<br />
sub ax,bp<br />
xor al,ah<br />
or al,0xDB<br />
add al,13h<br />
ret<br />
fx1: ; circles, zooming<br />
mov al,dh<br />
sub al,100<br />
imul al<br />
xchg dx,ax<br />
imul al<br />
add dh,ah<br />
mov al,dh<br />
add ax,bp<br />
and al,circles_pattern<br />
ret<br />
fx3: ; parallax checkerboards<br />
mov cx,bp<br />
mov bx,-16<br />
fx3L:<br />
add cx,di<br />
mov ax,819<br />
imul cx <br />
ror dx,1 <br />
inc bx <br />
ja fx3L<br />
lea ax,[bx+31] <br />
ret<br />
fx4: ; sierpinski rotozoomer <br />
lea cx,[bp-2048]<br />
sal cx,3<br />
movzx ax,dh<br />
movsx dx,dl<br />
mov bx,ax<br />
imul bx,cx<br />
add bh,dl<br />
imul dx,cx<br />
sub al,dh<br />
and al,bh<br />
and al,0b11111100<br />
salc ; VERY slow on dosbox, but ok<br />
jnz fx4q<br />
mov al,sierp_color<br />
fx4q:<br />
ret<br />
fx5: ; raycast bent tunnel<br />
mov cl,-9<br />
fx5L: <br />
push dx<br />
mov al,dh<br />
sub al,100<br />
imul cl<br />
xchg ax,dx <br />
add al,cl<br />
imul cl<br />
mov al,dh<br />
xor al,ah<br />
add al,4<br />
test al,-8<br />
pop dx<br />
loopz fx5L<br />
sub cx,bp<br />
xor al,cl<br />
aam tunnel_pattern; VERY slow on dosbox, but ok<br />
add al,tunnel_base_color<br />
ret<br />
fx6: ; ocean night / to day sky<br />
sub dh,120<br />
js fx6q<br />
mov [bx+si],dx<br />
fild word [bx+si]<br />
fidivr dword [bx+si]<br />
fstp dword [bx+si-1]<br />
mov ax,[bx+si]<br />
add ax,bp<br />
and al,128<br />
dec ax<br />
fx6q:<br />
ret<br />
</syntaxhighlight><br />
<br />
==Bonus - NFO/ASCII==<br />
<br />
<syntaxhighlight lang="nasm"> <br />
art : hammerfist<br />
∂#MW%e _d$Ng,<br />
'B, ∂b _jM@$QZb,cQ" )@<br />
,edRB$b,l@ Wk,yGR$KM&$b, ,dP" Wl ]bsd%UR8BG6&$@DSyG#ZKM&$b,<br />
,dP "T%L 'MGF "*∂R_ Tg "*4Zk,#I YP W" 7P "*∂R<br />
4M gd@ ^ ∂@ d@b dQ$#@Z@R3L_ "*GMj 'W ,gd$ d@b 9Q$#%b<br />
W#, `M Wb `* _4P `Qk *#N8L `H5 @b 'QR7YK `* _4F" Qk<br />
`6@L dML '@ ,BK 'M ∂B *b, '#L<br />
^QBb,_ _,4&M∞∂@=,_ _dGL _gQKM GL @k 'Mg,_ _dG,<br />
"*BN5W$2#MNP" "*G3WRM8&B5P"`Y@QNW3Z5P" ∂#$W8BRM3XZN87 "*GW38M%EBDW5P"`<br />
<br />
<br />
p r e s e n t s<br />
<br />
4<br />
d@,<br />
_& `Wl<br />
_,aP "#baedM$#@@K JP*"?ML<br />
,ad@$#P" ,d@NEWVB" X,aQPYb,_<br />
V@Mm,_ ,d@MW#BW' EMP" '¶R ,ngBP^fML<br />
¶M@N@y Y#BNW#M" J9" `MQ9" "MgRBq ,QBMg,<br />
VN#P` ,d@@ `WM@^ 7f ¶F` 7kY" ^G _.eQNE1.<br />
]B _G@MWN$, `P ' 4b QP ¶w@F*^ ^Qb<br />
]O@NRM#W@MNB, ; ^` j JP^ Yl<br />
J#NRNWM@#BcT"^ ,A _J _q@ `X<br />
'¶WM#B@WdY`,7 _G#YN#PM _,gG" M,<br />
*BN#WP" dK ,Q@NRMB"]9 , _,M@Q* #A<br />
"U^ V@h, iNBW#NT J' J9 s@QN" _; 'D,<br />
¶RMBv&NMQR@9 .W .K' "9` ,6BA _JL ]l<br />
Y#NE@W#NRP #[ `¶8 _d@MW#B_jW#W BN<br />
"GQ@MR#W QL_ *B _,p#NBW#NQMG@WY 3Q<br />
"Y@F ,XW@M%im,_Yb_ _,g5@#MW@QMNE@E@NRMB ,WM<br />
` _,gP*"#REM#GB@N#MQbnd@N#M@MW#R8QSB^'WQERM@ ;4NB,<br />
,GYKL ¶E#B8R8QSB@M@#BM#W@MNB"`_ , "^` N ,dW@Ql<br />
_Q`'W`*t '¶@GS#MBQ#E@W#NQBW[ 'LvQ_ ,K _dNABGM#N<br />
,F ' `^WAB@QGE9*"9^*@L jP7FY, ¶h,_.jWM#BR#GBM,<br />
J; , _ ' ' "LL YxE#B8R8QSBNW@W;<br />
AP _,Ag6^ _ J ¶A `"Q#M@MW#R8E#P<br />
j@ `"XQW[ 'LvK,_ 'L_,/ @t Y#NE@WNR"<br />
:M/ 9^*@L jP7F" _PYKL _,A; ¶RSNQ"<br />
dKL ' ` ' "L "`'W`*t `"XQb `W^<br />
Q`8t 'L_,/ , _ ' 9^Q<br />
,W _PYKL _,Ag6^ ' W, _ ,#N&<br />
!N _ J "`'W`*t `"XQW[ _ J N!_JG9^RwQ' *t<br />
`W, 'LvK,_ ' _gGB8@L _ 'LvK,_ ,WgB' V 7L<br />
_.,gm&@B&wBZF" j@'` "WL _gML jZd7Yb lN" dBWl<br />
,g&QB*"^` `"*G@g, .gR&k,_ ,N" '@QF ¶k;gMF *QvQ jQ, ,@N@B#,<br />
.eQF*` `Yb@" "*6Qg,gF , 7 XMN" 'MNB, ^¶QWSER@N;<br />
,gP" qy, W' ^Q' &L ,g@W' `QMEL `"WBNWP<br />
g7 ¶9 ,X M? 9" _q8MSK ¶EMt *@K<br />
Vh _,m#L _AH le ,GBDNE9^A, *@F NMg<br />
¶L,qQ@ND _.m@Bl We ,gM@B8#Q' ¶h_ lWE,<br />
W9NHW@` JWM#B@] @e 4WR@NGF^ 'QL dRWl<br />
VMd* "@BE@PM 'N *UP" VW, JRSB;<br />
,@F j `¶WK W, ¶t XNt _A@E#N<br />
_JP ,6& "GLdM XD, _.g8NMA@k,_ _,gG#NMGR;<br />
"Z .JRER 'VMi _jNB#W&_ _,j@E@W#Nl ¶MBGMNQGNQMG@QBW9<br />
¶h ,G@NRMBl `" ,d#R@M$F ¶Mg,_.gp&@@NEWVBWBMG *QMN8R8SBN$E@WF<br />
Vb dW#R8QSRb, *YM@EQ,_ 'MENBW#NQMG#B@R@MW#l "BM@QNENRQG'<br />
*WGS#MBMNEYL `^"*8M@Q@NRM#W@BWSNW@QBF"` `^*@QBF^ [HFT]<br />
^M@MW#Q9 ^Wt `^¶RQ@W8NQGP*`<br />
¶Q#@P Vk lA `"^`<br />
Y" `MA J#,<br />
*R@, ,MQl<br />
Y#Wk, GWM8L<br />
W8RQSt,_ AQ@MR#,<br />
`@M@#SB@Mbm.,_ QNBW#NW<br />
¶QB8R8SBN$WNRM@#GNtwg@NMQR@B'<br />
*MBQ#8R8QS@NE@WNBW#NQMG@NR;<br />
`WGS#MBQ#R8QSB@NE@W#NQBW9<br />
*OMW@QMNE@E@NRMW@QMB@*<br />
`^"YQW@Q#SB#NE@EGP<br />
`^"*8R@GBQF`<br />
</syntaxhighlight></div>HellMood