General Coding Tricks
Data is code, code is data
Code is nothing more than data that the CPU interprets. For example, consider this multi-byte instruction:
    mov ah,37h
This assembles to
    B4 37
B4 by itself isn't interesting, but 37 happens to be the opcode for AAS. Let's say you had this code before a loop, and you needed to perform AAS at the top of the loop. Rather than put a separate AAS at the top of the loop, you can reuse the opcode that will already be there as part of the mov ah,37h that comes before it. Just jump directly into the middle of the mov ah,37h, whose second byte will get interpreted and executed as AAS:
    label:
        mov ah,37h
        ;misc. stuff
        loop label+1
The +1 specifies that the jump should go to 1 byte past the location of label, which is the 37h byte.
You can use opcodes hidden in your existing data. For example, .COM files can end with RET, which is opcode C3. If you already have a C3 somewhere else in your code, even as part of data, just JMP to that pre-existing C3 instead of adding a RET.
If your environment holds you back, change it
The default MCGA palette is fairly horrible, but there can be size advantages to changing it: while setting a new palette costs bytes, the new palette arrangement could save you headaches down the road. For example, if your code is calculating pixel colors that fall into goofy ranges, rather than constantly adjusting the colors to sane ranges (i.e. aligned to powers of 2), just set the palette so that values falling into those ranges look the way you want. (This assumes you have very small ways of redefining the palette, of course.)
Need a constant?
If you need a constant value but you're out of space, search your assembled code for a byte value you can use.
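Searching an assembled binary for a reusable byte is easy to script. The sketch below is only an illustration: the helper name find_byte and the 5-byte sample program are made up for this example, and the 0x100 load offset assumes a .COM file.

```python
def find_byte(image: bytes, value: int, load_offset: int = 0x100) -> list:
    """Return the run-time offsets of every occurrence of `value` in a .COM image."""
    return [load_offset + i for i, b in enumerate(image) if b == value]

# Hypothetical 5-byte program: mov ah,37h / int 21h / ret
image = bytes([0xB4, 0x37, 0xCD, 0x21, 0xC3])

# An existing RET opcode (C3) that a jump could target
print([hex(o) for o in find_byte(image, 0xC3)])
# The AAS opcode (37) hidden inside mov ah,37h
print([hex(o) for o in find_byte(image, 0x37)])
```

The same search works for constants: any offset it reports can be used as a memory operand instead of an immediate, provided the byte is not modified at run time.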
A smaller way to point to Mode 13's screen segment
Instead of mov ah,0a0h; mov es,ax or push word 0a000h; pop es, try this 2-byte wonder:
    les bp,[bx]
This sets ES=9FFF, only one away from A000. You can write to the screen with ES: this way as long as you are aware the segment is one paragraph (16 bytes) behind, so just increase your offset by 16 if you need exact placement.
How does this work? At the start of execution of a .COM file, BX=0 and DS=CS. The contents of the .COM file get loaded to offset 0x100 in that segment, but loaded before that is the PSP (program segment prefix), which DOS populates with information about the loaded program and other info. The PSP starts with CD 20 (INT 20, which exits the program), so that's what gets loaded into BP. The next word, which gets loaded into ES, is the number of the last free conventional memory segment, typically 0x9fff (but it can be something different if parts of the upper memory range are either not installed or allocated).
Warning: This trick doesn't always work. On FreeDOS, this can set ES=9FE0, and there is something resident at that location that can screw up the system after normal program exit if you overwrite it.
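The far-pointer load and the resulting offset adjustment can be modelled in a few lines. This is a sketch under assumptions: the four PSP bytes are the typical values described above, and the helper names linear and screen_bias are invented for this example.

```python
import struct

# Assumed first four bytes of a PSP: INT 20h, then the memory-top segment 9FFFh.
psp = bytes([0xCD, 0x20, 0xFF, 0x9F])

# A far-pointer load with BX=0 reads the offset word first, then the segment word.
bp, es = struct.unpack_from("<HH", psp, 0)

def linear(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset

def screen_bias(segment: int) -> int:
    """Offset to add so that segment:offset points at A000:0000."""
    return (0xA000 - segment) * 16

print(hex(bp), hex(es))
print(screen_bias(es), screen_bias(0x9FE0))
```

With ES=9FFF the bias is the 16 bytes mentioned above. With the FreeDOS value 9FE0 it grows to 512 bytes, and everything below that offset lies in memory that is not part of the screen.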
Accessing the timer tick for free
If using a 386+, FS=0 at .COM start. So, FS:[046C] gets you the BIOS timer tick variable, which you could use for timing/pacing, or as a random seed. Some environments, especially EMS/XMS programs, can modify the FS register, so it can't always be assumed to be 0000h. In that case, POP DS right after the start (the word on top of the stack at .COM start is 0000h) and accessing [046C] does the trick with equal size.
Looping twice
If you need to repeat a section of code that doesn't modify the carry flag, and you know the carry flag is clear, you can run it twice (jumping back once) in only 3 bytes:
    looping:
        ;do stuff here
        cmc
        jc looping
Looping three times
If you need to repeat a section of code and you have a register whose value is zero and can be incremented freely, or whose value is -1 and can be decremented freely, you can run it three times (jumping back twice) in only 3 bytes:
    looping:
        ;do stuff here
        inc bx      ;dec bx from -1 also works with jpo: FEh and FDh have odd parity, FCh has even parity
        jpo looping ;1 (01b) and 2 (10b) have odd parity, 3 (11b) has even parity
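The iteration counts of both flag-based loops can be checked with a small model. This is a sketch, not an emulator: it tracks only the carry flag, or the parity of the low byte of BX, and the function names are made up for this example.

```python
def parity_odd(value: int) -> bool:
    """True when the low byte has an odd number of 1 bits (PF=0, so JPO jumps)."""
    return bin(value & 0xFF).count("1") % 2 == 1

def run_cmc_loop() -> int:
    """Count body executions of: body / cmc / jc looping, starting with CF clear."""
    carry, executions = False, 0
    while True:
        executions += 1              # ;do stuff here
        carry = not carry            # cmc
        if not carry:                # jc falls through when CF is clear
            return executions

def run_parity_loop(start: int, step: int) -> int:
    """Count body executions of: body / inc-or-dec bx / jpo looping."""
    bx, executions = start & 0xFFFF, 0
    while True:
        executions += 1              # ;do stuff here
        bx = (bx + step) & 0xFFFF    # inc bx (step=1) or dec bx (step=-1)
        if not parity_odd(bx):       # jpo falls through on even parity
            return executions

print(run_cmc_loop())
print(run_parity_loop(0, 1))
print(run_parity_loop(0xFFFF, -1))
```

In this model the parity loop ends after the same number of passes whether BX counts up from 0 or down from -1, because a byte and its bitwise complement always have the same parity.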
The Rrrola Trick (Obtaining X and Y without DIV)
In 320x200 mode, instead of constructing X and Y from the screen pointer DI with DIV, you can get a decent estimate by multiplying the screen pointer by 0xCCCD and reading Y from the 8-bit register DH (or DH:DL as a 16-bit value) and X from DL (or DL:AH as a 16-bit value). The idea is to interpret DI as a kind of 16-bit fixed-point fraction in the range [0,1], from start to end. Multiplying this number in [0,1] by 65536 / 320 = 204.8 gives the row in the integer part and, again as a kind of fraction, the column in the fractional part. The constant 0xCCCD is 204.8 * 256 rounded to the nearest integer (52428.8 ~ 52429 = 0xCCCD). As long as the 16-bit representations are used, there is no precision loss.
This is adapted from "Puls" by Rrrola, where X and Y are modified directly on the stack by performing add dword[di],0000CCCDh on each pixel iteration, which requires 7 bytes of code. The vertical alignment correction is handled by a good starting value for said DWORD on the stack before each frame, which requires 2 additional bytes. The two approaches are too different to compare directly, but they share the core idea of multiplying by 0xCCCD, so "Rrrola's trick" is an appropriate term to use.
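The multiplication variant can be checked exhaustively over the visible screen. The sketch below models a 16-bit MUL of DI by 0xCCCD as plain integer arithmetic; the function name estimate_xy is made up for this example, and the model assumes the product is read as the 32-bit value DX:AX.

```python
RRROLA = 0xCCCD  # 65536 / 320 * 256 = 52428.8, rounded to 52429

def estimate_xy(di: int):
    """Return (DH, DL) taken from the 32-bit product DI * 0xCCCD."""
    product = (di * RRROLA) & 0xFFFFFFFF
    dh = (product >> 24) & 0xFF   # row
    dl = (product >> 16) & 0xFF   # column, scaled from 0..319 down to 0..255
    return dh, dl

# Compare against the DIV-style result for every pixel of the 320x200 screen.
for di in range(320 * 200):
    y, x = divmod(di, 320)
    dh, dl = estimate_xy(di)
    assert dh == y
    assert dl == x * 4 // 5

print(estimate_xy(320 * 100 + 160))  # (100, 128)
```

In this model DH matches the row exactly, while DL is the column rescaled by 256/320, so X spans 0..255 rather than 0..319.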