Optimizing code
Things to try to optimise Z80 code
Writing code on an 8-bit microcomputer requires a skill that has been lost in the modern programming era.
These days, developers are used to having Gigabytes of memory and processors that run at multiple Gigahertz.
In the microcomputer era we had far, far fewer resources.
Processor clocks ran at 2 or 4 MHz, one thousandth of the clock speed of modern processors.
If we were lucky we had 32K, 48K or 64K of memory to play with and that was it.
Because of this, we had to learn pretty quickly how to optimise our code to fit into memory.
If we were lucky we could use a floppy disk to page in parts of the program as needed,
but on platforms where cassette tape was the primary medium even that wasn't possible.
Optimisations at the machine code level would be a balance of reducing the size of code and having code run as fast
as possible.
Sometimes you might sacrifice some memory for speed if the routine is important, for example when it transforms
graphics in real time, but most of the time the goal is to reduce the memory used.
1 - Accumulator
Optimising use of the A register
Setting the Accumulator to 0
When loading 0 into the Accumulator, there are several ways to do it.
The obvious one is LD A,0 but XOR A and SUB A give the same result.
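As a rough side-by-side comparison, the candidates can be listed as follows. The byte counts and T-states are the standard Z80 figures; the label `zeroA` is only there for illustration.

```asm
zeroA:
    LD A,0          ; 2 bytes, 7 T-states, flags untouched
    XOR A           ; 1 byte,  4 T-states, A = A XOR A = 0
    SUB A           ; 1 byte,  4 T-states, A = A - A = 0
    RET             ; any one of the three lines above is enough
```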
The downside to XOR A and SUB A is that they also affect the flags.
However, each is only 1 byte long rather than 2, and 3 T-states faster than LD A,0.
Inverting A
If inverting A, i.e. swapping each bit from 1 to 0 and vice versa, then instead of XOR 0xFF use CPL.
It's faster, only 1 byte, and inverting A is all that instruction does.
The downside is that CPL leaves the S, Z and Carry flags untouched (it only sets H and N) whilst XOR updates them.
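A minimal sketch of the two forms, with hypothetical labels, assuming the caller only needs the inverted value in A:

```asm
invertWithXor:
    XOR 0xFF        ; 2 bytes, 7 T-states, S and Z reflect the result
    RET

invertWithCpl:
    CPL             ; 1 byte, 4 T-states, S, Z and Carry keep their old values
    RET
```

If the code that follows branches on Z or S, the XOR form is the one that gives meaningful flags.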
2 - Comparisons
Optimising comparing numbers
A = 0
A common test is to see if A is 0, so you would expect to use CP 0 to test for it.
Just like setting A to 0, we can compare quicker.
In this case, instead of CP 0 we can just use either OR A or AND A.
Each is just 1 byte and 3 T-states faster.
For example, take this simple routine which writes a NULL terminated string pointed to by HL to the screen
of the Sinclair ZX Spectrum:
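printStr:
    LD A,(HL)       ; fetch the next character
    CP 0            ; is it the terminator?
    RET Z           ; yes, we are done
    RST 2           ; print the character in A
    INC HL          ; move to the next character
    JR printStr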
The optimisation here is to replace CP 0 with OR A:
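printStr:
    LD A,(HL)       ; fetch the next character
    OR A            ; sets Z if A is 0, 1 byte instead of 2
    RET Z           ; terminator found, we are done
    RST 2           ; print the character in A
    INC HL          ; move to the next character
    JR printStr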
A = 1
Comparing A to 1 can also be done using DEC A instead of CP 1.
By decrementing A, the Z flag will be set if A is now 0.
Like above it's faster and 1 byte, but it also alters A, so it's only of use when you don't care about
the value of A after the test.
Internally, CP 1 just does A-1 but discards the result, which is why DEC A works in this instance.
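The trade-off can be sketched with two hypothetical routines that both return with Z set when A was 1:

```asm
testOneKeepA:
    CP 1            ; 2 bytes, 7 T-states, A is preserved
    RET

testOneLoseA:
    DEC A           ; 1 byte, 4 T-states, A is now A-1
    RET             ; note DEC leaves Carry unchanged, unlike CP
```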
Compare number
With CP it's easy to test for less than (<), equals (=), not equals (!=) or
greater-than-equals (>=) because of how the C and Z flags are used:
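    CP 15           ; compare A with 15, then use one of:
    RET C           ; return if A < 15
    RET NC          ; return if A >= 15
    RET Z           ; return if A = 15
    RET NZ          ; return if A != 15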
The following shows how to get the other two tests, Less-Than-Equals (<=) and Greater-Than(>):
A <= n
This is a simple one. As CP tests against A-n, Z is set if A = n and C is set if A < n,
so testing A <= n directly means checking both flags.
To optimise this we should test against n+1 instead. Doing this means we can just use the Carry flag
as it would be set when A < n+1:
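A sketch of both forms, assuming n = 15 to match the earlier CP 15 example. The trick needs n+1 to fit in a byte, so it does not apply when n is 255.

```asm
    ; Two tests: return if A <= 15
    CP 15
    RET Z           ; A = 15
    RET C           ; A < 15

    ; One test: return if A <= 15
    CP 16           ; Carry set when A < 16, i.e. A <= 15
    RET C
```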
A > n
This is the opposite problem. Here Carry is clear when A>=n, so to get A>n
we first need to test for equals using the Z flag and if that's not set then
check for the Carry flag to be clear:
    CP 15
    JR Z, skip
    RET NC
skip:
Like the previous example, this can be optimised simply by adding 1 and then testing for A >= (n+1) instead:
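A sketch of the single-test form, again assuming n = 15 and that n+1 still fits in a byte:

```asm
    CP 16           ; compare against n+1
    RET NC          ; Carry clear means A >= 16, i.e. A > 15
```

This removes the JR and the label, leaving 3 bytes where the two-test version needs 5.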
Wasteful use of CP
It's easy to forget that some instructions set the flags based on the result, so frequently
you do not need to use CP to test for a condition when the result is already known.
Here we check whether bit 1 of A is set and, if it is, exit the subroutine:
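A sketch of what such a check might look like, assuming the bit is isolated with AND 2 and the exit is a RET Z:

```asm
    AND 2           ; keep only bit 1 (%00000010)
    CP 2            ; is the bit set?
    RET Z           ; yes, exit the subroutine
```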
Here the CP isn't required as AND will set Z if A=0,
so we can remove the CP and use NZ instead, saving 2 bytes:
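Under the same assumption (mask of 2, exit via RET), the shorter form could look like this:

```asm
    AND 2           ; Z is set when bit 1 is clear
    RET NZ          ; bit 1 is set, exit the subroutine
```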
Testing bits
Testing Bit 0 of A
The standard method of testing if bit 0 of A is set is to use BIT 0,A:
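For illustration, assuming the routine should return when the bit is set:

```asm
    BIT 0,A         ; 2 bytes, 8 T-states, Z set if bit 0 is clear
    RET NZ          ; bit 0 is set, A is unchanged
```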
If we don't need A afterwards then we can optimise this by using a right shift instead:
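A sketch using RRA as the shift, with the same hypothetical return-if-set behaviour:

```asm
    RRA             ; 1 byte, 4 T-states, old bit 0 moves into Carry
    RET C           ; bit 0 was set, A has been rotated
```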
This works as we just shifted bit 0 into the Carry flag, and we save a byte in the process.
RRA is faster and 1 byte shorter than BIT 0,A, but it destroys A.
If you need to keep A intact then keep the BIT instruction.
Testing Bit 7 of A
Just like testing bit 0, with bit 7 we can do the same but shifting left instead.
So rather than using BIT 7,A like:
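For illustration, assuming the routine should return when bit 7 is set:

```asm
    BIT 7,A         ; 2 bytes, 8 T-states, Z set if bit 7 is clear
    RET NZ          ; bit 7 is set, A is unchanged
```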
We can just use RLA and test the Carry flag:
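A sketch with the same hypothetical return-if-set behaviour:

```asm
    RLA             ; 1 byte, 4 T-states, old bit 7 moves into Carry
    RET C           ; bit 7 was set, A has been rotated
```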
The downside of this is it destroys the contents of A.
3 - Math
Optimising mathematics
Basic Arithmetic
A=-B
A simple one, we want to set A to be -B.
The logical way is to load A with B then negate it:
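A sketch of that approach, with the standard sizes and timings as comments:

```asm
    LD A,B          ; 1 byte,  4 T-states
    NEG             ; 2 bytes, 8 T-states, A = 0 - A
```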
But a quicker and shorter way is:
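One way to get there, assuming the intent is to compute 0 - B directly, is to clear A and subtract:

```asm
    XOR A           ; 1 byte, 4 T-states, A = 0
    SUB B           ; 1 byte, 4 T-states, A = 0 - B
```

That is 2 bytes and 8 T-states against 3 bytes and 12 T-states for the load-and-negate version.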
4 - Bit Shifting
Optimising bit shifting
Bit shifting, be it rotating left or right, is so common that it's easy to create slow code if you are not careful.
Shift BC, DE or HL left one bit
This is a 16 bit shift left operation.
The first thought, especially if you have a 6502 background like myself, would be to shift L left 1 bit, clearing bit 0 and
setting carry to the original bit 7, then shift H left 1 bit, pulling carry into bit 0:
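On the Z80 that two-step shift could be written as:

```asm
    SLA L           ; 2 bytes, 8 T-states, bit 7 of L goes to Carry, bit 0 cleared
    RL H            ; 2 bytes, 8 T-states, Carry is pulled into bit 0 of H
```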
However any shift left operation is the same as multiplying the value by 2 or just adding to itself,
and the Z80 has a single byte operation to do this.
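For HL the single instruction in question is ADD HL,HL:

```asm
    ADD HL,HL       ; 1 byte, 11 T-states, HL = HL * 2, Carry = old bit 15
```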
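The Z80 has no ADD BC,BC or ADD DE,DE though, so this only works directly on HL.
DE can borrow it by wrapping ADD HL,HL in a pair of EX DE,HL instructions, which is a byte shorter than SLA E / RL D
but 3 T-states slower, whilst BC still needs the SLA C / RL B pair.
If you need to shift HL left one bit then always use ADD.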
Shift 8-bit register left one bit
This might seem odd but the same optimisation can be done for the Accumulator.
You can either use SLA A or you can just add A to itself with ADD A,A.
The other 8-bit registers cannot be added to themselves, as ADD always targets A, so they still need SLA.
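A sketch for the Accumulator, the one 8-bit register that ADD can add to itself:

```asm
    SLA A           ; 2 bytes, 8 T-states
    ADD A,A         ; 1 byte,  4 T-states, same shifted result
```

The two lines are alternatives; running both would shift A twice.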
Here we can halve both the code size and the time taken to perform the shift.
Both instructions leave the original bit 7 in the Carry flag, with Z set if the register is now 0 and S set if the new bit 7 is set.
The difference lies in the other flags: ADD sets H from the half carry and P/V on overflow,
whilst SLA clears H and sets P/V from parity.