
Optimizing code

Things to try to optimise Z80 code

Table of Contents

1 Accumulator

2 Comparisons

3 Math

4 Bit Shifting

Writing code on an 8-bit microcomputer requires a skill that has been largely lost in the modern programming era. These days, developers are used to having gigabytes of memory and processors that run at several gigahertz.

In the microcomputer era we had far, far fewer resources. Processor clocks ran at 2 to 4 MHz, roughly one thousandth of the clock speed of modern processors. If we were lucky we had 32K, 48K or 64K of memory to play with, and that was it.

Because of this, we had to learn pretty quickly how to optimise our code to fit into memory. If we were lucky we could use a floppy disk to page in parts of the program as needed, but on platforms where cassette tape was the primary medium even that wasn't possible.

Optimisation at the machine code level is a balance between reducing the size of the code and making it run as fast as possible.

Sometimes you might sacrifice some memory for speed if a routine is critical, for example one that transforms graphics in real time, but most of the time the goal is to reduce the memory used.

1 - Accumulator

Optimising use of the A register

Setting the Accumulator to 0

When loading 0 into the Accumulator, there are several ways to do it.

3E 00   LD A,0    ; Traditional way to set A to 0
AF      XOR A     ; Anything XOR itself is 0
97      SUB A     ; A-A=0

The downside of XOR A and SUB A is that, unlike LD A,0, they also affect the flags. However, each is only 1 byte long instead of 2 and takes 4 T-states instead of 7, making it 3 T-states faster.
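To make the trade-off concrete, here is a minimal Python sketch (a model, not Z80 code) of what each option does to A and to the flags. The function names are hypothetical, and the flag values assume the standard documented Z80 behaviour for these three instructions when the result is 0.

```python
# Hypothetical model of the three ways to clear A.
# Each function takes A and a flags dict and returns (new A, new flags).

def ld_a_0(a, flags):
    return 0, dict(flags)  # LD does not touch the flags at all

def xor_a(a, flags):
    r = a ^ a  # anything XOR itself is 0
    # XOR sets S/Z/P from the result (0 has even parity, so P/V=1) and clears H, N, C
    return r, {**flags, "S": 0, "Z": 1, "H": 0, "PV": 1, "N": 0, "C": 0}

def sub_a(a, flags):
    r = (a - a) & 0xFF  # A - A = 0, so there is never a borrow or overflow
    # SUB sets S/Z from the result, clears H, P/V (overflow) and C, and sets N
    return r, {**flags, "S": 0, "Z": 1, "H": 0, "PV": 0, "N": 1, "C": 0}

# (mnemonic, size in bytes, T-states, model)
OPTIONS = [("LD A,0", 2, 7, ld_a_0), ("XOR A", 1, 4, xor_a), ("SUB A", 1, 4, sub_a)]

start = {"S": 1, "Z": 0, "H": 1, "PV": 0, "N": 0, "C": 1}
for name, size, tstates, op in OPTIONS:
    # every option leaves A = 0 whatever A held before
    assert all(op(a, start)[0] == 0 for a in range(256))
    print(name, size, "bytes,", tstates, "T-states, flags changed:", op(0x42, start)[1] != start)
```

Only LD A,0 reports the flags as unchanged, which is the one reason to keep it.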

Inverting A

To invert A, i.e. flip every bit from 1 to 0 and vice versa, use CPL instead of XOR 0xFF. It is 1 byte instead of 2, takes 4 T-states instead of 7, and inverting A is the only job the instruction exists to do.

EE FF   XOR 0xFF  ; A = A XOR 0xFF
2F      CPL       ; Same result in A, 1 byte instead of 2

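The difference is in the flags: XOR sets S, Z and P/V from the result and clears C, whereas CPL leaves S, Z, P/V and C unchanged and only sets H and N. If you need to test the inverted value afterwards, for example with JR Z, stick with XOR 0xFF.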
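A small Python model (illustrative only, hypothetical function names) shows both points at once: the two instructions always leave the same value in A, but only XOR refreshes Z and C, so stale flags survive a CPL.

```python
# Hypothetical model comparing XOR 0xFF with CPL on the accumulator.

def parity_even(v):
    """Z80 P/V parity flag: 1 when v has an even number of set bits."""
    return 1 if bin(v).count("1") % 2 == 0 else 0

def xor_ff(a, flags):
    r = a ^ 0xFF
    # XOR updates S, Z and P/V from the result and clears H, N and C
    return r, {**flags, "S": r >> 7, "Z": int(r == 0), "PV": parity_even(r),
               "H": 0, "N": 0, "C": 0}

def cpl(a, flags):
    r = ~a & 0xFF  # one's complement of A, kept to 8 bits
    # CPL only sets H and N; S, Z, P/V and C keep their previous values
    return r, {**flags, "H": 1, "N": 1}

flags = {"S": 0, "Z": 1, "PV": 0, "H": 0, "N": 0, "C": 1}
for a in range(256):
    assert xor_ff(a, flags)[0] == cpl(a, flags)[0]  # same value left in A

# A=0x00 becomes 0xFF: XOR clears Z and C, CPL leaves the stale Z=1, C=1 in place
print(xor_ff(0x00, flags)[1])
print(cpl(0x00, flags)[1])
```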

2 - Comparisons

Optimising number comparisons

A = 0

A common test is to see if A is 0, so you would expect to use CP 0 to test for it.

Just like setting A to 0, we can make this comparison quicker: instead of CP 0, use either OR A or AND A. Each is just 1 byte instead of 2 and 3 T-states faster.

FE 00   CP 0      ; A-0 sets Z if A is 0
A7      AND A     ; Anything AND itself is itself, but Z is set if A is 0
B7      OR A      ; Anything OR itself is itself, but Z is set if A is 0

For example, take this simple routine, which writes a null-terminated string pointed to by HL to the screen of the Sinclair ZX Spectrum:

; Print null-terminated string at HL to the screen
printStr  LD A,(HL)   ; Get next byte
          CP 0        ; Check for null
          RET Z       ; Stop when we get a null
          RST 10h     ; Print the character in A (ROM restart at 0010h)
          INC HL      ; Move to next character
          JR printStr ; Loop back

The optimisation here is to replace CP 0 with OR A:

; Print null-terminated string at HL to the screen
printStr  LD A,(HL)   ; Get next byte
          OR A        ; Check for null
          RET Z       ; Stop when we get a null
          RST 10h     ; Print the character in A (ROM restart at 0010h)
          INC HL      ; Move to next character
          JR printStr ; Loop back
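The claim that OR A and AND A can stand in for CP 0 can be checked exhaustively with a tiny Python model of the Z and C flags (an illustration, not Z80 code; the function names are hypothetical). All three also leave Carry clear, so a following RET C or RET NC behaves the same too.

```python
# Hypothetical model: Z and C after each way of testing A for zero.

def cp_0(a):
    diff = a - 0  # CP 0 computes A - 0 and discards it
    return {"Z": int((diff & 0xFF) == 0), "C": int(diff < 0)}

def and_a(a):
    r = a & a  # AND A leaves A unchanged
    return {"Z": int(r == 0), "C": 0}  # AND always clears Carry

def or_a(a):
    r = a | a  # OR A leaves A unchanged
    return {"Z": int(r == 0), "C": 0}  # OR always clears Carry

for a in range(256):
    # All three agree on Z and C for every value of A
    assert cp_0(a) == and_a(a) == or_a(a)
print("CP 0, AND A and OR A agree on Z and C for all 256 values of A")
```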

A = 1

Comparing A to 1 can also be done using DEC A instead of CP 1. Decrementing A sets the Z flag if A is now 0. Like the tricks above it is faster (4 T-states instead of 7) and only 1 byte, but it also alters A, so it's only useful if you don't care about the value of A after the test.

FE 01   CP 1      ; A-1 sets Z if A is 1
3D      DEC A     ; A=A-1, Z is set if A is now 0

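Internally, CP 1 just calculates A-1 and discards the result, which is why DEC A works in this instance. Note that DEC does not affect the Carry flag, so it can only replace CP 1 for equality tests, not for testing A < 1.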
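As a sketch (a Python model with hypothetical names, not Z80 code), the two instructions can be compared side by side: Z always agrees, but DEC A changes A and leaves Carry alone, whereas CP 1 keeps A and sets Carry when A is 0.

```python
# Hypothetical model of CP 1 versus DEC A. Each returns (new A, Z, C).

def cp_1(a, carry_in):
    diff = a - 1  # CP 1 computes A - 1 but keeps A unchanged
    return a, int((diff & 0xFF) == 0), int(diff < 0)

def dec_a(a, carry_in):
    r = (a - 1) & 0xFF  # DEC A really changes A (0 wraps round to 0xFF)
    return r, int(r == 0), carry_in  # DEC leaves Carry as it was

for a in range(256):
    for carry_in in (0, 1):
        assert cp_1(a, carry_in)[1] == dec_a(a, carry_in)[1]  # same Z flag

print(cp_1(0, 0))   # A untouched, Carry set because 0 < 1
print(dec_a(0, 0))  # A is now 0xFF and Carry is unchanged
```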

Comparing numbers

With CP it's easy to test for less than (<), equals (=), not equals (!=) or greater than or equal (>=), treating A and n as unsigned, because of how the C and Z flags are set:

CP 15     ; Test A against 15
RET C     ; Return if A < 15
RET NC    ; Return if A >= 15
RET Z     ; Return if A = 15
RET NZ    ; Return if A != 15
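The mapping from flags to comparisons can be confirmed for every pair of 8-bit values with a short Python model (a sketch; `cp_flags` and `check_conditions` are hypothetical helpers, and only Z and C are modelled).

```python
# Hypothetical model: the Z and C flags left by CP n, for unsigned A and n.

def cp_flags(a, n):
    diff = a - n  # CP n computes A - n and discards the result
    return int((diff & 0xFF) == 0), int(diff < 0)  # (Z, C): C means a borrow

# Each RET condition paired with the comparison it implements
CONDITIONS = {
    "C":  (lambda z, c: c == 1, lambda a, n: a < n),
    "NC": (lambda z, c: c == 0, lambda a, n: a >= n),
    "Z":  (lambda z, c: z == 1, lambda a, n: a == n),
    "NZ": (lambda z, c: z == 0, lambda a, n: a != n),
}

def check_conditions():
    """Exhaustively confirm every condition for all 8-bit A and n."""
    for a in range(256):
        for n in range(256):
            z, c = cp_flags(a, n)
            for flag_test, compare in CONDITIONS.values():
                if flag_test(z, c) != compare(a, n):
                    return False
    return True

print(check_conditions())
```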

The following shows how to get the other two tests, less than or equal (<=) and greater than (>):

A <= n

This is a simple one. CP n computes A-n, so Z is set if A = n and C is set if A < n. Testing both flags covers A <= n:

CP 15     ; Test for A<=15
RET C     ; Return if A<15
RET Z     ; Return if A=15

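To optimise this we can test against n+1 instead. Then only the Carry flag is needed, as it is set exactly when A < n+1, i.e. A <= n. This only works while n+1 still fits in a byte (n < 255); for n = 255 the test A <= 255 is always true anyway: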

CP 15+1   ; Test for A<16
RET C     ; Return if A<16, i.e. A<=15
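A Python model (illustrative, hypothetical helper names) confirms that the single test matches the two-test version for every A when n is 0 to 254. It assumes that, for n = 255, n+1 would be truncated to 8 bits; many assemblers would report an error instead, but either way the trick does not apply.

```python
# Hypothetical check that "CP n+1 / RET C" matches "CP n / RET C / RET Z".

def cp_flags(a, n):
    diff = a - n  # CP n: A - n, result discarded
    return int((diff & 0xFF) == 0), int(diff < 0)  # (Z, C)

def le_two_tests(a, n):
    z, c = cp_flags(a, n)
    return c == 1 or z == 1  # RET C, then RET Z

def le_one_test(a, n):
    operand = (n + 1) & 0xFF  # assumption: n+1 is truncated to one byte
    z, c = cp_flags(a, operand)
    return c == 1  # RET C only

# Identical for every A when n is 0..254
assert all(le_two_tests(a, n) == le_one_test(a, n)
           for n in range(255) for a in range(256))

# n = 255 breaks: 255+1 wraps to 0, CP 0 never sets Carry, so it never returns
print([a for a in range(256) if le_one_test(a, 255)])  # -> []
```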

A > n

This is the opposite problem. Here Carry is clear when A >= n, so to get A > n we first need to rule out equality using the Z flag, and only then check that the Carry flag is clear:

          CP 15       ; Test for A>15
          JR Z,skip   ; Skip if A=15
          RET NC      ; Return if A>15 (A=15 has already been skipped)
skip                  ; Continue, as A <= 15

Like the previous example, this can be optimised simply by adding 1 and testing for A >= n+1 instead (again, only valid for n < 255):

CP 15+1   ; Test for A>=16
RET NC    ; Return if A>=16, i.e. A>15
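The same kind of Python model (a sketch with hypothetical names, assuming n+1 is truncated to 8 bits) checks the greater-than version, and shows why n = 255 must be excluded: A > 255 is never true, yet CP 0 always leaves Carry clear.

```python
# Hypothetical check that "CP n+1 / RET NC" matches "CP n / JR Z,skip / RET NC".

def cp_flags(a, n):
    diff = a - n  # CP n: A - n, result discarded
    return int((diff & 0xFF) == 0), int(diff < 0)  # (Z, C)

def gt_two_tests(a, n):
    z, c = cp_flags(a, n)
    if z == 1:        # JR Z,skip: equal, so not greater
        return False
    return c == 0     # RET NC

def gt_one_test(a, n):
    z, c = cp_flags(a, (n + 1) & 0xFF)  # assumption: n+1 truncated to one byte
    return c == 0     # RET NC

assert all(gt_two_tests(a, n) == gt_one_test(a, n)
           for n in range(255) for a in range(256))

# n = 255: the one-test version would return for every A
print(all(gt_one_test(a, 255) for a in range(256)))  # -> True
```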

Wasteful use of CP

It's easy to forget that many instructions set the flags from their result, so you frequently don't need CP to test for a condition the flags already reflect:

Here we check whether bit 0 of A is set and, if it is, exit the subroutine:

E6 01   AND 1     ; A = A AND 0x01
FE 01   CP 1      ; Is A now 1?
C8      RET Z     ; Return if A is 1

Here the CP isn't required, as AND already sets Z if its result is 0. We can remove the CP and use RET NZ instead, saving 2 bytes (and 7 T-states):

E6 01   AND 1     ; A = A AND 0x01
C0      RET NZ    ; Return if A is 1
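A quick Python model (illustrative only; the function names are hypothetical) confirms that both versions take the return for exactly the same values of A, and tallies the instruction sizes from the listings.

```python
# Hypothetical model of the two bit-0 tests: returns True when the RET is taken.

def with_cp(a):
    r = a & 0x01                      # AND 1
    z = int(((r - 1) & 0xFF) == 0)    # CP 1: Z set when r - 1 == 0
    return z == 1                     # RET Z

def without_cp(a):
    r = a & 0x01                      # AND 1 sets Z from its own result
    z = int(r == 0)
    return z == 0                     # RET NZ

# Sizes in bytes: AND n = 2, CP n = 2, RET cc = 1
SIZES = {"AND 1 / CP 1 / RET Z": 2 + 2 + 1, "AND 1 / RET NZ": 2 + 1}

assert all(with_cp(a) == without_cp(a) for a in range(256))
print(SIZES)
```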

Testing bits

Testing Bit 0 of A

The standard method of testing if bit 0 of A is set is to use BIT 0,A:

CB 47   BIT 0,A   ; Test bit 0 (Z is set if it is 0)
C0      RET NZ    ; Return if bit 0 of A was set

If we don't need A afterwards, we can optimise this by rotating A right instead:

1F      RRA       ; Rotate A right 1 bit through Carry, C = original bit 0
D8      RET C     ; Return if bit 0 of A was set

This works because RRA moves bit 0 into the Carry flag. RRA is 1 byte and 4 T-states against 2 bytes and 8 T-states for BIT 0,A, but it destroys A. If you need to keep A intact, keep the BIT instruction.
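As an illustration (a Python model with hypothetical names, not Z80 code), the two sequences make the same return decision for every A and every incoming Carry, but only BIT leaves A as it was.

```python
# Hypothetical model: does "BIT 0,A / RET NZ" return in the same cases as "RRA / RET C"?

def bit0_ret_nz(a):
    z = int((a & 0x01) == 0)  # BIT 0,A: Z set when bit 0 is clear; A is unchanged
    return z == 0, a          # (return taken?, A afterwards)

def rra_ret_c(a, carry_in):
    carry = a & 0x01                     # old bit 0 drops into Carry
    new_a = (a >> 1) | (carry_in << 7)   # old Carry rotates into bit 7
    return carry == 1, new_a             # (return taken?, A afterwards)

for a in range(256):
    for carry_in in (0, 1):
        assert bit0_ret_nz(a)[0] == rra_ret_c(a, carry_in)[0]

print(rra_ret_c(0b0000_0011, 0))  # decision matches, but A is now 0b0000_0001
```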

Testing Bit 7 of A

Bit 7 works the same way as bit 0, except that we rotate left instead of right. So rather than using BIT 7,A like this:

CB 7F   BIT 7,A   ; Test bit 7 (Z is set if it is 0)
C0      RET NZ    ; Return if bit 7 of A was set

We can just use RLA and test the Carry flag:

17      RLA       ; Rotate A left 1 bit through Carry, C = original bit 7
D8      RET C     ; Return if bit 7 of A was set

The downside is that this destroys the contents of A.
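The mirror-image Python model (a sketch, hypothetical function names) checks that RLA puts the old bit 7 into Carry, so RET C returns in exactly the cases BIT 7,A / RET NZ would.

```python
# Hypothetical model: "BIT 7,A / RET NZ" versus "RLA / RET C".

def bit7_ret_nz(a):
    z = int((a & 0x80) == 0)  # BIT 7,A: Z set when bit 7 is clear; A is unchanged
    return z == 0, a          # (return taken?, A afterwards)

def rla_ret_c(a, carry_in):
    carry = a >> 7                          # old bit 7 goes into Carry
    new_a = ((a << 1) & 0xFF) | carry_in    # old Carry goes into bit 0
    return carry == 1, new_a                # (return taken?, A afterwards)

for a in range(256):
    for carry_in in (0, 1):
        assert bit7_ret_nz(a)[0] == rla_ret_c(a, carry_in)[0]

print(rla_ret_c(0x81, 0))  # return taken, but A is now 0x02
```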

3 - Math

Optimising mathematics

Basic Arithmetic

A=-B

A simple one: we want to set A to -B.

The obvious way is to load A with B and then negate it:

78      LD A,B    ; Set A to B
ED 44   NEG       ; Negate A to get A=-B

But a quicker and shorter way is to subtract B from zero, taking 2 bytes and 8 T-states instead of 3 bytes and 12 T-states:

AF      XOR A     ; A=0
90      SUB B     ; A=0-B = -B
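A short Python model (illustrative, hypothetical names) checks that both sequences leave the same value in A and the same Carry for all 256 values of B: NEG sets Carry unless A was 0, and 0 - B borrows exactly when B is not 0.

```python
# Hypothetical model: both sequences leave the same A and Carry for every B.

def ld_neg(b):
    a = b                        # LD A,B
    result = (0 - a) & 0xFF      # NEG: A = 0 - A
    carry = int(a != 0)          # NEG sets Carry unless A was 0
    return result, carry

def xor_sub(b):
    a = 0                        # XOR A
    diff = a - b                 # SUB B: A = 0 - B
    return diff & 0xFF, int(diff < 0)  # Carry set on a borrow, i.e. B != 0

# (bytes, T-states): LD A,B = 1/4, NEG = 2/8, XOR A = 1/4, SUB B = 1/4
COST = {"LD A,B / NEG": (1 + 2, 4 + 8), "XOR A / SUB B": (1 + 1, 4 + 4)}

assert all(ld_neg(b) == xor_sub(b) for b in range(256))
print(COST)
```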

4 - Bit Shifting

Optimising bit shifting

Bit shifting, whether shifting or rotating, left or right, is so common that it's easy to write slow code if you are not careful.

Shift BC, DE or HL left one bit

This is a 16-bit shift left. The first thought, especially if you have a 6502 background like myself, is to shift L left 1 bit, clearing bit 0 and setting Carry to the original bit 7, then rotate H left 1 bit, pulling Carry into bit 0:

CB 25   SLA L     ; Shift L left, set bit 0 to 0
CB 14   RL H      ; Shift H left, set bit 0 to original bit 7 of L

However, a shift left is the same as multiplying the value by 2, i.e. adding it to itself, and the Z80 has a single-byte instruction to do this:

29      ADD HL,HL ; Shift HL left 1 bit

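This only works for HL (and for IX and IY, with ADD IX,IX and ADD IY,IY): the Z80 has no ADD BC,BC or ADD DE,DE, so for BC or DE you still need the SLA/RL pair, e.g. SLA E then RL D. For HL, ADD HL,HL is 1 byte and 11 T-states against 4 bytes and 16 T-states, so always use it.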
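To check that ADD HL,HL really is a drop-in replacement for SLA L / RL H, here is an exhaustive Python model over all 65536 values of HL (a sketch with hypothetical names). It compares only the result and the Carry flag; the 16-bit ADD leaves S, Z and P/V unchanged, whereas SLA and RL update them, so code that tests those flags afterwards still needs the pair.

```python
# Hypothetical model: 16-bit shift left via SLA L / RL H versus ADD HL,HL.

def sla_l_rl_h(hl):
    h, l = hl >> 8, hl & 0xFF
    carry = l >> 7                    # SLA L: old bit 7 of L -> Carry
    l = (l << 1) & 0xFF               #        bit 0 of L becomes 0
    out = h >> 7                      # RL H:  old bit 7 of H -> Carry
    h = ((h << 1) & 0xFF) | carry     #        Carry from L -> bit 0 of H
    return (h << 8) | l, out          # (new HL, final Carry)

def add_hl_hl(hl):
    total = hl + hl                     # doubling == shifting left one bit
    return total & 0xFFFF, total >> 16  # Carry is the bit pushed out of bit 15

assert all(sla_l_rl_h(hl) == add_hl_hl(hl) for hl in range(0x10000))
print(hex(add_hl_hl(0x8421)[0]), add_hl_hl(0x8421)[1])
```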

Shift 8-bit register left one bit

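This might seem odd, but the same optimisation works for an 8-bit shift of the accumulator: instead of SLA A you can just add A to itself. It only applies to A, as the 8-bit ADD always targets the accumulator; there is no ADD B,B, so other registers still need SLA.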

; Shift A left one bit, set bit 0 to 0
CB 27   SLA A     ; 2 bytes, 8 T-states
87      ADD A,A   ; 1 byte, 4 T-states

Here we can halve both the code size and the time taken to perform the shift.

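Both instructions move the original bit 7 into the Carry flag, set Z if the result is 0 and set S from the new bit 7. The differences are minor: SLA sets P/V to the parity of the result and clears H, whereas ADD sets P/V on signed overflow and H on a carry out of bit 3. Unless you test P/V or H after the shift, the two are interchangeable.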
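The flag behaviour of the two instructions can be compared with a small Python model (a sketch assuming the standard documented Z80 flag rules; the function names are hypothetical). It confirms that the result, C, Z and S always match, and picks out a value where P/V differs.

```python
# Hypothetical model of the flags after SLA A and ADD A,A.

def parity_even(v):
    """Z80 P/V parity flag: 1 when v has an even number of set bits."""
    return 1 if bin(v).count("1") % 2 == 0 else 0

def sla_a(a):
    r = (a << 1) & 0xFF
    return r, {"C": a >> 7, "Z": int(r == 0), "S": r >> 7,
               "H": 0, "PV": parity_even(r)}   # SLA: H cleared, P/V = parity

def add_a_a(a):
    total = a + a
    r = total & 0xFF
    half = int((a & 0x0F) * 2 > 0x0F)          # H: carry out of bit 3
    overflow = int(((a ^ r) & 0x80) != 0)      # V: sign of A changed, signed overflow
    return r, {"C": total >> 8, "Z": int(r == 0), "S": r >> 7,
               "H": half, "PV": overflow}

SHARED = ("C", "Z", "S")
for a in range(256):
    r1, f1 = sla_a(a)
    r2, f2 = add_a_a(a)
    assert r1 == r2 and all(f1[k] == f2[k] for k in SHARED)

print(sla_a(0x40)[1]["PV"], add_a_a(0x40)[1]["PV"])  # parity vs overflow: 0 1
```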