More SGML conversions
git-svn-id: svn://svn.cc65.org/cc65/trunk@530 b7a2c559-68d2-44c3-8de9-860c34a00d81
This commit is contained in:
parent
0d2f5c0ab5
commit
5bd8009bf9
4 changed files with 383 additions and 342 deletions
11
doc/Makefile
|
@ -8,11 +8,12 @@
|
||||||
SGML = ar65.sgml \
|
SGML = ar65.sgml \
|
||||||
ca65.sgml \
|
ca65.sgml \
|
||||||
cc65.sgml \
|
cc65.sgml \
|
||||||
cl65.sgml \
|
cl65.sgml \
|
||||||
dio.sgml \
|
coding.sgml \
|
||||||
geos.sgml \
|
dio.sgml \
|
||||||
index.sgml \
|
geos.sgml \
|
||||||
ld65.sgml \
|
index.sgml \
|
||||||
|
ld65.sgml \
|
||||||
library.sgml
|
library.sgml
|
||||||
|
|
||||||
TXT = $(SGML:.sgml=.txt)
|
TXT = $(SGML:.sgml=.txt)
|
||||||
|
|
372
doc/coding.sgml
Normal file
|
@ -0,0 +1,372 @@
|
||||||
|
<!doctype linuxdoc system>
|
||||||
|
|
||||||
|
<article>
|
||||||
|
<title>cc65 coding hints
|
||||||
|
<author>Ullrich von Bassewitz, <htmlurl url="mailto:uz@cc65.org" name="uz@cc65.org">
|
||||||
|
<date>03.12.2000
|
||||||
|
|
||||||
|
<abstract>
|
||||||
|
How to generate the most effective code with cc65.
|
||||||
|
</abstract>
|
||||||
|
|
||||||
|
<sect>Use prototypes<p>
|
||||||
|
|
||||||
|
This will not only help to find errors between separate modules, it will also
|
||||||
|
generate better code, since the compiler need not assume that a variable-sized
|
||||||
|
parameter list is in place and need not pass the argument count to the called
|
||||||
|
function. This will lead to shorter and faster code.
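As a minimal sketch of the advice above (the names <tt/add_sat/ and <tt/bump/ are hypothetical and only serve as illustration), a prototype that is visible at the call site tells the compiler the exact parameter list:

```c
/* Prototype, normally placed in a header shared by all modules.
 * add_sat and bump are hypothetical names used for illustration. */
unsigned char add_sat (unsigned char a, unsigned char b);

/* Caller: the prototype above is in scope, so the compiler knows the
 * function takes exactly two unsigned char arguments. */
unsigned char bump (unsigned char v)
{
    return add_sat (v, 10);
}

/* Definition: add two bytes and clamp the result to 255. */
unsigned char add_sat (unsigned char a, unsigned char b)
{
    unsigned sum = (unsigned) a + b;
    return (sum > 255)? 255 : (unsigned char) sum;
}
```

Without the prototype, the call in <tt/bump/ would have to be compiled as if <tt/add_sat/ might take a variable number of arguments.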
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Don't declare auto variables in nested function blocks<p>
|
||||||
|
|
||||||
|
Variable declarations in nested blocks are usually a good thing. But with
|
||||||
|
cc65, there is a drawback: Since the compiler generates code in one pass, it
|
||||||
|
must create the variables on the stack each time the block is entered and
|
||||||
|
destroy them when the block is left. This causes a speed penalty and larger
|
||||||
|
code.
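A small sketch of what this means in practice, assuming a hypothetical function <tt/sum_even/ that is not part of cc65: the temporary <tt/t/ is declared at function level rather than inside the loop body.

```c
/* Hypothetical example: sum the even values in buf.
 * The temporary t is declared once at function level instead of
 * inside the loop block, so it is not created and destroyed each
 * time the block is entered. */
unsigned sum_even (const unsigned char* buf, unsigned char count)
{
    unsigned char i;
    unsigned char t;            /* hoisted out of the loop block */
    unsigned sum = 0;

    for (i = 0; i < count; ++i) {
        t = buf[i];
        if ((unsigned char)(t & 0x01) == 0) {
            sum += t;
        }
    }
    return sum;
}
```

The price is a slightly wider scope for <tt/t/, which is usually acceptable in short functions.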
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Remember that the compiler does not optimize<p>
|
||||||
|
|
||||||
|
The compiler needs hints from you about the code to generate. When accessing
|
||||||
|
indexed data structures, get a pointer to the element and use this pointer
|
||||||
|
instead of calculating the index again and again. If you want to have your
|
||||||
|
loops unrolled, or loop invariant code moved outside the loop, you have to do
|
||||||
|
that yourself.
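The pointer advice can be sketched as follows. The <tt/struct sprite/ type and both function names are assumptions made up for this example; the two functions do the same work.

```c
/* Hypothetical record type, used only for illustration. */
struct sprite {
    unsigned char x;
    unsigned char y;
    unsigned char color;
    unsigned char flags;
};

/* Indexed version: the address of sprites[i] may be calculated again
 * for every member access. */
void move_indexed (struct sprite* sprites, unsigned char i)
{
    sprites[i].x += 1;
    sprites[i].y += 1;
    sprites[i].flags |= 0x01;
}

/* Pointer version: the element address is calculated once and the
 * pointer is reused for all accesses. */
void move_pointer (struct sprite* sprites, unsigned char i)
{
    struct sprite* s = &sprites[i];

    ++s->x;
    ++s->y;
    s->flags |= 0x01;
}
```

Both versions leave the array in the same state; only the amount of address arithmetic the compiler has to emit differs.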
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Longs are slow!<p>
|
||||||
|
|
||||||
|
While long support is necessary for some things, it's really, really slow on
|
||||||
|
the 6502. Remember that any long variable will use 4 bytes of memory, and any
|
||||||
|
operation works on double the data compared to an int.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use unsigned types wherever possible<p>
|
||||||
|
|
||||||
|
The CPU has no opcodes to handle signed values wider than 8 bits. So sign
|
||||||
|
extension, tests of signedness etc. have to be done by hand. The code to handle
|
||||||
|
signed operations is usually a bit slower than the same code for unsigned
|
||||||
|
types.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use chars instead of ints if possible<p>
|
||||||
|
|
||||||
|
While in arithmetic operations, chars are immediately promoted to ints, they
|
||||||
|
are passed as chars in parameter lists and are accessed as chars in variables.
|
||||||
|
The code generated is usually not much smaller, but it is faster, since
|
||||||
|
accessing chars is faster. For several operations, the generated code may be
|
||||||
|
better if intermediate results that are known not to be larger than 8 bits are
|
||||||
|
cast to chars.
|
||||||
|
|
||||||
|
When doing
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
unsigned char a;
|
||||||
|
...
|
||||||
|
if ((a & 0x0F) == 0)
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
the result of the & operator is an int because of the int promotion rules of
|
||||||
|
the language. So the compare is also done with 16 bits. When using
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
unsigned char a;
|
||||||
|
...
|
||||||
|
if ((unsigned char)(a & 0x0F) == 0)
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
the generated code is much shorter, since the operation is done with 8 bits
|
||||||
|
instead of 16.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Make the size of your array elements one of 1, 2, 4, 8<p>
|
||||||
|
|
||||||
|
When indexing into an array, the compiler has to calculate the byte offset
|
||||||
|
into the array, which is the index multiplied by the size of one element. When
|
||||||
|
doing the multiplication, the compiler will do a strength reduction, that is,
|
||||||
|
replace the multiplication by a shift if possible. For the values 2, 4 and 8,
|
||||||
|
there are even more specialized subroutines available. So, array access is
|
||||||
|
fastest when using one of these sizes.
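One way to follow this advice is to pad a structure to the next suitable size. The sketch below assumes a hypothetical table entry with three payload bytes; all type and function names are invented for the example.

```c
/* Hypothetical table entry with three payload bytes. Indexing an
 * array of these requires a multiplication by 3. */
struct entry3 {
    unsigned char id;
    unsigned char lo;
    unsigned char hi;
};

/* The same entry padded to four bytes, so that the index can be
 * scaled with shifts. The cost is one unused byte per element. */
struct entry4 {
    unsigned char id;
    unsigned char lo;
    unsigned char hi;
    unsigned char unused;       /* padding only, never read */
};

/* Look up the id of the element at the given index. */
unsigned char entry4_id (const struct entry4* table, unsigned char index)
{
    return table[index].id;
}
```

Whether the extra byte per element is worth it depends on how large the table is and how often it is indexed.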
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Expressions are evaluated from left to right<p>
|
||||||
|
|
||||||
|
Since cc65 is not building an explicit expression tree when parsing an
|
||||||
|
expression, constant subexpressions may not be detected and optimized properly
|
||||||
|
if you don't help. Look at this example:
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
#define OFFS 4
|
||||||
|
int i;
|
||||||
|
i = i + OFFS + 3;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
The expression is parsed from left to right, that means, the compiler sees
|
||||||
|
'i', and puts its contents into the secondary register. Next is OFFS, which is
|
||||||
|
constant. The compiler emits code to add a constant to the secondary register.
|
||||||
|
Same thing again for the constant 3. So the code produced contains a fetch of
|
||||||
|
'i', two additions of constants, and a store (into 'i'). Unfortunately, the
|
||||||
|
compiler does not see that "OFFS + 3" is a constant in itself, since it does
|
||||||
|
its evaluation from left to right. There are some ways to help the compiler
|
||||||
|
to recognize expressions like this:
|
||||||
|
|
||||||
|
<enum>
|
||||||
|
|
||||||
|
<item>Write "i = OFFS + 3 + i;". Since the first and second operands are
|
||||||
|
constant, the compiler will evaluate them at compile time reducing the code to
|
||||||
|
a fetch, one addition (secondary + constant) and one store.
|
||||||
|
|
||||||
|
<item>Write "i = i + (OFFS + 3)". When seeing the opening parenthesis, the
|
||||||
|
compiler will start a new expression evaluation for the contents of the parentheses,
|
||||||
|
and since all operands in the subexpression are constant, it will detect this
|
||||||
|
and reduce the code to one fetch, one addition and one store.
|
||||||
|
|
||||||
|
</enum>
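The three spellings discussed above can be put side by side. The function names are hypothetical; all three return the same value, and only the amount of run-time arithmetic the compiler is expected to emit differs.

```c
#define OFFS 4

/* Original form: two separate additions of constants at run time. */
int add_unfolded (int i)
{
    return i + OFFS + 3;
}

/* Constants first: OFFS + 3 is evaluated at compile time. */
int add_const_first (int i)
{
    return OFFS + 3 + i;
}

/* Parenthesized: the subexpression (OFFS + 3) is folded on its own. */
int add_grouped (int i)
{
    return i + (OFFS + 3);
}
```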
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Case labels in a switch statement are checked in source order<p>
|
||||||
|
|
||||||
|
Labels that appear first in a switch statement are tested first. So, if your
|
||||||
|
switch statement contains labels that are selected most of the time, put them
|
||||||
|
first in your source code. This will speed up the code.
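A short sketch, assuming a hypothetical command dispatcher in which <tt/CMD_MOVE/ is by far the most frequent command (the command codes and return values are invented for this example):

```c
/* Hypothetical command codes. */
#define CMD_QUIT 0
#define CMD_HELP 1
#define CMD_MOVE 2

/* The most frequently selected label comes first in the source, so it
 * is also the first one to be tested. */
unsigned char dispatch (unsigned char cmd)
{
    switch (cmd) {
        case CMD_MOVE:  return 3;   /* assumed to be the common case */
        case CMD_HELP:  return 2;
        case CMD_QUIT:  return 1;
        default:        return 0;
    }
}
```

The numeric values of the labels do not matter here, only their order in the source.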
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use the preincrement and predecrement operators<p>
|
||||||
|
|
||||||
|
The compiler is not always smart enough to figure out whether the rvalue of an
|
||||||
|
increment is used or not. So it has to save and restore that value when
|
||||||
|
producing code for the postincrement and postdecrement operators, even if this
|
||||||
|
value is never used. To avoid the additional overhead, use the preincrement
|
||||||
|
and predecrement operators if you don't need the resulting value. That means,
|
||||||
|
use
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
...
|
||||||
|
++i;
|
||||||
|
...
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
instead of
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
...
|
||||||
|
i++;
|
||||||
|
...
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use constants to access absolute memory locations<p>
|
||||||
|
|
||||||
|
The compiler produces optimized code if the value of a pointer is a constant.
|
||||||
|
So, to access direct memory locations, use
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
#define VDC_STATUS 0xD600
|
||||||
|
*(char*)VDC_STATUS = 0x01;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
That will be translated to
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
lda #$01
|
||||||
|
sta $D600
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
The constant value detection also works for struct pointers and arrays if the
|
||||||
|
subscript is a constant. So
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
#define VDC ((unsigned char*)0xD600)
|
||||||
|
#define STATUS 0x01
|
||||||
|
VDC [STATUS] = 0x01;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
will also work.
|
||||||
|
|
||||||
|
If you first load the constant into a variable and use that variable to access
|
||||||
|
an absolute memory location, the generated code will be much slower, since the
|
||||||
|
compiler does not know anything about the contents of the variable.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use initialized local variables - but use them with care<p>
|
||||||
|
|
||||||
|
Initialization of local variables when declaring them gives shorter and faster
|
||||||
|
code. So, use
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
int i = 1;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
instead of
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
int i;
|
||||||
|
i = 1;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
But beware: To maximize your savings, don't mix uninitialized and initialized
|
||||||
|
variables. Create one block of initialized variables and one of uninitialized
|
||||||
|
ones. The reason for this is that the compiler will sum up the space needed
|
||||||
|
for uninitialized variables as long as possible, and then allocate the space
|
||||||
|
once for all these variables. If you mix uninitialized and initialized
|
||||||
|
variables, you force the compiler to allocate space for the uninitialized
|
||||||
|
variables each time it parses an initialized one. So do this:
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
int i, j;
|
||||||
|
int a = 3;
|
||||||
|
int b = 0;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
instead of
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
int i;
|
||||||
|
int a = 3;
|
||||||
|
int j;
|
||||||
|
int b = 0;
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
The latter will work, but will create larger and slower code.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>When using the <tt/?:/ operator, cast values that are not ints<p>
|
||||||
|
|
||||||
|
The result type of the <tt/?:/ operator is a long, if one of the second or
|
||||||
|
third operands is a long. If the second operand has been evaluated and it was
|
||||||
|
of type int, and the compiler detects that the third operand is a long, it has
|
||||||
|
to add an additional <tt/int/ → <tt/long/ conversion for the second
|
||||||
|
operand. However, since the code for the second operand has already been
|
||||||
|
emitted, this gives much worse code.
|
||||||
|
|
||||||
|
Look at this:
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
long f (long a)
|
||||||
|
{
|
||||||
|
return (a != 0)? 1 : a;
|
||||||
|
}
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
When the compiler sees the literal "1", it does not know that the result type
|
||||||
|
of the <tt/?:/ operator is a long, so it will emit code to load an integer
|
||||||
|
constant 1. After parsing "a", which is a long, an <tt/int/ → <tt/long/
|
||||||
|
conversion has to be applied to the second operand. This creates one
|
||||||
|
additional jump, and additional code for the conversion.
|
||||||
|
|
||||||
|
A better way would have been to write:
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
long f (long a)
|
||||||
|
{
|
||||||
|
return (a != 0)? 1L : a;
|
||||||
|
}
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
By forcing the literal "1" to be of type long, the correct code is created in
|
||||||
|
the first place, and no additional conversion code is needed.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use the array operator [] even for pointers<p>
|
||||||
|
|
||||||
|
When addressing an array via a pointer, don't use the plus and dereference
|
||||||
|
operators, but the array operator. This will generate better code in some
|
||||||
|
common cases.
|
||||||
|
|
||||||
|
Don't use
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
char* a;
|
||||||
|
char b, c;
|
||||||
|
b = *(a + c);
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
Use
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
char* a;
|
||||||
|
char b, c;
|
||||||
|
b = a[c];
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
instead.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Use register variables with care<p>
|
||||||
|
|
||||||
|
Register variables may give faster and shorter code, but they do also have an
|
||||||
|
overhead. Register variables are actually zero page locations, so using them
|
||||||
|
saves roughly one cycle per access. Since the old values have to be saved and
|
||||||
|
restored, there is an overhead of about 70 cycles per 2 byte variable. It is
|
||||||
|
easy to see that - apart from the additional code that is needed to save and
|
||||||
|
restore the values - you need to make heavy use of a variable to justify the
|
||||||
|
overhead.
|
||||||
|
|
||||||
|
Pointers are an exception, especially char pointers. The optimizer has code to
|
||||||
|
detect and transform the most common pointer operations if the pointer
|
||||||
|
variable is a register variable. Declaring heavily used character pointers as
|
||||||
|
register may give significant gains in speed and size.
|
||||||
|
|
||||||
|
And remember: Register variables must be enabled with <tt/-Or/.
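As a sketch of the char pointer case, consider a hypothetical helper <tt/count_char/ (not a cc65 library function) that walks a string through a pointer declared as <tt/register/. Per the text above, this only has an effect when register variables are enabled.

```c
/* Hypothetical helper: count the occurrences of ch in the
 * NUL-terminated string s. The pointer p is used heavily inside the
 * loop, which is the situation where a register variable is most
 * likely to pay for its save and restore overhead. */
unsigned count_char (const char* s, char ch)
{
    register const char* p = s;
    unsigned n = 0;

    while (*p) {
        if (*p == ch) {
            ++n;
        }
        ++p;
    }
    return n;
}
```

For a function that touches the pointer only once or twice, the plain non-register declaration is probably the better choice.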
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect>Decimal constants greater than 0x7FFF are actually long ints<p>
|
||||||
|
|
||||||
|
The language rules for constant numeric values specify that decimal constants
|
||||||
|
without a type suffix that are not in integer range must be of type long int
|
||||||
|
or unsigned long int. This means that a simple constant like 40000 is of type
|
||||||
|
long int, and may cause an expression to be evaluated with 32 bits.
|
||||||
|
|
||||||
|
An example is:
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
unsigned val;
|
||||||
|
...
|
||||||
|
if (val < 65535) {
|
||||||
|
...
|
||||||
|
}
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
Here, the compare is evaluated using 32 bit precision. This makes the code
|
||||||
|
larger and a lot slower.
|
||||||
|
|
||||||
|
Using
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
unsigned val;
|
||||||
|
...
|
||||||
|
if (val < 0xFFFF) {
|
||||||
|
...
|
||||||
|
}
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
<tscreen><verb>
|
||||||
|
unsigned val;
|
||||||
|
...
|
||||||
|
if (val < 65535U) {
|
||||||
|
...
|
||||||
|
}
|
||||||
|
</verb></tscreen>
|
||||||
|
|
||||||
|
instead will give shorter and faster code.
|
||||||
|
|
||||||
|
|
||||||
|
</article>
|
||||||
|
|
335
doc/coding.txt
|
@ -1,335 +0,0 @@
|
||||||
|
|
||||||
How to generate the most effective code with cc65.
|
|
||||||
|
|
||||||
|
|
||||||
1. Use prototypes.
|
|
||||||
|
|
||||||
This will not only help to find errors between separate modules, it will
|
|
||||||
also generate better code, since the compiler need not assume that a
|
|
||||||
variable-sized parameter list is in place and need not pass the argument
|
|
||||||
count to the called function. This will lead to shorter and faster code.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
2. Don't declare auto variables in nested function blocks.
|
|
||||||
|
|
||||||
Variable declarations in nested blocks are usually a good thing. But with
|
|
||||||
cc65, there is a drawback: Since the compiler generates code in one pass,
|
|
||||||
it must create the variables on the stack each time the block is entered
|
|
||||||
and destroy them when the block is left. This causes a speed penalty and
|
|
||||||
larger code.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
3. Remember that the compiler does not optimize.
|
|
||||||
|
|
||||||
The compiler needs hints from you about the code to generate. When
|
|
||||||
accessing indexed data structures, get a pointer to the element and
|
|
||||||
use this pointer instead of calculating the index again and again.
|
|
||||||
If you want to have your loops unrolled, or loop invariant code moved
|
|
||||||
outside the loop, you have to do that yourself.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
4. Longs are slow!
|
|
||||||
|
|
||||||
While long support is necessary for some things, it's really, really slow
|
|
||||||
on the 6502. Remember that any long variable will use 4 bytes of memory,
|
|
||||||
and any operation works on double the data compared to an int.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
5. Use unsigned types wherever possible.
|
|
||||||
|
|
||||||
The CPU has no opcodes to handle signed values greater than 8 bit. So
|
|
||||||
sign extension, test of signedness etc. has to be done by hand. The
|
|
||||||
code to handle signed operations is usually a bit slower than the same
|
|
||||||
code for unsigned types.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
6. Use chars instead of ints if possible.
|
|
||||||
|
|
||||||
While in arithmetic operations, chars are immediately promoted to ints,
|
|
||||||
they are passed as chars in parameter lists and are accessed as chars
|
|
||||||
in variables. The code generated is usually not much smaller, but it
|
|
||||||
is faster, since accessing chars is faster. For several operations, the
|
|
||||||
generated code may be better if intermediate results that are known not
|
|
||||||
to be larger than 8 bits are cast to chars.
|
|
||||||
|
|
||||||
When doing
|
|
||||||
|
|
||||||
unsigned char a;
|
|
||||||
...
|
|
||||||
if ((a & 0x0F) == 0)
|
|
||||||
|
|
||||||
the result of the & operator is an int because of the int promotion
|
|
||||||
rules of the language. So the compare is also done with 16 bits. When
|
|
||||||
using
|
|
||||||
|
|
||||||
unsigned char a;
|
|
||||||
...
|
|
||||||
if ((unsigned char)(a & 0x0F) == 0)
|
|
||||||
|
|
||||||
the generated code is much shorter, since the operation is done with
|
|
||||||
8 bits instead of 16.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
7. Make the size of your array elements one of 1, 2, 4, 8.
|
|
||||||
|
|
||||||
When indexing into an array, the compiler has to calculate the byte
|
|
||||||
offset into the array, which is the index multiplied by the size of
|
|
||||||
one element. When doing the multiplication, the compiler will do a
|
|
||||||
strength reduction, that is, replace the multiplication by a shift
|
|
||||||
if possible. For the values 2, 4 and 8, there are even more specialized
|
|
||||||
subroutines available. So, array access is fastest when using one of
|
|
||||||
these sizes.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
8. Expressions are evaluated from left to right.
|
|
||||||
|
|
||||||
Since cc65 is not building an explicit expression tree when parsing an
|
|
||||||
expression, constant subexpressions may not be detected and optimized
|
|
||||||
properly if you don't help. Look at this example:
|
|
||||||
|
|
||||||
#define OFFS 4
|
|
||||||
int i;
|
|
||||||
i = i + OFFS + 3;
|
|
||||||
|
|
||||||
The expression is parsed from left to right, that means, the compiler sees
|
|
||||||
'i', and puts its contents into the secondary register. Next is OFFS, which
|
|
||||||
is constant. The compiler emits code to add a constant to the secondary
|
|
||||||
register. Same thing again for the constant 3. So the code produced
|
|
||||||
contains a fetch of 'i', two additions of constants, and a store (into
|
|
||||||
'i'). Unfortunately, the compiler does not see, that "OFFS + 3" is a
|
|
||||||
constant in itself, since it does its evaluation from left to right.
|
|
||||||
There are some ways to help the compiler to recognize expression like
|
|
||||||
this:
|
|
||||||
|
|
||||||
a. Write "i = OFFS + 3 + i;". Since the first and second operand are
|
|
||||||
constant, the compiler will evaluate them at compile time reducing the
|
|
||||||
code to a fetch, one addition (secondary + constant) and one store.
|
|
||||||
|
|
||||||
b. Write "i = i + (OFFS + 3)". When seeing the opening parenthesis, the
|
|
||||||
compiler will start a new expression evaluation for the stuff in the
|
|
||||||
braces, and since all operands in the subexpression are constant, it
|
|
||||||
will detect this and reduce the code to one fetch, one addition and
|
|
||||||
one store.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
9. Case labels in a switch statement are checked in source order.
|
|
||||||
|
|
||||||
Labels that appear first in a switch statement are tested first. So,
|
|
||||||
if your switch statement contains labels that are selected most of
|
|
||||||
the time, put them first in your source code. This will speed up the
|
|
||||||
code.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
10. Use the preincrement and predecrement operators.
|
|
||||||
|
|
||||||
The compiler is not always smart enough to figure out whether the rvalue of an
|
|
||||||
increment is used or not. So it has to save and restore that value when
|
|
||||||
producing code for the postincrement and postdecrement operators, even if
|
|
||||||
this value is never used. To avoid the additional overhead, use the
|
|
||||||
preincrement and predecrement operators if you don't need the resulting
|
|
||||||
value. That means, use
|
|
||||||
|
|
||||||
...
|
|
||||||
++i;
|
|
||||||
...
|
|
||||||
|
|
||||||
instead of
|
|
||||||
|
|
||||||
...
|
|
||||||
i++;
|
|
||||||
...
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
11. Use constants to access absolute memory locations.
|
|
||||||
|
|
||||||
The compiler produces optimized code, if the value of a pointer is a
|
|
||||||
constant. So, to access direct memory locations, use
|
|
||||||
|
|
||||||
#define VDC_STATUS 0xD600
|
|
||||||
*(char*)VDC_STATUS = 0x01;
|
|
||||||
|
|
||||||
That will be translated to
|
|
||||||
|
|
||||||
lda #$01
|
|
||||||
sta $D600
|
|
||||||
|
|
||||||
The constant value detection works also for struct pointers and arrays,
|
|
||||||
if the subscript is a constant. So
|
|
||||||
|
|
||||||
#define VDC ((unsigned char*)0xD600)
|
|
||||||
#define STATUS 0x01
|
|
||||||
VDC [STATUS] = 0x01;
|
|
||||||
|
|
||||||
will also work.
|
|
||||||
|
|
||||||
If you first load the constant into a variable and use that variable to
|
|
||||||
access an absolute memory location, the generated code will be much
|
|
||||||
slower, since the compiler does not know anything about the contents of
|
|
||||||
the variable.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
12. Use initialized local variables - but use them with care.
|
|
||||||
|
|
||||||
Initialization of local variables when declaring them gives shorter
|
|
||||||
and faster code. So, use
|
|
||||||
|
|
||||||
int i = 1;
|
|
||||||
|
|
||||||
instead of
|
|
||||||
|
|
||||||
int i;
|
|
||||||
i = 1;
|
|
||||||
|
|
||||||
But beware: To maximize your savings, don't mix uninitialized and
|
|
||||||
initialized variables. Create one block of initialized variables and
|
|
||||||
one of uninitialized ones. The reason for this is that the compiler
|
|
||||||
will sum up the space needed for uninitialized variables as long as
|
|
||||||
possible, and then allocate the space once for all these variables.
|
|
||||||
If you mix uninitialized and initialized variables, you force the
|
|
||||||
compiler to allocate space for the uninitialized variables each time,
|
|
||||||
it parses an initialized one. So do this:
|
|
||||||
|
|
||||||
int i, j;
|
|
||||||
int a = 3;
|
|
||||||
int b = 0;
|
|
||||||
|
|
||||||
instead of
|
|
||||||
|
|
||||||
int i;
|
|
||||||
int a = 3;
|
|
||||||
int j;
|
|
||||||
int b = 0;
|
|
||||||
|
|
||||||
The latter will work, but will create larger and slower code.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
13. When using the ?: operator, cast values that are not ints.
|
|
||||||
|
|
||||||
The result type of the ?: operator is a long, if one of the second or
|
|
||||||
third operands is a long. If the second operand has been evaluated and
|
|
||||||
it was of type int, and the compiler detects that the third operand is
|
|
||||||
a long, it has to add an additional int->long conversion for the
|
|
||||||
second operand. However, since the code for the second operand has
|
|
||||||
already been emitted, this gives much worse code.
|
|
||||||
|
|
||||||
Look at this:
|
|
||||||
|
|
||||||
long f (long a)
|
|
||||||
{
|
|
||||||
return (a != 0)? 1 : a;
|
|
||||||
}
|
|
||||||
|
|
||||||
When the compiler sees the literal "1", it does not know, that the
|
|
||||||
result type of the ?: operator is a long, so it will emit code to load
|
|
||||||
an integer constant 1. After parsing "a", which is a long, an int->long
|
|
||||||
conversion has to be applied to the second operand. This creates one
|
|
||||||
additional jump, and additional code for the conversion.
|
|
||||||
|
|
||||||
A better way would have been to write:
|
|
||||||
|
|
||||||
long f (long a)
|
|
||||||
{
|
|
||||||
return (a != 0)? 1L : a;
|
|
||||||
}
|
|
||||||
|
|
||||||
By forcing the literal "1" to be of type long, the correct code is
|
|
||||||
created in the first place, and no additional conversion code is
|
|
||||||
needed.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
14. Use the array operator [] even for pointers.
|
|
||||||
|
|
||||||
When addressing an array via a pointer, don't use the plus and
|
|
||||||
dereference operators, but the array operator. This will generate
|
|
||||||
better code in some common cases.
|
|
||||||
|
|
||||||
Don't use
|
|
||||||
|
|
||||||
char* a;
|
|
||||||
char b, c;
|
|
||||||
b = *(a + c);
|
|
||||||
|
|
||||||
Use
|
|
||||||
|
|
||||||
char* a;
|
|
||||||
char b, c;
|
|
||||||
b = a[c];
|
|
||||||
|
|
||||||
instead.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
15. Use register variables with care.
|
|
||||||
|
|
||||||
Register variables may give faster and shorter code, but they do also
|
|
||||||
have an overhead. Register variables are actually zero page
|
|
||||||
locations, so using them saves roughly one cycle per access. Since
|
|
||||||
the old values have to be saved and restored, there is an overhead of
|
|
||||||
about 70 cycles per 2 byte variable. It is easy to see, that - apart
|
|
||||||
from the additional code that is needed to save and restore the
|
|
||||||
values - you need to make heavy use of a variable to justify the
|
|
||||||
overhead.
|
|
||||||
|
|
||||||
Pointers are an exception, especially char pointers. The optimizer
|
|
||||||
has code to detect and transform the most common pointer operations
|
|
||||||
if the pointer variable is a register variable. Declaring heavily
|
|
||||||
used character pointers as register may give significant gains in
|
|
||||||
speed and size.
|
|
||||||
|
|
||||||
And remember: Register variables must be enabled with -Or.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
16. Decimal constants greater than 0x7FFF are actually long ints
|
|
||||||
|
|
||||||
The language rules for constant numeric values specify that decimal
|
|
||||||
constants without a type suffix that are not in integer range must be
|
|
||||||
of type long int or unsigned long int. This means that a simple
|
|
||||||
constant like 40000 is of type long int, and may cause an expression
|
|
||||||
to be evaluated with 32 bits.
|
|
||||||
|
|
||||||
An example is:
|
|
||||||
|
|
||||||
unsigned val;
|
|
||||||
...
|
|
||||||
if (val < 65535) {
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
Here, the compare is evaluated using 32 bit precision. This makes the
|
|
||||||
code larger and a lot slower.
|
|
||||||
|
|
||||||
Using
|
|
||||||
|
|
||||||
unsigned val;
|
|
||||||
...
|
|
||||||
if (val < 0xFFFF) {
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
or
|
|
||||||
|
|
||||||
unsigned val;
|
|
||||||
...
|
|
||||||
if (val < 65535U) {
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
instead will give shorter and faster code.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -31,7 +31,7 @@ Main documentation page, contains links to other available stuff.
|
||||||
<tag><htmlurl url="cl65.html" name="cl65.html"></tag>
|
<tag><htmlurl url="cl65.html" name="cl65.html"></tag>
|
||||||
Describes the cl65 compile & link utility.
|
Describes the cl65 compile & link utility.
|
||||||
|
|
||||||
<tag><htmlurl url="coding.txt" name="coding.txt"></tag>
|
<tag><htmlurl url="coding.html" name="coding.html"></tag>
|
||||||
Contains hints on creating the most effective code with cc65.
|
Contains hints on creating the most effective code with cc65.
|
||||||
|
|
||||||
<tag><htmlurl url="compile.txt" name="compile.txt"></tag>
|
<tag><htmlurl url="compile.txt" name="compile.txt"></tag>
|
||||||
|
@ -40,11 +40,14 @@ Main documentation page, contains links to other available stuff.
|
||||||
<tag><htmlurl url="debugging.txt" name="debugging.txt"></tag>
|
<tag><htmlurl url="debugging.txt" name="debugging.txt"></tag>
|
||||||
Debug programs using the VICE emulator.
|
Debug programs using the VICE emulator.
|
||||||
|
|
||||||
|
<tag><htmlurl url="dio.html" name="dio.html"></tag>
|
||||||
|
Low level disk I/O API.
|
||||||
|
|
||||||
<tag><htmlurl url="geos.html" name="geos.html"></tag>
|
<tag><htmlurl url="geos.html" name="geos.html"></tag>
|
||||||
GEOSLib manual in several formats.
|
GEOSLib manual in several formats.
|
||||||
|
|
||||||
<tag><htmlurl url="grc.txt" name="grc.txt"></tag>
|
<tag><htmlurl url="grc.txt" name="grc.txt"></tag>
|
||||||
grc.txt - Describes the GEOS resource compiler (grc).
|
Describes the GEOS resource compiler (grc).
|
||||||
|
|
||||||
<tag><htmlurl url="index.html" name="index.html"></tag>
|
<tag><htmlurl url="index.html" name="index.html"></tag>
|
||||||
This file.
|
This file.
|
||||||
|
|