10. C Startup

C code cannot be executed directly when the processor comes out of reset because, unlike assembly language, C programs need some basic pre-requisites to be satisfied. This section describes those pre-requisites and how to meet them.

We will use a C program that calculates the sum of an array as the example. By the end of this section, we will be able to perform the necessary setup, transfer control to the C code, and execute it.

Listing 12. Sum of Array in C

static int arr[] = { 1, 10, 4, 5, 6, 7 };
static int sum;
static const int n = sizeof(arr) / sizeof(arr[0]);

int main()
{
        int i;

        for (i = 0; i < n; i++)
                sum += arr[i];
}

Before transferring control to C code, the following have to be set up correctly.

  1. Stack
  2. Global variables

    1. Initialized
    2. Uninitialized
  3. Read-only data

10.1. Stack

C uses the stack for storing local (auto) variables, passing function arguments, storing return addresses, etc. So it is essential that the stack be set up correctly before transferring control to C code.

Stacks are highly flexible in the ARM architecture, since the implementation is left completely to the software. For people not familiar with the ARM architecture, an overview is provided in Appendix C, ARM Stacks.

To make sure that code generated by different compilers is interoperable, ARM has created the Procedure Call Standard for the ARM Architecture (AAPCS). The register to be used as the stack pointer and the direction in which the stack grows are both dictated by the AAPCS. According to the AAPCS, register r13 is to be used as the stack pointer, and the stack should be full-descending.

One way of placing global variables and the stack is shown in the following diagram.

Figure 5. Stack Placement


So all that has to be done in the startup code is to point r13 just past the highest RAM address, so that the stack can grow downwards (towards lower addresses). For the connex board this can be achieved using the following ARM instruction.

        ldr sp, =0xA4000000

Note that the assembler provides an alias sp for the r13 register.

[Note] Note

The address 0xA4000000 itself does not correspond to RAM. The RAM ends at 0xA3FFFFFF. But that is OK: since the stack is full-descending, the first push decrements the stack pointer first and only then stores the value, so the first word written lands at 0xA3FFFFFC.
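The full-descending behaviour described in the note can be modelled on a host machine. The following is only a sketch under stated assumptions: the array ram, the index sp, the constant RAM_WORDS and the functions push and pop are hypothetical names invented for this illustration, and a tiny 4-word array stands in for the board's RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_WORDS 4u                    /* tiny hypothetical RAM, in 32-bit words */

static uint32_t ram[RAM_WORDS];         /* stands in for the board's RAM */
static size_t sp = RAM_WORDS;           /* one past the last word, like 0xA4000000 */

/* Full-descending push: decrement the stack pointer first, then store. */
static void push(uint32_t value)
{
        assert(sp > 0);                 /* no room left in this model */
        sp--;
        ram[sp] = value;
}

/* Matching pop: load first, then increment the stack pointer. */
static uint32_t pop(void)
{
        assert(sp < RAM_WORDS);         /* nothing to pop */
        return ram[sp++];
}
```

In this model sp starts out as an index that is not a valid element of ram, yet the first push writes to the last element, which mirrors why pointing the real stack pointer at 0xA4000000 is safe.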

10.2. Global Variables

When C code is compiled, the compiler places initialized global variables in the .data section. So, just as with the assembly programs, the .data section has to be copied from Flash to RAM.

The C language guarantees that all uninitialized global variables will be initialized to zero. When C programs are compiled, a separate section called .bss is used for uninitialized variables. Since the values of these variables are all zeroes to start with, they do not have to be stored in Flash. Before transferring control to C code, the memory locations corresponding to these variables have to be initialized to zero.
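The two preparation steps for global variables can be sketched in C as a host-side model. This is an illustration only: on the target the work has to be done in assembly, because C cannot be relied upon before the stack and the globals are in place. The function names copy_data, zero_bss and startup_model, and the small flash and ram arrays, are hypothetical and not part of the tutorial's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from Flash to their run-time home in RAM. */
static void copy_data(const uint8_t *flash_src, uint8_t *ram_dst, size_t size)
{
        while (size-- > 0)
                *ram_dst++ = *flash_src++;
}

/* Clear .bss so that every uninitialized global starts out as zero. */
static void zero_bss(uint8_t *start, size_t size)
{
        while (size-- > 0)
                *start++ = 0;
}

/* Host-side model: returns 1 once RAM holds .data followed by zeroes. */
static int startup_model(void)
{
        static const uint8_t flash[4] = { 1, 10, 4, 5 };         /* .data image */
        uint8_t ram[6] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF }; /* junk at power-on */
        size_t i;

        copy_data(flash, ram, sizeof(flash));
        zero_bss(ram + sizeof(flash), sizeof(ram) - sizeof(flash));

        for (i = 0; i < sizeof(flash); i++)
                assert(ram[i] == flash[i]);     /* .data was copied */
        for (i = sizeof(flash); i < sizeof(ram); i++)
                assert(ram[i] == 0);            /* .bss was cleared */
        return 1;
}
```

Both loops test the size before touching memory, so an empty .data or .bss section is handled without a separate check.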

10.3. Read-only Data

GCC places global variables marked as const in a separate section, called .rodata. The .rodata is also used for storing string constants.

Since the contents of the .rodata section will not be modified, they can be placed in Flash. The linker script has to be modified to accommodate this.
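As a rough illustration of how declarations map to sections, the sketch below annotates a few globals with the section GCC typically chooses for them. The variable and function names are hypothetical, and the actual placement can vary with compiler version and options, so it is worth confirming with arm-none-eabi-nm, which marks such symbols with d, b and r respectively.

```c
#include <assert.h>
#include <stddef.h>

static int counter = 5;                 /* initialized global: typically .data */
static int total;                       /* uninitialized global: typically .bss */
static const int limit = 3;             /* const global: typically .rodata */
static const char banner[] = "hello";   /* const array: typically .rodata */

/* Adds counter to total, limit times, and returns the new total. */
static int run(void)
{
        int i;

        for (i = 0; i < limit; i++)
                total += counter;
        return total;
}

/* Length of the banner text, not counting the terminating NUL. */
static size_t banner_length(void)
{
        return sizeof(banner) - 1;
}
```

Only counter and total need RAM at run time; limit and banner are never written, which is why they can stay in Flash.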

10.4. Startup Code

Now that we know the pre-requisites, we can create the linker script and the startup code. The linker script from Listing 10, “Linker Script with Section Copy Symbols” is modified to accommodate the following.

  1. .bss section placement
  2. vectors section placement
  3. .rodata section placement

The .bss is placed right after .data section in RAM. Symbols to locate the start of .bss and end of .bss are also created in the linker script. The .rodata is placed right after .text section in Flash. The following diagram shows the placement of the various sections.

Figure 6. Section Placement


Listing 13. Linker Script for C code

SECTIONS {
        . = 0x00000000;
        .text : {
              * (vectors);
              * (.text);
        }
        .rodata : {
              * (.rodata);
        }
        flash_sdata = .;

        . = 0xA0000000;
        ram_sdata = .;
        .data : AT (flash_sdata) {
              * (.data);
        }
        ram_edata = .;
        data_size = ram_edata - ram_sdata;

        sbss = .;
        .bss : {
             * (.bss);
        }
        ebss = .;
        bss_size = ebss - sbss;
}

The startup code has the following parts:

  1. exception vectors
  2. code to copy the .data from Flash to RAM
  3. code to zero out the .bss
  4. code to set up the stack pointer
  5. branch to main

Listing 14. C Startup Assembly

        .section "vectors"
reset:  b     start
undef:  b     undef
swi:    b     swi
pabt:   b     pabt
dabt:   b     dabt
        nop                     @@ Reserved vector
irq:    b     irq
fiq:    b     fiq

        .text
start:
        @@ Copy data to RAM.
        ldr   r0, =flash_sdata
        ldr   r1, =ram_sdata
        ldr   r2, =data_size

        @@ Handle data_size == 0
        cmp   r2, #0
        beq   init_bss
copy:
        ldrb   r4, [r0], #1
        strb   r4, [r1], #1
        subs   r2, r2, #1
        bne    copy

init_bss:
        @@ Initialize .bss
        ldr   r0, =sbss
        ldr   r1, =ebss
        ldr   r2, =bss_size

        @@ Handle bss_size == 0
        cmp   r2, #0
        beq   init_stack

        mov   r4, #0
zero:
        strb  r4, [r0], #1
        subs  r2, r2, #1
        bne   zero

init_stack:
        @@ Initialize the stack pointer
        ldr   sp, =0xA4000000

        bl    main

stop:   b     stop

To compile the code, it is not necessary to invoke the assembler, compiler and linker individually. gcc is intelligent enough to do that for us.

As promised before, we will compile and execute the C code shown in Listing 12, “Sum of Array in C”.

$ arm-none-eabi-gcc -nostdlib -o csum.elf -T csum.lds csum.c startup.s

The -nostdlib option is used to specify that the standard C library should not be linked in. A little extra care has to be taken when the C library is linked in. This is discussed in Section 11, “Using the C Library”.

A dump of the symbol table will give a better picture of how things have been placed in memory.

$ arm-none-eabi-nm -n csum.elf
00000000 t reset        ❶
00000004 A bss_size
00000004 t undef
00000008 t swi
0000000c t pabt
00000010 t dabt
00000018 A data_size
00000018 t irq
0000001c t fiq
00000020 T main
00000090 t start        ❷
000000a0 t copy
000000b0 t init_bss
000000c4 t zero
000000d0 t init_stack
000000d8 t stop
000000f4 r n            ❸
000000f8 A flash_sdata
a0000000 d arr          ❹
a0000000 A ram_sdata
a0000018 A ram_edata
a0000018 A sbss
a0000018 b sum          ❺
a000001c A ebss

❶ reset and the rest of the exception vectors are placed starting from 0x0.

❷ The startup assembly code, beginning at start, is placed in Flash after the 8 exception vectors (8 * 4 = 32 = 0x20) and the compiled C code of main, which begins at 0x20.

❸ The read-only data n is placed in Flash after the code.

❹ The initialized data arr, an array of 6 integers, is placed at the start of RAM, 0xA0000000.

❺ The uninitialized data sum is placed after the array of 6 integers (6 * 4 = 24 = 0x18).
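The address arithmetic in these notes can be reproduced with a few lines of C. This is just a worked check; the macro and function names are hypothetical, and int is assumed to be 32 bits wide, as it is for this ARM target.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_COUNT   8u            /* entries in the exception vector table */
#define VECTOR_SIZE    4u            /* each entry is one 32-bit ARM instruction */
#define RAM_START      0xA0000000u   /* start of RAM, from the linker script */
#define ARR_ELEMENTS   6u            /* number of elements in arr[] */

/* First address after the vector table, where the rest of .text begins. */
static uint32_t vectors_end(void)
{
        return VECTOR_COUNT * VECTOR_SIZE;
}

/* First address after arr[], where .bss (and therefore sum) begins. */
static uint32_t bss_start(void)
{
        return RAM_START + ARR_ELEMENTS * (uint32_t)sizeof(int32_t);
}
```

The two results, 0x20 and 0xA0000018, match the addresses of main and sum in the symbol table.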

To execute the program, convert it to .bin format, run it in QEMU, and dump the sum variable located at 0xA0000018.

$ arm-none-eabi-objcopy -O binary csum.elf csum.bin
$ dd if=csum.bin of=flash.bin bs=4096 conv=notrunc
$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
(qemu) xp /6dw 0xa0000000
a0000000:          1         10          4          5
a0000010:          6          7
(qemu) xp /1dw 0xa0000018
a0000018:         33
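The value 33 can be cross-checked on a host machine by running the same summation as ordinary C. The sketch below reuses the array from Listing 12; the helper names sum_array and expected_sum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static const int arr[] = { 1, 10, 4, 5, 6, 7 };

/* Adds up count integers starting at values. */
static int sum_array(const int *values, size_t count)
{
        int sum = 0;
        size_t i;

        for (i = 0; i < count; i++)
                sum += values[i];
        return sum;
}

/* Sum of the array used in Listing 12. */
static int expected_sum(void)
{
        return sum_array(arr, sizeof(arr) / sizeof(arr[0]));
}
```

A result of 33 here agrees with the word dumped from 0xa0000018, which suggests that .data was copied and .bss was zeroed before main ran.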