In a multi-file program, each source file is assembled individually into an object file. The linker then combines these object files to form the final executable.
While combining the object files, the linker performs two operations: symbol resolution and relocation.
We will look into both operations in detail in this section.
In a single file program, while producing the object file, all references to labels are replaced by their corresponding addresses by the assembler. But in a multi-file program, if there are any references to labels defined in another file, the assembler marks these references as "unresolved". When these object files are passed to the linker, the linker determines the values for these references from the other object files, and patches the code with the correct values.
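The resolve-and-patch step described above can be sketched with a toy model. This is only an illustration: the dictionaries are a hypothetical stand-in for object files (not the real ELF layout), the function name resolve is made up, and the address 0x24 for sum is an assumed value.

```python
# Toy model of symbol resolution. The dictionaries below are a
# hypothetical stand-in for object files, not the real ELF layout.

def resolve(objects):
    """Patch each unresolved reference with the address of its definition."""
    # Pass 1: gather every symbol that some object file exports.
    exported = {}
    for obj in objects:
        for name, address in obj["exports"].items():
            if name in exported:
                raise ValueError(f"multiple definition of '{name}'")
            exported[name] = address

    # Pass 2: fill in the placeholders left behind by the assembler.
    for obj in objects:
        for slot, name in obj["unresolved"].items():
            if name not in exported:
                raise ValueError(f"undefined reference to '{name}'")
            obj["code"][slot] = exported[name]


# main.o refers to sum but does not define it; slot 2 holds a placeholder.
main_o = {"exports": {}, "unresolved": {2: "sum"}, "code": ["ldr", "ldr", None]}
# sum-sub.o defines sum; 0x24 is an assumed address, for illustration only.
sum_sub_o = {"exports": {"sum": 0x24}, "unresolved": {}, "code": []}

resolve([main_o, sum_sub_o])
print(main_o["code"])
```

A real linker patches encoded instructions rather than list slots, but the two passes (collect definitions, then patch references) have the same shape.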
The sum of array example is split into two files to demonstrate the symbol resolution performed by the linker. The two files will be assembled and their symbol tables examined to show the presence of the unresolved references.
The file sum-sub.s contains the sum subroutine, and the file
main.s invokes the subroutine with the required arguments. The
source of the files is shown below.
Listing 4. main.s - Subroutine Invocation
        .text
        b     start             @ Skip over the data
arr:    .byte 10, 20, 25        @ Read-only array of bytes
eoa:                            @ Address of end of array + 1
        .align
start:
        ldr   r0, =arr          @ r0 = &arr
        ldr   r1, =eoa          @ r1 = &eoa
        bl    sum               @ Invoke the sum subroutine
stop:   b     stop

Listing 5. sum-sub.s - Subroutine Definition
@ Args
@ r0: Start address of array
@ r1: End address of array
@
@ Result
@ r3: Sum of Array
        .global sum
sum:    mov   r3, #0            @ r3 = 0
loop:   ldrb  r2, [r0], #1      @ r2 = *r0++    ; Get array element
        add   r3, r2, r3        @ r3 += r2      ; Calculate sum
        cmp   r0, r1            @ if (r0 != r1) ; Check if hit end-of-array
        bne   loop              @    goto loop  ; Loop
        mov   pc, lr            @ pc = lr       ; Return when done

A word on the .global directive is in order. In C, all variables
declared outside functions are visible to other files, unless
explicitly declared static. In assembly, all labels are static,
that is, local to the file, unless they are explicitly made
visible to other files using the .global directive.
The files are assembled, and the symbol tables are dumped using the
nm command.
$ arm-none-eabi-as -o main.o main.s
$ arm-none-eabi-as -o sum-sub.o sum-sub.s
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop
00000000 T sum

For now, focus on the letter in the second column, which specifies the
symbol type. A t indicates that the symbol is defined in the text
section. A U indicates that the symbol is undefined. For a defined
symbol, an uppercase letter indicates that the symbol is .global.
It is evident that the symbol sum is defined in sum-sub.o and is
not resolved yet in main.o. When the linker is invoked the symbol
references will be resolved, and the executable will be produced.
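To make the reading of the symbol tables concrete, here is a small sketch that parses the nm listings shown above and reports which symbols main.o leaves undefined and which symbols sum-sub.o exports. The helper name parse_nm is hypothetical, and the parser only handles the simple two- and three-column lines that appear in this section.

```python
def parse_nm(output):
    """Return {name: (type_letter, address or None)} from nm-style text."""
    symbols = {}
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) == 2:
            # Undefined symbols have no address column: "U sum"
            kind, name = fields
            address = None
        else:
            # Defined symbols: "00000004 t arr"
            address, kind, name = int(fields[0], 16), fields[1], fields[2]
        symbols[name] = (kind, address)
    return symbols


MAIN_O = """
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
"""

SUM_SUB_O = """
00000004 t loop
00000000 T sum
"""

main_syms = parse_nm(MAIN_O)
sub_syms = parse_nm(SUM_SUB_O)

# Symbols that main.o needs from some other object file.
undefined = [n for n, (kind, _) in main_syms.items() if kind == "U"]
# Symbols that sum-sub.o defines and makes visible to other files.
exported = [n for n, (kind, _) in sub_syms.items() if kind != "U" and kind.isupper()]

print("undefined in main.o:", undefined)
print("exported by sum-sub.o:", exported)
```

Both lists contain only sum, which is exactly the reference the linker has to resolve.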
Relocation is the process of changing the addresses already assigned to labels. This also involves patching all label references to reflect the newly assigned addresses. Primarily, relocation is performed for two reasons: section merging and section placement.
To understand the process of relocation, an understanding of the concept of sections is essential.
Code and data have different run time requirements. For example, code
can be placed in read-only memory, whereas data might require read-write
memory. It would be convenient if code and data were not
interleaved. For this purpose, programs are divided into
sections. Most programs have at least two sections, .text for code
and .data for data. The assembler directives .text and .data are
used to switch back and forth between the two sections.
It helps to imagine each section as a bucket. When the assembler hits a section directive, it puts the code/data following the directive in the selected bucket. Thus the code/data that belong to a particular section appear in contiguous locations. The following figure shows how the assembler rearranges data into sections.
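The bucket picture can also be sketched in a few lines of code. This is a toy model, not how the assembler is implemented: the function name split_into_sections and the sample source lines are made up for illustration, and only the .text and .data directives are recognized.

```python
def split_into_sections(lines):
    """Sort source lines into per-section buckets, preserving their order."""
    sections = {".text": [], ".data": []}
    current = ".text"  # GNU as starts out in the .text section
    for line in lines:
        stripped = line.strip()
        if stripped in sections:
            # A section directive selects the bucket for what follows.
            current = stripped
        elif stripped:
            sections[current].append(stripped)
    return sections


# Hypothetical source that switches back and forth between sections.
source = [
    ".data",
    "count: .word 3",
    ".text",
    "ldr r0, =count",
    ".data",
    "flag: .byte 1",
    ".text",
    "mov r1, #0",
]

buckets = split_into_sections(source)
for name, contents in buckets.items():
    print(name, contents)
```

Although the source interleaves code and data, each bucket ends up holding its own lines contiguously.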
Now that we have an understanding of sections, let us look into the primary reasons for which relocation is performed.
When dealing with multi-file programs, sections with the same name
(for example, .text) might appear in each file. The linker is
responsible for merging the sections from the input files into
sections of the output file. By default, the sections with the same
name from each file are placed contiguously, and the label references
are patched to reflect the new addresses.
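The arithmetic behind merging can be sketched as follows. The label offsets come from the symbol tables of main.o and sum-sub.o, but the section sizes (0x24 and 0x14) are assumptions made for this illustration, and the dictionary layout and the name merge_sections are hypothetical.

```python
def merge_sections(objects):
    """Lay out same-named sections back to back and rebase their labels."""
    merged = {}
    offset = 0  # where the next input section starts in the output section
    for obj in objects:
        for name, address in obj["symbols"].items():
            merged[name] = offset + address
        offset += obj["text_size"]
    return merged


# Section sizes are assumed values, for illustration only.
main_o = {
    "text_size": 0x24,
    "symbols": {"arr": 0x4, "eoa": 0x7, "start": 0x8, "stop": 0x18},
}
sum_sub_o = {
    "text_size": 0x14,
    "symbols": {"loop": 0x4, "sum": 0x0},
}

merged = merge_sections([main_o, sum_sub_o])
for name, address in merged.items():
    print(f"{address:08x} {name}")
```

Labels from the first file keep their values, while labels from the second file are shifted by the size of the first file's section.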
The effects of section merging can be seen by looking at the symbol
tables of the object files and the corresponding executable file. The
multi-file sum of array program can be used to illustrate section
merging. The symbol tables of the object files main.o and sum-sub.o
and the symbol table of the executable file sum.elf are shown below.
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop ❶
00000000 T sum
$ arm-none-eabi-ld -Ttext=0x0 -o sum.elf main.o sum-sub.o
$ arm-none-eabi-nm sum.elf
...
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
00000028 t loop ❷
00000024 T sum
❶ ❷ The loop symbol has address 0x4 in sum-sub.o and address 0x28 in sum.elf, because the .text section of sum-sub.o is placed right after the .text section of main.o.

When a program is assembled, each section is assumed to start from address 0, and so labels are assigned values relative to the start of the section. When the final executable is created, the section is placed at some address X, and all references to the labels defined within the section are incremented by X, so that they point to the new location.
The placement of each section at a particular location in memory, and the patching of all references to the labels in the section, are done by the linker.
The effects of section placement can be seen by looking at the symbol
table of the object file and the corresponding executable file. The
single file sum of array program can be used to illustrate section
placement. To make things clearer, we will place the .text section
at address 0x100.
$ arm-none-eabi-as -o sum.o sum.s
$ arm-none-eabi-nm -n sum.o
00000000 t entry ❶
00000004 t arr
00000007 t eoa
00000008 t start
00000014 t loop
00000024 t stop
$ arm-none-eabi-ld -Ttext=0x100 -o sum.elf sum.o ❷
$ arm-none-eabi-nm -n sum.elf
00000100 t entry ❸
00000104 t arr
00000107 t eoa
00000108 t start
00000114 t loop
00000124 t stop
...
❶ The addresses for labels are assigned starting from 0 within a
section.
❷ When the executable is created, the linker is instructed to place
the text section at address 0x100.
❸ The addresses for labels in the .text section are re-assigned
starting from 0x100, and all label references are patched to
reflect this.
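The same adjustment can be reproduced with a short sketch. It takes the section-relative label values from the sum.o symbol table and adds the base address 0x100 to each. The function name place_section is hypothetical, and the real linker also patches the instructions that refer to these labels, which this toy model leaves out.

```python
def place_section(symbols, base):
    """Rebase section-relative label offsets to absolute addresses."""
    return {name: base + offset for name, offset in symbols.items()}


# Label offsets within .text, as listed in the symbol table of sum.o.
sum_o = {
    "entry": 0x00,
    "arr": 0x04,
    "eoa": 0x07,
    "start": 0x08,
    "loop": 0x14,
    "stop": 0x24,
}

placed = place_section(sum_o, 0x100)
for name, address in placed.items():
    print(f"{address:08x} t {name}")
```

The printed addresses run from 00000100 for entry to 00000124 for stop, matching the symbol table of sum.elf.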
The process of section merging and placement is shown in the following figure.