DWARF started as a pun on ELF. This is not to say that DWARF is a joke: it is one of the most widely used formats for storing debugging information, and its origin is tied to the Executable and Linkable Format (ELF).
The official DWARF site contains a nice summary of the format. However, I wanted an even shorter description to refresh my memory if I ever need it.
Let's discuss this hypothetical program:
```c
int main(int argc, char **argv) {
    return argc;
}
```
DWARF is organized as a tree of Debug Information Entries (DIEs). Each DIE has a tag (DW_TAG_xxx) and zero or more attributes (DW_AT_xxx), which carry the information relevant to that tag.
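To make the tree concrete, here is a minimal sketch of how the DIEs for the example program might be modeled in C. The `struct die` and `struct attr` types, `print_die`, and the file name main.c are hypothetical; real DWARF stores numeric tag and attribute codes with encoded values, DW_AT_type is a reference to another DIE rather than a string, and the type DIEs themselves are omitted here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: real DWARF stores numeric tag/attribute codes
   and encoded values, not strings. */
struct attr {
    const char *name;  /* e.g. "DW_AT_name" */
    const char *value; /* rendered as text to keep the sketch short */
};

struct die {
    const char *tag;            /* e.g. "DW_TAG_subprogram" */
    const struct attr *attrs;   /* zero or more attributes */
    size_t nattrs;
    const struct die *children; /* child DIEs, possibly none */
    size_t nchildren;
};

/* Print the subtree rooted at d, indented by depth; returns the number of DIEs visited. */
static size_t print_die(const struct die *d, int depth) {
    size_t count = 1;
    printf("%*s%s\n", depth * 2, "", d->tag);
    for (size_t i = 0; i < d->nattrs; i++)
        printf("%*s  %s: %s\n", depth * 2, "", d->attrs[i].name, d->attrs[i].value);
    for (size_t i = 0; i < d->nchildren; i++)
        count += print_die(&d->children[i], depth + 1);
    return count;
}

/* A plausible DIE tree for the example program, assuming it lives in main.c.
   DW_AT_type is really a reference to a type DIE; a string stands in for it. */
static const struct attr argc_attrs[] = {{"DW_AT_name", "argc"}, {"DW_AT_type", "int"}};
static const struct attr argv_attrs[] = {{"DW_AT_name", "argv"}, {"DW_AT_type", "char **"}};
static const struct die main_params[] = {
    {"DW_TAG_formal_parameter", argc_attrs, 2, NULL, 0},
    {"DW_TAG_formal_parameter", argv_attrs, 2, NULL, 0},
};
static const struct attr main_attrs[] = {
    {"DW_AT_name", "main"}, {"DW_AT_type", "int"}, {"DW_AT_external", "true"}};
static const struct die cu_children[] = {
    {"DW_TAG_subprogram", main_attrs, 3, main_params, 2},
};
static const struct attr cu_attrs[] = {{"DW_AT_name", "main.c"}};
static const struct die example_cu = {"DW_TAG_compile_unit", cu_attrs, 1, cu_children, 1};
```

Calling `print_die(&example_cu, 0)` prints the compilation unit, then `main` one level deeper, then its two formal parameters, and returns 4.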
Flattening the tree
The tree is flattened into a sequence of DIEs, depth first: each DIE is followed by its children, and each list of siblings is terminated by a null entry. Skipping over a DIE therefore means walking all of its children, so when finding the next sibling quickly matters, the producer may insert a DW_AT_sibling attribute that points to it.
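The flattening rule can be sketched as a pre-order walk that writes a null entry after each non-empty list of children. The `struct node` type and `flatten` function are hypothetical, and the null entry is shown as the string "0"; in real DWARF, whether a DIE has children is recorded in its abbreviation (DW_CHILDREN_yes or DW_CHILDREN_no).

```c
#include <stddef.h>

/* Hypothetical node type; only the tag matters for flattening. */
struct node {
    const char *tag;
    const struct node *children;
    size_t nchildren;
};

/* Write n and its subtree to out[pos...] in depth-first (pre-order) order.
   A node that has children is followed by those children and then a null
   entry, written here as "0", that closes the sibling list.
   Returns the next free position in out (the caller sizes out). */
static size_t flatten(const struct node *n, const char **out, size_t pos) {
    out[pos++] = n->tag;
    if (n->nchildren > 0) {
        for (size_t i = 0; i < n->nchildren; i++)
            pos = flatten(&n->children[i], out, pos);
        out[pos++] = "0"; /* null entry terminating this list of siblings */
    }
    return pos;
}

/* Same shape as the example program: compile unit -> main -> argc, argv. */
static const struct node params[] = {
    {"DW_TAG_formal_parameter", NULL, 0},
    {"DW_TAG_formal_parameter", NULL, 0},
};
static const struct node subprogram[] = {{"DW_TAG_subprogram", params, 2}};
static const struct node cu = {"DW_TAG_compile_unit", subprogram, 1};
```

`flatten(&cu, out, 0)` yields compile_unit, subprogram, formal_parameter, formal_parameter, 0, 0 and returns 6: one null entry closes the parameters of `main`, the other closes the children of the compilation unit.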
Main tags
- Compilation unit (DW_TAG_compile_unit): typically a C source file, e.g. main.c. Typical information: compilation directory, file name, lowest and highest PC, producer (the compiler used).
- Subprogram (DW_TAG_subprogram): a function, e.g. main or printf. Typical information: name, file, start line, return type, lowest and highest PC, and whether it is external (visible outside its compilation unit).
- Formal parameter (DW_TAG_formal_parameter): gives the name and type of a function parameter. It also indicates where the parameter is stored, e.g. in a register or at a memory address.
Type information
Information about the types used in the program. The set of type tags shows that DWARF has C origins.
- Base type (DW_TAG_base_type): the basic data types provided by the compiler, e.g. int, short. Typical information: name, size, encoding. Optional: bit size, bit offset. Other DIEs refer to a type through the DW_AT_type attribute.
- Pointer type (DW_TAG_pointer_type): a pointer to another type (base or other).
- Const type (DW_TAG_const_type): adds C's const qualifier to another type.
- DW_TAG_structure_type: structure
- DW_TAG_member: a member of a structure or union
- DW_TAG_union_type: union
- DW_TAG_enumeration_type: enumeration
- DW_TAG_typedef: typedef
- DW_TAG_array_type: array
- DW_TAG_subrange_type: defines the range for an array
- DW_TAG_inheritance: defines class inheritance
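Type DIEs form chains linked by DW_AT_type. For example, `char **argv` is a pointer type whose DW_AT_type is another pointer type whose DW_AT_type is the base type char. The sketch below follows such a chain to spell the type in C; the `struct type_die` layout (an array index standing in for the DIE reference) and `type_name` are hypothetical simplifications.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified type DIEs. */
enum type_tag { BASE_TYPE, POINTER_TYPE, CONST_TYPE };

struct type_die {
    enum type_tag tag;
    const char *name; /* only meaningful for BASE_TYPE */
    int type;         /* index of the DIE named by DW_AT_type, -1 if absent */
};

/* Spell a type in C syntax (const written after what it qualifies)
   by following the DW_AT_type chain. Requires len > 0. */
static void type_name(const struct type_die *dies, int idx, char *buf, size_t len) {
    const struct type_die *t = &dies[idx];
    if (t->tag == BASE_TYPE) {
        snprintf(buf, len, "%s", t->name);
        return;
    }
    if (t->type < 0)
        snprintf(buf, len, "void"); /* no DW_AT_type: e.g. void * */
    else
        type_name(dies, t->type, buf, len);

    size_t used = strlen(buf);
    const char *suffix;
    if (t->tag == POINTER_TYPE)
        suffix = (used > 0 && buf[used - 1] == '*') ? "*" : " *";
    else
        suffix = " const";
    strncat(buf, suffix, len - used - 1);
}

/* char **argv, and a pointer to const char, as chains of type DIEs. */
static const struct type_die example_types[] = {
    {BASE_TYPE, "char", -1}, /* 0: char */
    {POINTER_TYPE, NULL, 0}, /* 1: char * */
    {POINTER_TYPE, NULL, 1}, /* 2: char ** */
    {CONST_TYPE, NULL, 0},   /* 3: char const */
    {POINTER_TYPE, NULL, 3}, /* 4: char const * */
};
```

Resolving index 2 produces "char **" and index 4 produces "char const *". Qualifiers and pointers are separate DIEs rather than flags on the base type, which is why the chain has to be walked.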
Variables are special:
- Variable (DW_TAG_variable): carries type and name information. Just as important is where the data is stored, which is described by the location attribute (DW_AT_location). It can link a set of PC values to a location. A location can be a register or a memory location obtained by some computation, e.g. a position in the stack frame.
DWARF expressions
In several places, an address or some other value has to be computed. This is done with a stack-based machine that can evaluate very complex expressions, built from DW_OP_xxx operations. There are many DW_OP_xxx operations, including control-flow ones such as DW_OP_bra (conditional branch) and DW_OP_skip (unconditional branch).
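A minimal sketch of such a stack machine, covering only a handful of operations (DW_OP_litN, DW_OP_fbreg, DW_OP_plus_uconst, DW_OP_plus, DW_OP_mul, DW_OP_dup). The `enum op`, `struct insn`, and `eval_expr` names are hypothetical; operands are assumed to be already decoded, whereas real DWARF stores opcodes as bytes and operands as LEB128 or fixed-size values. The `frame_base` parameter stands in for the value a debugger would compute from the function's DW_AT_frame_base.

```c
#include <stddef.h>
#include <stdint.h>

/* A few DWARF operations, with operands assumed to be already decoded. */
enum op {
    OP_LIT,         /* DW_OP_litN: push the constant N */
    OP_FBREG,       /* DW_OP_fbreg: push frame base + signed offset */
    OP_PLUS_UCONST, /* DW_OP_plus_uconst: add an unsigned constant to the top */
    OP_PLUS,        /* DW_OP_plus: pop two, push their sum */
    OP_MUL,         /* DW_OP_mul: pop two, push their product */
    OP_DUP          /* DW_OP_dup: duplicate the top entry */
};

struct insn {
    enum op op;
    int64_t operand; /* unused by OP_PLUS, OP_MUL and OP_DUP */
};

#define STACK_MAX 64

/* Evaluate code[0..n-1]. On success stores the top of the stack in
   *result and returns 0; returns -1 for a malformed expression. */
static int eval_expr(const struct insn *code, size_t n, uint64_t frame_base, uint64_t *result) {
    uint64_t stack[STACK_MAX];
    size_t sp = 0;
    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_LIT:
            if (sp == STACK_MAX) return -1;
            stack[sp++] = (uint64_t)code[i].operand;
            break;
        case OP_FBREG: /* unsigned wraparound handles negative offsets */
            if (sp == STACK_MAX) return -1;
            stack[sp++] = frame_base + (uint64_t)code[i].operand;
            break;
        case OP_PLUS_UCONST:
            if (sp < 1) return -1;
            stack[sp - 1] += (uint64_t)code[i].operand;
            break;
        case OP_PLUS:
        case OP_MUL: {
            if (sp < 2) return -1;
            uint64_t b = stack[--sp];
            uint64_t a = stack[--sp];
            stack[sp++] = code[i].op == OP_PLUS ? a + b : a * b;
            break;
        }
        case OP_DUP:
            if (sp < 1 || sp == STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        default:
            return -1;
        }
    }
    if (sp == 0) return -1;
    *result = stack[sp - 1];
    return 0;
}
```

With a frame base of 0x1000, the single operation `DW_OP_fbreg -20` evaluates to 0x1000 - 20, the kind of expression typically used for a local variable kept in the stack frame.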
Location information
Location information describes how to find a variable, either in memory or in a register. Because of optimizations, the location where a variable is stored can change in different parts of the program.
If the variable has a single location for its whole lifetime, the location expression is placed inline in its DW_AT_location attribute. If the location changes over the variable's lifetime, DW_AT_location instead refers to a location list in the .debug_loc section. Each entry of the list pairs a start and an end address with the location expression that is valid in that range.
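Looking up a variable in a location list amounts to finding the entry whose address range contains the current PC. In this sketch, `struct loc_entry` and `lookup_location` are hypothetical; addresses are assumed to be already converted to absolute PCs (in .debug_loc they are stored relative to the compilation unit's base address), ranges are treated as half-open, and a string stands in for each entry's location expression. The addresses are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded location-list entry covering [begin, end). */
struct loc_entry {
    uint64_t begin;
    uint64_t end;
    const char *location; /* stand-in for the entry's location expression */
};

/* Return the location valid at pc, or NULL if the variable has no
   location there (e.g. it was optimized out at that point). */
static const char *lookup_location(const struct loc_entry *list, size_t n, uint64_t pc) {
    for (size_t i = 0; i < n; i++)
        if (pc >= list[i].begin && pc < list[i].end)
            return list[i].location;
    return NULL;
}

/* Made-up addresses: argc arrives in a register, then is spilled to the stack. */
static const struct loc_entry argc_locs[] = {
    {0x1000, 0x1008, "DW_OP_reg5 (rdi on x86-64)"},
    {0x1008, 0x1020, "DW_OP_fbreg -20"},
};
```

At PC 0x1004 the debugger would read `argc` from a register, at 0x1008 from the stack frame, and at 0x1020 it would report the value as unavailable.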
Data encoding
Data encoding is taken seriously in DWARF because debugging information can easily exceed the size of the program, even when carefully encoded. Suffice it to say that it is not meant to be read by the naked eye: it relies on variable-length integers (LEB128), abbreviation tables, and state machines to keep the information compact.
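LEB128 is a good example of this compactness: each byte carries 7 bits of the value, least significant group first, and the high bit says whether more bytes follow, so small numbers take a single byte. A minimal decoder sketch (the function names are hypothetical; input is assumed to be well formed, with no bounds checking):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value starting at p; *len receives the
   number of bytes consumed. */
static uint64_t uleb128_decode(const uint8_t *p, size_t *len) {
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;
    do {
        byte = p[i++];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80); /* high bit set: more bytes follow */
    *len = i;
    return result;
}

/* Signed variant: same layout, then sign-extend from bit 6 of the last byte. */
static int64_t sleb128_decode(const uint8_t *p, size_t *len) {
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;
    do {
        byte = p[i++];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    *len = i;
    return (int64_t)result; /* assumes two's complement, as on common platforms */
}
```

For example, the bytes E5 8E 26 decode to 624485, 80 01 decodes to 128, and as a signed value the single byte 7E decodes to -2.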
Line information
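Line information maps machine-code addresses to lines in the original source code. It is compressed by encoding the table as a program for a small state machine: each opcode updates registers such as the current address and line, and some opcodes append a row to the table.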
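The most compact part of the line program is the special opcode, a single byte that advances both the address and the line. With adjusted = opcode - opcode_base, the address advances by min_inst_length * (adjusted / line_range) and the line by line_base + (adjusted % line_range). The sketch below applies one special opcode; the struct and function names are hypothetical, only the address and line registers are tracked, and max_ops_per_instruction is assumed to be 1. The header values in the usage example (line_base -5, line_range 14, opcode_base 13) are values commonly seen in compiler output, not fixed by the format.

```c
#include <stdint.h>

/* Fields from a line-number program header that special opcodes depend on. */
struct line_params {
    uint8_t min_inst_length; /* size of the smallest instruction, in bytes */
    int8_t line_base;        /* smallest line increment a special opcode encodes */
    uint8_t line_range;      /* number of distinct line increments */
    uint8_t opcode_base;     /* first special opcode number */
};

/* The two state-machine registers this sketch tracks. */
struct line_state {
    uint64_t address;
    int64_t line;
};

/* Apply a special opcode: one byte advances both the address and the line.
   Afterwards the state machine appends (address, line) as a new row.
   Returns -1 if opcode is a standard or extended opcode instead. */
static int apply_special_opcode(const struct line_params *hp, struct line_state *st,
                                uint8_t opcode) {
    if (opcode < hp->opcode_base)
        return -1;
    unsigned adjusted = (unsigned)(opcode - hp->opcode_base);
    st->address += (uint64_t)hp->min_inst_length * (adjusted / hp->line_range);
    st->line += hp->line_base + (int)(adjusted % hp->line_range);
    return 0;
}
```

With those header values, opcode 75 gives adjusted = 62, so the address advances by 4 and the line by 1: one byte in the file instead of two separate advance instructions.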
ELF sections
DWARF data is split across several sections. When stored in an ELF file, the following sections are used:
- .debug_info: Contains all the main DIE blocks.
- .debug_types: Contains type units (DWARF 4), i.e. type DIEs placed in their own units so that they can be deduplicated across compilation units.
- .debug_line: Contains the mapping between line numbers and PC addresses.
- .debug_ranges: Contains the address ranges referenced by DW_AT_ranges, used when the code covered by a DIE (e.g. a compilation unit, lexical block, or inlined function) is not one contiguous block.
- .debug_loc: Contains the location lists, referenced by DW_AT_location, that describe where variables are stored.
- .debug_str: Contains strings referenced by the DIE blocks in the .debug_info section.
- .debug_abbrev: Abbreviation tables used in the .debug_info section. Each abbreviation gives a DIE's tag, whether it has children, and its attributes with their encodings (forms).
- .debug_aranges: A lookup table from address ranges to compilation units, so a debugger can quickly find the compilation unit that contains a given PC. It is unrelated to .debug_loc.
- .debug_frame: Call frame information, used to unwind the stack (e.g. to produce a backtrace).
- .debug_macinfo: Information about macros.
- .debug_pubnames: Information about global functions and variables.
- .debug_pubtypes: Information about global types.
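On Linux, `readelf -S` lists these sections and `readelf --debug-dump` (or `objdump --dwarf`) decodes them. As a sketch of what those tools start with, the function below uses glibc's `<elf.h>` definitions to print the `.debug_*` sections of a file. The name `list_debug_sections` is hypothetical, and the code assumes a well-formed 64-bit ELF file in the host byte order; compressed (`.zdebug_*`) sections and extended section numbering are not handled.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the .debug_* sections of a 64-bit ELF file and return how many
   were found, or -1 on error. */
static int list_debug_sections(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab = NULL;
    char *names = NULL;
    int count = -1;

    /* ELF header: check the magic bytes and the 64-bit class. */
    if (fread(&eh, sizeof eh, 1, f) != 1 || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Section header table. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names live in their own string table section (e_shstrndx). */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (!names || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';

    count = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name >= strtab->sh_size)
            continue; /* name offset out of range */
        const char *name = names + sh[i].sh_name;
        if (strncmp(name, ".debug_", 7) == 0) {
            printf("%-18s %10lu bytes\n", name, (unsigned long)sh[i].sh_size);
            count++;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return count;
}
```

Run on a binary built with `gcc -g`, this would typically list sections such as .debug_info, .debug_abbrev and .debug_line; the exact set depends on the compiler and the DWARF version it emits.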
Cover picture by mustamirri.
https://www.ibm.com/developerworks/aix/library/au-dwarf-debug-format/