codman: Add documentation

Provide a description of the purpose of codman and some examples of how
to use it.

Series-to: concept
Cover-letter:
codman: Add a new source-code analysis tool

Add a new tool called 'codman' (code manager) for analysing source code
usage in U-Boot builds. This tool determines which files and lines of
code are actually compiled based on the build configuration.

The tool provides three analysis methods:
- unifdef: Static preprocessor analysis (default)
- DWARF: Debug information from compiled code (-w)
- (experimental) LSP: Language server analysis using clangd (-l)

Codman supports:

- File-level analysis: which files are compiled vs unused
- Line-level analysis: which lines are active vs removed by preprocessor
- Kconfig-impact analysis with -a/--adjust option
- Various output formats: stats, directories, detail, summary

Since there is quite a lot of processing involved, Codman uses parallel
processing where possible.

This tool is admittedly not quite up to my normal code quality, but it
has been an interesting experiment in using Claude to create something
from scratch.

The unifdef part of the tool benefits from some patches I created for
that tool:
- O(1) algorithm for symbol lookup, instead of O(n) - faster!
- support for IS_ENABLED(), CONFIG_IS_ENABLED()

Please get in touch if you would like the patches.

This series also includes a minor improvement to buildman and a tidy-up
of the tout library to reduce code duplication.
END

Signed-off-by: Simon Glass <simon.glass@canonical.com>
Series-links: 1:65
Simon Glass
2025-11-24 06:38:55 -07:00
parent d2772a2359
commit 9d6670f89a
3 changed files with 428 additions and 0 deletions

1
doc/develop/codman.rst Symbolic link

@@ -0,0 +1 @@
../../tools/codman/codman.rst


@@ -101,6 +101,7 @@ Refactoring
checkpatch
coccinelle
codman
qconfig
Code quality

426
tools/codman/codman.rst Normal file

@@ -0,0 +1,426 @@
.. SPDX-License-Identifier: GPL-2.0+
===================
Codman code manager
===================
The codman tool analyses U-Boot builds to determine which source files and lines
of code are actually compiled and used.
U-Boot is a massive project with thousands of files and nearly endless
configuration possibilities. A single board configuration might only compile a
small fraction of the total source tree. Codman can help answer questions like:
* "I just enabled ``CONFIG_CMD_NET``, how much code did that actually add?"
* "How much code would I remove by disabling ``CONFIG_CMDLINE``?"
Simply searching for ``CONFIG_`` macros or header inclusions is tricky because
the build logic takes many forms: Makefile rules, #ifdefs, IS_ENABLED(),
CONFIG_IS_ENABLED() and static inlines. The end result is board-specific in any
case.
Codman cuts through this complexity by analysing the actual build artifacts
generated by the compiler:
#. Builds the specified board
#. Parses the ``.cmd`` files to find which source files were compiled.
#. Analyses the source code (with unifdef) or the object files (DWARF tables)
to figure out which files and lines were compiled.
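As a rough illustration of step 2, the sketch below pulls the source path out of the text of a Kbuild ``.cmd`` file. It assumes the file contains a ``source_<target> := <path>`` line, which is what Kbuild normally writes. The function name ``source_from_cmd`` and the sample text are hypothetical, and this is not codman's actual parser.

```python
import re

# Assumption: Kbuild .cmd files record the compiled source as
#   source_<target> := <path>
SOURCE_RE = re.compile(r'^source_(\S+)\s*:=\s*(\S+)\s*$', re.MULTILINE)


def source_from_cmd(cmd_text):
    """Return (target, source) from the text of a .cmd file, or None."""
    match = SOURCE_RE.search(cmd_text)
    if not match:
        return None
    return match.group(1), match.group(2)


# Hypothetical contents of common/.main.o.cmd, shortened for illustration
sample = (
    'cmd_common/main.o := gcc -c -o common/main.o common/main.c\n'
    '\n'
    'source_common/main.o := common/main.c\n'
    '\n'
    'deps_common/main.o := \\\n'
    '  include/config.h \\\n'
)
print(source_from_cmd(sample))
```

Collecting the second element of each result over every ``.cmd`` file in the build directory would give the set of compiled source files.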
Usage
=====
Basic usage, from within the U-Boot source tree::
./tools/codman/codman.py -b <board> [flags] <command> [command-flags]
Codman does out-of-tree builds, meaning that the object files end up in a
separate directory for each board. Use ``--build-base`` to set the base
directory. The default is ``/tmp/b``, meaning that a sandbox build would end up
in ``/tmp/b/sandbox``, for example.
Relationship to LSPs
====================
Language servers (LSP implementations such as clangd) can show unused code in
your IDE, which is very handy for interactive use. Codman is more about getting
a broader picture, although it does allow individual files to be listed. Codman
does include a ``--lsp`` option, but it is experimental and doesn't work
particularly well.
Commands
========
The basic functionality is accessed via these commands:
* ``stats`` - Show statistics (default if no command given)
* ``dirs`` - Show directory breakdown
* ``unused`` - List unused files
* ``used`` - List used files
* ``summary`` - Show per-file summary
* ``detail <file>...`` - Show line-by-line analysis of one or more files
* ``copy-used <dir>`` - Copy used source files to a directory
With no command given, codman builds the board and shows statistics about
source-file usage.
Adjusting Configuration (-a)
============================
Sometimes you want to explore "what if" scenarios without manually editing
``defconfig`` files or running menuconfig. The ``-a`` (or ``--adjust``) option
allows you to modify the Kconfig configuration on the fly before the analysis
build runs.
This is particularly useful for **impact analysis**: seeing exactly how much
code a specific feature adds to the build.
Syntax
------
The ``CONFIG_`` prefix is optional.
* ``-a CONFIG_OPTION``: Enable a boolean option (sets to 'y').
* ``-a ~CONFIG_OPTION``: Disable an option.
* ``-a OPTION=val``: Set an option (``CONFIG_OPTION``) to a specific value.
* ``-a CONFIG_A,CONFIG_B``: Set multiple options (comma-separated).
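The syntax rules above can be sketched as a small parser. This is only an illustration of the rules as described here, not codman's own code; the helper name ``parse_adjust`` and the use of ``None`` to mean "disabled" are assumptions made for the example.

```python
def parse_adjust(args):
    """Turn a list of -a arguments into a {CONFIG_name: value} dict.

    A value of None means 'disable'. Hypothetical helper, for illustration.
    """
    result = {}
    for arg in args:
        for item in arg.split(','):     # comma-separated options
            item = item.strip()
            if not item:
                continue
            disable = item.startswith('~')
            if disable:
                item = item[1:]
            name, sep, value = item.partition('=')
            if not name.startswith('CONFIG_'):
                name = 'CONFIG_' + name     # the prefix is optional
            if disable:
                result[name] = None
            elif sep:
                result[name] = value        # explicit OPTION=val
            else:
                result[name] = 'y'          # bare name: enable a boolean
    return result


adjusted = parse_adjust(['CMD_USB', '~NET,NO_NET', 'SYS_MALLOC_LEN=0x100000'])
print(adjusted)
```

Note that this simple split would mishandle a string value that itself contains a comma.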
Examples
--------
**Check the impact of USB:**
Enable the USB subsystem on the sandbox board and see how the code stats change::
codman -b sandbox -a CMD_USB stats
**Disable Networking:**
See what code remains active when networking is explicitly disabled::
codman -b sandbox -a ~NET,NO_NET stats
**Multiple Adjustments:**
Enable USB and USB storage together::
codman -b sandbox -a CONFIG_CMD_USB -a CONFIG_USB_STORAGE stats
Common Options
==============
Building:
* ``-b, --board <board>`` - Board to build and analyse (default: sandbox, uses buildman)
* ``-B, --build-dir <dir>`` - Use existing build directory instead of building
* ``--build-base <dir>`` - Base directory for builds (default: /tmp/b)
* ``-n, --no-build`` - Skip building, use existing build directory
* ``-a, --adjust <config>`` - Adjust CONFIG options (see section above)
Line-level analysis:
* ``-w, --dwarf`` - Use DWARF debug info (most accurate, requires rebuild)
* ``-i, --include-headers`` - Include header files in unifdef analysis
Filtering:
* ``-f, --filter <pattern>`` - Filter files by wildcard pattern (e.g.,
``*acpi*``)
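To give a feel for how a wildcard filter behaves, here is a minimal sketch using Python's ``fnmatch`` module. It assumes the pattern is matched against the whole relative path, so ``*acpi*`` also picks up files under an ``acpi`` directory; whether codman matches in exactly this way is an assumption, and the path list is made up for the example.

```python
import fnmatch

# Hypothetical list of source paths
paths = [
    'cmd/acpi.c',
    'lib/acpi/acpi_writer.c',
    'common/main.c',
    'include/acpi/acpi_table.h',
]


def filter_paths(all_paths, pattern):
    """Keep the paths that match a shell-style wildcard pattern."""
    return [path for path in all_paths
            if fnmatch.fnmatchcase(path, pattern)]


matched = filter_paths(paths, '*acpi*')
print(matched)
```

Quote the pattern on the command line so that the shell does not expand it first.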
Output control:
* ``-v, --verbose`` - Show verbose output
* ``-D, --debug`` - Enable debug mode
* ``--top <N>`` - (for ``stats`` command) Show top N files with most inactive
code (default: 20)
The ``dirs`` command has a few extra options:
* ``-s, --subdirs`` - Show a breakdown by subdirectory
* ``-f, --show-files`` - Show individual files within directories (with ``-s``)
* ``-e, --show-empty`` - Show directories with 0 lines used
Other:
* ``-j, --jobs <N>`` - Number of parallel jobs for line analysis
How to use commands
===================
The following commands show the different ways to use codman. Commands are
specified as positional arguments after the global options.
Basic Statistics (``stats``)
-----------------------------
Show overall statistics for a qemu-x86 build::
$ codman -b qemu-x86 stats
======================================================================
FILE-LEVEL STATISTICS
======================================================================
Total source files: 14114
Used source files: 1046 (7.4%)
Unused source files: 13083 (92.7%)
Total lines of code: 3646331
Used lines of code: 192543 (5.3%)
Unused lines of code: 3453788 (94.7%)
======================================================================
======================================================================
LINE-LEVEL STATISTICS (within compiled files)
======================================================================
Files analysed: 504
Total lines in used files: 209915
Active lines: 192543 (91.7%)
Inactive lines: 17372 (8.3%)
======================================================================
TOP 20 FILES WITH MOST INACTIVE CODE:
----------------------------------------------------------------------
2621 inactive lines (56.6%) - drivers/mtd/spi/spi-nor-core.c
669 inactive lines (46.7%) - cmd/mem.c
594 inactive lines (45.8%) - cmd/nvedit.c
579 inactive lines (89.5%) - drivers/mtd/spi/spi-nor-ids.c
488 inactive lines (27.4%) - net/net.c
...
Directory Breakdown (``dirs``)
------------------------------
See which top-level directories contribute code::
codman dirs
Output shows breakdown by directory::
BREAKDOWN BY TOP-LEVEL DIRECTORY
=================================================================================
Directory Files Used %Used %Code kLOC Used
---------------------------------------------------------------------------------
arch 234 156 67 72 12.3 8.9
board 123 45 37 25 5.6 1.4
cmd 89 67 75 81 3.4 2.8
common 156 134 86 88 8.9 7.8
...
For detailed subdirectory breakdown::
codman dirs --subdirs
With ``--show-files``, also shows individual files within each directory::
codman dirs --subdirs --show-files
You can also specify a file filter::
codman -b qemu-x86 -f "*acpi*" dirs -sf
=======================================================================================
BREAKDOWN BY TOP-LEVEL DIRECTORY
=======================================================================================
Directory Files Used %Used %Code kLOC Used
---------------------------------------------------------------------------------------
arch/x86/include/asm 5 2 40 36 0.6 0.2
arch/x86/lib 5 1 20 6 1.2 0.1
acpi.c 65 65 100.0 0
cmd 1 1 100 100 0.2 0.2
acpi.c 216 215 99.5 1
drivers/qfw 1 1 100 93 0.3 0.3
qfw_acpi.c 332 309 93.1 23
include/acpi 5 4 80 91 3.3 3.0
include/dm 1 1 100 100 0.4 0.4
include/power 1 1 100 100 0.2 0.2
lib/acpi 13 3 23 14 3.9 0.5
acpi_writer.c 131 63 48.1 68
acpi_extra.c 181 177 97.8 4
acpi.c 304 304 100.0 0
lib/efi_loader 1 1 100 100 0.1 0.1
efi_acpi.c 75 75 100.0 0
---------------------------------------------------------------------------------------
TOTAL 78 15 19 7 17.5 1.2
=======================================================================================
Detail View (``detail``)
------------------------
See exactly which lines are active/inactive in a specific file::
$ codman -b qemu-x86 detail common/main.c
======================================================================
DETAIL FOR: common/main.c
======================================================================
Total lines: 115
Active lines: 93 (80.9%)
Inactive lines: 22 (19.1%)
1 | // SPDX-License-Identifier: GPL-2.0+
2 | /*
3 | * (C) Copyright 2000
4 | * Wolfgang Denk, DENX Software Engineering, wd@denx.de.
5 | */
...
23 |
24 | static void run_preboot_environment_command(void)
25 | {
26 | char *p;
27 |
28 | p = env_get("preboot");
29 | if (p != NULL) {
30 | int prev = 0;
31 |
- 32 | if (IS_ENABLED(CONFIG_AUTOBOOT_KEYED))
- 33 | prev = disable_ctrlc(1); /* disable Ctrl-C checking */
34 |
35 | run_command_list(p, -1, 0);
36 |
- 37 | if (IS_ENABLED(CONFIG_AUTOBOOT_KEYED))
- 38 | disable_ctrlc(prev); /* restore Ctrl-C checking */
39 | }
40 | }
41 |
Lines with a ``-`` marker are not included in the build.
Unused Files (``unused``)
-------------------------
Find all source files that weren't compiled::
$ codman -b qemu-x86 unused |head -15
Finding all source files......
Found 1043 used source files...
Loading configuration......
Loaded 8913 Kconfig symbols...
Loaded 8913 config symbols...
Analysing preprocessor conditionals......
Excluding 539 header files (use -i to include them)...
Running unifdef on 504 files......
Unused source files (13083):
arch/arc/cpu/arcv1/ivt.S
arch/arc/cpu/arcv2/ivt.S
arch/arc/include/asm/arc-bcr.h
Used Files (``used``)
---------------------
List all source files that were included in a build::
$ codman -b qemu-x86 used |head -15
Finding all source files......
Found 1046 used source files...
Loading configuration......
Loaded 8913 Kconfig symbols...
Loaded 8913 config symbols...
Analysing preprocessor conditionals......
Excluding 542 header files (use -i to include them)...
Running unifdef on 504 files......
Used source files (1046):
arch/x86/cpu/call32.S
arch/x86/cpu/cpu.c
...
Per-File Summary (``summary``)
------------------------------
Shows detailed per-file statistics (requires ``-w`` or ``-l``)::
$ codman -b qemu-x86 summary
==========================================================================================
PER-FILE SUMMARY
==========================================================================================
File Total Active Inactive %Active
------------------------------------------------------------------------------------------
arch/x86/cpu/call32.S 61 61 0 100.0%
arch/x86/cpu/cpu.c 399 353 46 88.5%
arch/x86/cpu/cpu_x86.c 99 99 0 100.0%
arch/x86/cpu/i386/call64.S 92 92 0 100.0%
arch/x86/cpu/i386/cpu.c 649 630 19 97.1%
arch/x86/cpu/i386/interrupt.c 630 622 8 98.7%
arch/x86/cpu/i386/setjmp.S 65 65 0 100.0%
arch/x86/cpu/intel_common/cpu.c 325 325 0 100.0%
...
Copy Used Files (``copy-used``)
-------------------------------
Extract only the source files used in a build::
codman copy-used /tmp/sandbox-sources
This creates a directory tree with only the compiled files, useful for creating
minimal source distributions.
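The idea of copying files while preserving the directory tree can be sketched as follows. This is a minimal illustration and not the tool's implementation; the function ``copy_used`` and the throwaway directories are hypothetical.

```python
import os
import shutil
import tempfile


def copy_used(src_root, used_files, dest_root):
    """Copy each used file to dest_root, keeping its relative path."""
    for rel in used_files:
        dest = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel), dest)


# Self-contained demonstration using temporary directories
with tempfile.TemporaryDirectory() as src, \
        tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, 'common'))
    with open(os.path.join(src, 'common', 'main.c'), 'w') as outf:
        outf.write('int main(void) { return 0; }\n')
    copy_used(src, ['common/main.c'], dst)
    copied = sorted(os.listdir(os.path.join(dst, 'common')))
    print(copied)
```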
Analysis Methods
================
The script supports several analysis methods with different trade-offs.
Firstly, files are detected by looking for .cmd files in the build. This
requires a build to be present. Given the complexity of the Makefile rules, it
seems like a reasonable trade-off. These directories are excluded:
* tools/
* test/
* scripts/
* doc/
unifdef
-------
For discovering used/unused code, the unifdef mechanism produces reasonable
results. This simulates the C preprocessor using the ``unifdef`` tool to
determine which lines are active based on CONFIG_* settings.
**Note:** This requires a patched version of unifdef that supports U-Boot's
``IS_ENABLED()`` and ``CONFIG_IS_ENABLED()`` macros, which are commonly used
throughout the codebase. The patched version is also much faster, reducing run
time by about 100x on the U-Boot code base.
The tool:
1. Reads .config to extract all CONFIG_* symbol definitions
2. Generates a unifdef configuration file with -D/-U directives
3. Runs ``unifdef -k -E`` on each source file to process conditionals, with
``-E`` enabling the IS_ENABLED() support
4. Compares original vs. processed output using line-number information
5. Lines removed by unifdef are marked as inactive
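Steps 1 and 2 can be sketched roughly as below. This assumes the usual Kconfig ``.config`` layout (``CONFIG_X=value`` lines and ``# CONFIG_X is not set`` comments) and maps ``y`` to ``1`` as ``autoconf.h`` does. The function name and sample text are hypothetical, and the exact directive format codman writes may differ.

```python
import re

NOT_SET_RE = re.compile(r'^# (CONFIG_\w+) is not set$')


def config_to_directives(config_text):
    """Convert .config text into a list of unifdef -D/-U directives."""
    directives = []
    for line in config_text.splitlines():
        line = line.strip()
        match = NOT_SET_RE.match(line)
        if match:
            directives.append('-U' + match.group(1))
        elif line.startswith('CONFIG_') and '=' in line:
            name, value = line.split('=', 1)
            if value == 'y':
                value = '1'     # as in autoconf.h, where 'y' becomes 1
            directives.append(f'-D{name}={value}')
    return directives


# Hypothetical .config fragment
sample = '''#
# Automatically generated file; DO NOT EDIT.
#
CONFIG_CMDLINE=y
# CONFIG_AUTOBOOT_KEYED is not set
CONFIG_SYS_PROMPT="=> "
'''
directives = config_to_directives(sample)
print(directives)
```

Symbols listed as "not set" become ``-U`` so that unifdef treats them as known-undefined rather than unknown.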
This method uses multiprocessing for parallel analysis of source files, so it
runs faster if you have plenty of CPU cores (e.g. 3s on a 22-thread
Intel Ultra 7).
The preprocessor-level view is quite helpful. It is also possible to see .h
files using the ``-i`` flag.
Since unifdef does fairly simplistic parsing, it can be fooled and show wrong
results.
DWARF (``-w/--dwarf``)
----------------------
The DWARF analyser uses debug information embedded in compiled object files to
determine exactly which source lines generated machine code. This is arguably
more accurate than unifdef, but it won't count comments, declarations and
various other features that don't actually generate code.
The DWARF analyser:
1. Rebuilds with ``CC_OPTIMIZE_FOR_DEBUG`` to prevent aggressive inlining
2. For each .o file, runs ``readelf --debug-dump=decodedline`` to get line info
3. Parses the DWARF line number table to map source lines to code addresses
4. Aggregates results across all object files
5. Any source line that doesn't appear in the line table is marked inactive
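Steps 3 to 5 can be sketched as below. The sample text only approximates what ``readelf --debug-dump=decodedline`` prints; the real layout varies between binutils versions, so the column assumptions here (file name, line number, address) are exactly that, assumptions. The function names are hypothetical and this is not the analyser's actual code.

```python
from collections import defaultdict


def active_lines(decoded_text):
    """Map file name -> set of line numbers found in a decoded line table."""
    lines_by_file = defaultdict(set)
    for row in decoded_text.splitlines():
        fields = row.split()
        # Assumed data-row layout: <file> <line> <address> [view] [stmt]
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        try:
            int(fields[2], 16)      # address column, e.g. 0 or 0xc
        except ValueError:
            continue
        lines_by_file[fields[0]].add(int(fields[1]))
    return dict(lines_by_file)


def inactive_lines(total_lines, active):
    """Lines 1..total_lines that never appear in the line table."""
    return [num for num in range(1, total_lines + 1) if num not in active]


# Hypothetical, shortened readelf output
sample = '''\
CU: ./common/main.c:
File name          Line number    Starting address    View    Stmt
main.c                      24                   0               x
main.c                      28                 0xc               x
main.c                      35                0x20               x
'''
table = active_lines(sample)
print(sorted(table['main.c']))
```

A real implementation would also need to resolve the bare file names back to full paths, which this sketch skips.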
As with unifdef, this uses multiprocessing for parallel analysis of object
files. It achieves similar performance.
See Also
========
* :doc:`../build/buildman` - Tool for building multiple boards
* :doc:`qconfig`
* :doc:`checkpatch` - Code-style checking tool