Notes on Building Linux From Scratch
As a learning exercise, I built an entire GNU/Linux system from source. This post records my notes and experience of the process.
Installing a modern Linux distribution is quick and easy, but there is a lot going on behind the scenes. The graphical installer requires minimal knowledge of how things work, and many hours of labour by hidden and largely unsung distribution maintainers are required to keep all parts of the system playing nicely together. I have used Linux for a decade and the Debian distribution has been entirely sufficient for my needs; I have never had the time or inclination to configure and maintain the system at a very low level.
The comprehensive and idiot-proof instructions from Linux From Scratch (LFS) present an opportunity to get to grips with the nuts and bolts of Linux.
Building a temporary tool set
I used a spare computer for this project: my old ASUS Eee PC, a single-core netbook from 2008. I would use a faster machine next time, because compile time turned out to be the limiting factor. LFS quantifies the approximate time to build each package in ‘Standard Build Units’ (SBUs), where one SBU is the time taken to build the first package. On a modern machine one SBU is typically around 2 minutes, but on the netbook it was 15 minutes! For example, I left gcc to compile and test overnight, and it was still chugging away the next morning. Overall, from start to finish, the process took a week of very on-and-off effort.
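Converting a book estimate into wall-clock time is a single multiplication: minutes = SBUs × minutes per SBU. The helper below is a hypothetical sketch. The 4.5 SBU package is made up, and the per-SBU figures are the ones quoted above.

```shell
# sbu_to_minutes SBUS MINUTES_PER_SBU  (hypothetical helper name)
# Convert an LFS build-time estimate, given in SBUs, into minutes on a
# particular machine. awk does the multiplication because SBU figures are
# often fractional, which POSIX shell arithmetic cannot handle.
sbu_to_minutes() {
    awk -v sbus="$1" -v unit="$2" 'BEGIN { printf "%g\n", sbus * unit }'
}

# The same made-up 4.5 SBU package on the two machines:
sbu_to_minutes 4.5 2     # modern machine, 2 minutes per SBU: prints 9
sbu_to_minutes 4.5 15    # the netbook, 15 minutes per SBU: prints 67.5
```

The book suggests measuring your own SBU by timing the build of the first package, for example by wrapping its commands in bash's `time { ...; }`.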
A working Linux system is a prerequisite for building LFS; otherwise we have a ‘chicken and egg’ problem, with no means of compiling the compiler’s source code. The first task is to make a set of temporary tools that are independent of the existing operating system.
The process to produce a free-standing compiler is convoluted. We compile:
- binutils for the assembler and linker (still depends on host);
- gcc compiler using our binutils (still depends on host);
- glibc C library using our gcc and new system headers;
- libstdc++ C++ library using our glibc;
- binutils again, using our glibc;
- gcc again, using our glibc, libstdc++ and binutils.
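The point of this ordering is that each stage only uses tools built earlier in the chain, or the host's tools at the very start. That rule can be checked mechanically. The sketch below encodes the list above as `stage:dependencies` lines. The stage names, including `linux-headers` for the new system headers, are illustrative labels and not LFS package names.

```shell
# check_build_order (hypothetical helper): read "stage:dep,dep,..." lines on
# stdin, in the order the stages are built, and report the first stage that
# depends on something not built yet. "host" means the host's own tools.
check_build_order() {
    built=" host "
    while IFS=: read -r stage deps; do
        [ -n "$stage" ] || continue
        for dep in $(printf '%s\n' "$deps" | tr ',' ' '); do
            case "$built" in
                *" $dep "*) ;;  # already available
                *) echo "$stage needs $dep, which is not built yet"
                   return 1 ;;
            esac
        done
        built="$built$stage "
    done
    echo "order ok"
}

# The temporary tool chain, one stage per line: prints "order ok".
check_build_order <<'EOF'
binutils-pass1:host
gcc-pass1:binutils-pass1
linux-headers:host
glibc:gcc-pass1,linux-headers
libstdc++:glibc
binutils-pass2:glibc
gcc-pass2:glibc,libstdc++,binutils-pass2
EOF
```

Swapping any two dependent lines, such as building glibc before the first gcc, makes the check fail at that stage.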
At every stage, we have to edit the source code using sed commands because so many paths are hard-coded. This seems a bit inelegant.
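Here is a flavour of these edits. In LFS books that used a /tools directory, the first gcc pass rewrites the dynamic linker and /usr paths baked into gcc's Linux configuration headers so that they point into /tools. The sketch below applies sed expressions of that form to a few made-up header lines instead of a real gcc tree. Take the exact command for a given book version from the book itself.

```shell
# redirect_to_tools (hypothetical name): apply LFS-style path edits to text
# on stdin.
#  1. Prefix dynamic linker paths such as /lib/ld..., /lib64/ld... or
#     /lib32/ld... with /tools ("&" stands for the whole match).
#  2. Replace /usr with /tools.
# "\?" (optional group) is a GNU sed extension to basic regular expressions.
redirect_to_tools() {
    sed -e 's@/lib\(64\)\?\(32\)\?/ld@/tools&@g' \
        -e 's@/usr@/tools@g'
}

# Made-up lines in the style of gcc's config headers, where such paths are
# string literals fixed when the compiler is built.
redirect_to_tools <<'EOF'
#define GLIBC_DYNAMIC_LINKER32 "/lib/ld-linux.so.2"
#define GLIBC_DYNAMIC_LINKER64 "/lib64/ld-linux-x86-64.so.2"
#define EXAMPLE_LIB_DIR "/usr/lib/"
EOF
```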
Building the rest of the tool set is straightforward, and most of the configure options are self-explanatory. That said, I carelessly skipped over coreutils and diffutils, which showed up as failing tests when it came to building the final system. This shows why installations should always be automated.
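One way to automate the sequence so that nothing is skipped silently is a driver that builds packages strictly in order and leaves a stamp file after each success. This is a hypothetical sketch: `run_in_order`, `demo_build` and the stamp directory are illustrative names. The real build step for each package would come from the book.

```shell
# run_in_order STAMP_DIR BUILD_FN PACKAGE...
# Build packages strictly in the given order. A stamp file records each
# success, so a re-run resumes where it stopped instead of skipping or
# repeating a package. BUILD_FN builds one package and returns non-zero on
# failure, which stops the whole run.
run_in_order() {
    stamp_dir=$1
    build_fn=$2
    shift 2
    mkdir -p "$stamp_dir" || return 1
    for pkg in "$@"; do
        if [ -e "$stamp_dir/$pkg" ]; then
            echo "skip $pkg (already built)"
            continue
        fi
        echo "build $pkg"
        if ! "$build_fn" "$pkg"; then
            echo "FAILED: $pkg" >&2
            return 1
        fi
        : > "$stamp_dir/$pkg"   # mark this package as done
    done
}

# Hypothetical stand-in for a real build step (unpack, configure, make,
# make check, make install); it only prints what it would do.
demo_build() { echo "  (pretending to build $1)"; }

stamps=$(mktemp -d)
run_in_order "$stamps" demo_build coreutils diffutils   # builds both
run_in_order "$stamps" demo_build coreutils diffutils   # skips both
rm -rf "$stamps"
```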
Building the system
Once the temporary tool set is ready, we chroot from the host system into the new partition for the LFS system. This means that anything on the host system outside our tools directory is inaccessible. We have to mount a few virtual kernel file systems inside the new root in order to keep things running.
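The mounts and the chroot itself look roughly like the sequence below, which is modelled on LFS books of the /tools era (newer books differ). It is a sketch. By default it only prints the commands. `run`, `enter_lfs` and the /mnt/lfs mount point are assumptions. Actually running it with DRY_RUN=0 needs root and a prepared LFS partition.

```shell
# run (hypothetical helper): print the command by default; execute it only
# when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# enter_lfs LFS_ROOT
# Mount the virtual kernel file systems inside the new root, then chroot
# into it with a clean environment whose PATH ends with the temporary tools.
# "+h" turns off bash's command hashing, so newly installed programs are
# found as soon as they appear.
enter_lfs() {
    root=$1
    run mount -v --bind /dev "$root/dev"
    run mount -vt devpts devpts "$root/dev/pts" -o gid=5,mode=620
    run mount -vt proc proc "$root/proc"
    run mount -vt sysfs sysfs "$root/sys"
    run mount -vt tmpfs tmpfs "$root/run"
    run chroot "$root" /tools/bin/env -i \
        HOME=/root TERM="${TERM:-linux}" \
        PATH=/bin:/usr/bin:/sbin:/usr/sbin:/tools/bin \
        /tools/bin/bash --login +h
}

enter_lfs /mnt/lfs   # dry run: prints the six commands
```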
We compile the final copy of glibc and tweak our tool set to point to it, rather than the temporary copy. This requires another obscure sed command, but at least we don’t have to go round the gcc compiling loop for a third time.
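In LFS books that used /tools, this sed command rewrites gcc's specs file so that the compiler links against the final glibc. The sketch below wraps expressions of that form in a hypothetical function. It runs them over a made-up, heavily shortened specs fragment to show what each one does.

```shell
# strip_tools_from_specs (hypothetical name): filter a gcc specs file on stdin.
#  1. Delete every "/tools" prefix, e.g. in the dynamic linker path.
#  2. On the line after "*startfile_prefix_spec:", set the start-file search
#     prefix to /usr/lib/ ("n" prints the current line and reads the next).
#  3. On the line after "*cpp:", append "-isystem /usr/include" so the
#     preprocessor searches the final system headers.
strip_tools_from_specs() {
    sed -e 's@/tools@@g' \
        -e '/\*startfile_prefix_spec:/{n;s@.*@/usr/lib/ @;}' \
        -e '/\*cpp:/{n;s@$@ -isystem /usr/include@;}'
}

# A made-up, heavily shortened specs fragment.
strip_tools_from_specs <<'EOF'
*cpp:
%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}

*link:
-m elf_i386 -dynamic-linker /tools/lib/ld-linux.so.2

*startfile_prefix_spec:
/tools/lib/
EOF
```

In the book, the input comes from `gcc -dumpspecs` and the output is saved as gcc's specs file. Take the exact command from the book version being followed.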
From this point forward, we build and install 68 other packages in a prescribed order. I also ran all the tests just to make sure I had not made any mistakes. This was a bit tedious, with a surfeit of sed commands to tweak source code and many non-obvious configure options. I would never be able to find and solve all of these potential problems myself. The Debian maintainers deserve a lot of credit for the significantly larger job of making 59,000 software packages ‘just work’ together.
The only slight hiccup I had was a coreutils test failure which, according to the test logs provided by LFS, was unexpected. Inspecting the test script (written in Perl!), I saw that it was running cut commands like:
$ echo -n -e 'ab\0cd\0' | cut -z -c1 | xxd -b
00000000: 01100001 00000000 01100011 00000000 a.c.
and checking that the output was correct. This example splits a string on a null delimiter and takes the first character from each field. It seemed that nobody on the entire internet had the same problem as me, and running the commands outside the test suite produced the expected results, so I moved on. I think the issue is something to do with locales and language settings, as the test script contains some logic to repeat tests with the $LC_ALL environment variable set.
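Something like the sketch below repeats the check by hand under several values of LC_ALL. It uses printf rather than `echo -n -e`, because not every /bin/sh supports those flags. It uses od rather than xxd, because shell variables cannot hold NUL bytes. The function name and locale list are assumptions. A locale that is not installed simply leaves cut running in the C locale.

```shell
# check_cut_z LOCALE...  (hypothetical helper)
# Run the cut -z example under each locale and compare its bytes with the
# expected "a\0c\0". od turns the bytes into text before comparison.
check_cut_z() {
    expected=$(printf 'a\0c\0' | od -An -c)
    for loc in "$@"; do
        got=$(printf 'ab\0cd\0' | LC_ALL=$loc cut -z -c1 | od -An -c)
        if [ "$got" = "$expected" ]; then
            echo "$loc: ok"
        else
            echo "$loc: MISMATCH"
        fi
    done
}

check_cut_z C POSIX en_GB.UTF-8
```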
Supporting multiple locales in software is non-trivial. Perhaps this is because, for much of early computing history, every language other than US English was a total afterthought. But some additional complexity is inevitable to support the huge character sets of East Asian languages, right-to-left scripts, and so on.
Configuration
The detail LFS provides about how the Linux system handles hardware is informative:
- The kernel detects a device, and the device driver exports the necessary information about the device to the /sys directory and makes a device node in /dev;
- udevd sets up the device node according to its rules configuration;
- If applicable, udevd loads the kernel module whose alias wildcard matches the device’s modalias under /sys;
- User-space processes can now talk to the device node under /dev.
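The ‘wildcard’ in the third step comes from the modules.alias file that depmod generates for the installed kernel. Each line maps a glob pattern to a module. The device's modalias string (readable from files such as /sys/bus/pci/devices/*/modalias) is matched against those patterns. The shell sketch below mimics that lookup with a case statement. The alias line and modalias value are made up. On a real system, `modprobe --resolve-alias` does the lookup properly.

```shell
# find_module MODALIAS ALIAS_FILE  (hypothetical helper)
# Print the module named by the first "alias PATTERN MODULE" line whose glob
# PATTERN matches MODALIAS, mimicking a modules.alias lookup.
find_module() {
    modalias=$1
    while read -r keyword pattern module; do
        [ "$keyword" = alias ] || continue   # skip comments and other lines
        # An unquoted $pattern in a case label is treated as a glob pattern.
        case "$modalias" in
            $pattern) echo "$module"; return 0 ;;
        esac
    done < "$2"
    return 1
}

aliases=$(mktemp)
cat > "$aliases" <<'EOF'
# Made-up line in modules.alias format; not from a real kernel
alias pci:v00001234d00005678sv*sd*bc*sc*i* demo_nic
EOF
find_module 'pci:v00001234d00005678sv00001043sd00008324bc02sc00i00' "$aliases"
rm -f "$aliases"
```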
Setting up the Colemak keyboard layout and the British English locale is quite involved. The mess of inconsistent naming conventions is ridiculous: ‘ISO-8859-1’, ‘iso8859-1’ and ‘iso88591’, or ‘UTF-8’ and ‘utf8’.
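glibc hides some of this mess by normalising the codeset part of a locale name before looking it up. That is why `locale -a` lists names such as en_GB.utf8. Roughly, letters are lowercased, anything that is not a letter or digit is dropped, and an all-digit result gets an ‘iso’ prefix. The function below is a hypothetical re-creation of that rule in shell, not glibc's code.

```shell
# normalize_codeset NAME  (hypothetical helper)
# Approximate glibc's codeset normalisation: keep only letters and digits,
# lowercase them, and prefix "iso" if only digits remain.
normalize_codeset() {
    norm=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
    case "$norm" in
        *[!0-9]*|'') ;;          # contains a letter (or is empty): keep as is
        *) norm="iso$norm" ;;    # all digits, e.g. "8859-1" -> "iso88591"
    esac
    printf '%s\n' "$norm"
}

for name in ISO-8859-1 iso8859-1 iso88591 UTF-8 utf8; do
    printf '%-12s -> %s\n' "$name" "$(normalize_codeset "$name")"
done
```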
The last step is to compile a new Linux kernel for the LFS system. The kernel maintainers provide an ncurses menu interface for the million different configuration options available. I put my faith in the defaults, compiled, and successfully rebooted into my new system.
After a short while I found that network access was not working. My network card is slightly obscure, and I thought I might need to download some proprietary firmware (which would spoil the ‘built entirely from source’ aspect). Some more research showed that it was actually supported in-kernel. In the configuration interface, selecting ‘Atheros devices’ was not sufficient; I needed to select ‘Atheros L1E’ under it in the tree. This was counter-intuitive, as I would expect selecting the former to include the latter.
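The behaviour is less surprising given the kernel's convention. Vendor entries such as CONFIG_NET_VENDOR_ATHEROS only decide whether the questions for that vendor's cards are shown. The L1E driver is its own option, CONFIG_ATL1E. A quick check after saving the configuration is to look the driver up in .config. The helper below is a hypothetical sketch of that check.

```shell
# kconfig_state CONFIG_FILE OPTION  (hypothetical helper)
# Report whether OPTION (without the CONFIG_ prefix) is built in (y), built
# as a module (m), or not set in a kernel .config file. Unset options appear
# as "# CONFIG_X is not set" or are absent, so neither matches the grep.
kconfig_state() {
    line=$(grep -E "^CONFIG_$2=" "$1")
    case "$line" in
        "CONFIG_$2=y") echo "built-in" ;;
        "CONFIG_$2=m") echo "module" ;;
        "") echo "not set" ;;
        *) echo "${line#CONFIG_$2=}" ;;   # string or numeric value
    esac
}

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL1E is not set
EOF
kconfig_state "$cfg" NET_VENDOR_ATHEROS   # built-in
kconfig_state "$cfg" ATL1E                # not set
rm -f "$cfg"
```

In kernel trees that include it, `scripts/config --enable ATL1E` followed by `make olddefconfig` is a non-interactive alternative to hunting through the menu.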
One compilation later and I was finished!
Conclusion
The built LFS system is minimal but perfectly usable for console-based text editing. In fact, there is noticeably less latency when typing in the console than in a terminal emulator running under X. Additional instructions, titled Beyond Linux From Scratch (BLFS), are available to guide the user through installing the packages required to do something useful, up to and including an X server and KDE.
The big problem is maintenance. Without a package manager, you have to apply all updates yourself, including bug fixes and security updates. This means subscribing to mailing lists to keep informed of new versions, compiling them and solving any breakages yourself. For a desktop system this has the potential to be an enormous time sink. I myself prefer to let the good people at Debian do this for me, rather than duplicate their efforts.
One realistic application that I can think of is a headless local file or web server, which can just sit on a LAN doing its thing for years without any updating. Without security updates, you wouldn’t want to expose it to the internet, though.
I have learned:
- Cross-compiling is hard;
- Localisation is hard;
- Maintaining a distribution is hard!
Completing LFS consolidated my existing bits and pieces of knowledge about the internal workings of Linux, but the end result is only really useful for educational purposes.