Project Bootstrap seeks to create a minimal Portage tree to better understand the dependencies between its packages, create packages reflecting those dependencies, and eventually have a cleaner stage build.
The repository for this project may be found at https://github.com/brantgurga/project-bootstrap.
Portage currently supports two kinds of dependencies. RDEPEND specifies a runtime dependency, which can be merged after the package that needs it. For example, if foo.ebuild lists a hypothetical package bar in RDEPEND, Portage can merge foo first and bar afterwards.
On the other hand, DEPEND specifies a dependency that needs to exist beforehand. Consequently, if foo.ebuild lists bar in DEPEND, Portage must merge bar before foo.
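The two ordering rules can be sketched in Python. This is only an illustration of the constraint, not how Portage resolves dependencies; the function name and the package names foo and bar are placeholders chosen for the example.

```python
from itertools import permutations


def valid_merge_orders(packages, depend):
    """Return every merge order in which each DEPEND edge is satisfied.

    `depend` maps a package to the packages that must be merged before it.
    RDEPEND edges are deliberately left out, because a runtime dependency
    may be merged after the package that needs it.
    """
    orders = []
    for order in permutations(packages):
        position = {name: index for index, name in enumerate(order)}
        if all(position[dep] < position[pkg]
               for pkg, deps in depend.items() for dep in deps):
            orders.append(list(order))
    return orders


# foo lists bar in RDEPEND only: no ordering constraint, so both orders work.
rdepend_orders = valid_merge_orders(["foo", "bar"], depend={})
# foo lists bar in DEPEND: bar has to be merged first.
depend_orders = valid_merge_orders(["foo", "bar"], depend={"foo": ["bar"]})
print(rdepend_orders)
print(depend_orders)
```

With RDEPEND both orders are acceptable, while DEPEND leaves only the order in which bar comes first.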
When we talk about bootstrapping, Portage is used with the ROOT variable to merge packages to a new root location. The problem is that something you need to build a package, such as the compiler, does not yet exist in the ROOT location. Consequently, you can't simply merge a toolchain package into the new ROOT and expect everything it needs to already be there.
In order to build binutils, you need a compiler, but to build the compiler, at least for the x86 and x86-64 targets, you need binutils. So what do you do about that situation? Do you install binutils first (somehow)? Or do you install gcc first (somehow)? I propose a two-part approach to handle this:
 New Dependency Type
The first aspect is a new dependency type that Portage would support. When doing a ROOT merge (a normal merge can be considered the special case ROOT=/), you care about dependencies in the ROOT=/ environment. In Autoconf terminology, these are CHOST dependencies. You also care about what is installed in the build environment. These are CBUILD dependencies, to use the Autoconf terminology. The current DEPEND mechanism tries to address both, which isn't all that elegant. I therefore propose a HOSTDEPEND dependency type.
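One possible reading of this proposal, sketched in Python: HOSTDEPEND entries are looked up in the ROOT=/ environment, while DEPEND and RDEPEND entries are looked up in the target ROOT. This split is an assumption made for illustration, and the class, function, and package names are all hypothetical; none of this is an existing Portage API.

```python
from dataclasses import dataclass, field


@dataclass
class Ebuild:
    name: str
    hostdepend: list = field(default_factory=list)  # proposed type, assumed to be checked in /
    depend: list = field(default_factory=list)      # assumed to be checked in ROOT before building
    rdepend: list = field(default_factory=list)     # assumed to be checked in ROOT at runtime


def roots_to_check(ebuild, root):
    """Group an ebuild's dependencies by the root where they would be looked up."""
    plan = {}
    plan.setdefault("/", []).extend(ebuild.hostdepend)
    plan.setdefault(root, []).extend(ebuild.depend + ebuild.rdepend)
    return plan


pkg = Ebuild("foo", hostdepend=["compiler"], depend=["libbar"], rdepend=["baz"])
print(roots_to_check(pkg, "/new/root"))
# With ROOT=/ the two groups collapse into one, which is the special case
# of a normal merge.
print(roots_to_check(pkg, "/"))
```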
There would actually be a TARGET dependency as well. For example, to build a compiler to target x86-64 on an x86 machine, you have a dependency on a binutils that can target x86-64. To build a package that is *hosted* in that *build* environment, you also have a dependency on a *hosted* compiler that *targets* that same *build* architecture. However, it is my thought that it is unnecessary to add a new dependency type for these, and that USE flags or a different package *might* address those types of dependencies. I have not experimented enough to know for sure though.
 New Package Type
Assuming host dependencies and build environment dependencies are addressed, there is now the actual bootstrapping problem. The end goal is to have a final product that has no influence from the host environment. For example, it has already been noted that gcc has a dependency on binutils. However, if you build binutils first, it gets linked with object files from the host environment. To address this situation, I propose that we create a chain of toolchain packages such as:
- bootstrap-binutils: binutils built with host tools
- bootstrap-gcc: gcc built with host tools and utilizing bootstrap-binutils
- binutils: binutils built with bootstrap-gcc so that it does not use the object files of the host environment
- gcc: gcc built with bootstrap-gcc and utilizing binutils
So bootstrap-binutils depends on the host environment, but binutils only depends on the build environment.
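To see why the extra bootstrap packages help, the two dependency graphs can be compared with Python's standard graphlib module. This is a sketch under the assumption that each bullet above translates directly into build-time edges; the function name is made up for the example.

```python
from graphlib import CycleError, TopologicalSorter


def merge_order(build_deps):
    """Return a merge order for the graph, or None if it contains a cycle."""
    try:
        return list(TopologicalSorter(build_deps).static_order())
    except CycleError:
        return None


# Naive view: binutils needs a compiler and the compiler needs binutils.
naive = {"binutils": {"gcc"}, "gcc": {"binutils"}}

# Staged view: each package maps to the packages that must exist first.
staged = {
    "bootstrap-binutils": set(),              # built with host tools only
    "bootstrap-gcc": {"bootstrap-binutils"},
    "binutils": {"bootstrap-gcc"},
    "gcc": {"bootstrap-gcc", "binutils"},
}

print(merge_order(naive))
print(merge_order(staged))
```

The naive graph has no valid order because of the cycle, while the staged graph resolves to the same sequence as the list above.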
The basic toolchain consists of gcc and minimal dependencies. The simplest of the dependencies is the linux-headers package, which is necessary for glibc. Glibc is probably the more complicated one, since it can supply shared libraries as well as the mechanism for loading those shared libraries.
Most packages use libc routines such as malloc to allocate memory. However, libc needs to get that memory from somewhere. This is where the userspace interface supplied by the Linux kernel comes in. Libc uses the interfaces described by the kernel headers to implement its backend functionality in different OS environments. This is how the same libc API can work on Linux as well as Hurd. Glibc does the work of interfacing with the operating system kernel on the program's behalf, so most programs don't have to think about those differences as long as they stick with the standard C library.
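As a rough illustration of the layer underneath malloc: allocators such as glibc's typically obtain memory from the kernel through system calls like brk and mmap. Python's standard mmap module can request an anonymous mapping in the same spirit. This only shows the idea of asking the kernel for memory, not how glibc is implemented, and the helper name is made up for the example.

```python
import mmap


def kernel_backed_buffer(pages=1):
    """Request anonymous memory from the kernel (fileno -1 means no backing file)."""
    return mmap.mmap(-1, pages * mmap.PAGESIZE)


buf = kernel_backed_buffer()
buf[:5] = b"hello"            # the mapping behaves like a writable byte buffer
print(len(buf), bytes(buf[:5]))
buf.close()                   # hand the memory back
```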
It used to be that glibc would be built using the headers in /usr/src/linux/include directly. However, the kernel folks were against this for various reasons. It then fell to distributions to create packages of sanitized headers with which to compile glibc. The problem was that different distributions did different things and there was no standardized mechanism. Nowadays, the kernel source includes a mechanism to install the sanitized userspace headers through some variation of make headers_install. The nice thing is that, since these are just header files, they don't depend on the host environment for object files. Consequently, they only need to be installed to the new ROOT once and don't really need to be bootstrapped. Whether installed using host tools or ROOT tools, the headers are the same.
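A small Python helper can show what such an invocation might look like for a new ROOT. ARCH and INSTALL_HDR_PATH are variables documented by the kernel build system; the helper name, the example root /new/root, and the choice of a usr directory under ROOT as the install prefix are assumptions made for this sketch. The helper only builds the command and does not run it.

```python
import shlex


def headers_install_command(root, arch, kernel_src="/usr/src/linux"):
    """Build (but do not run) a `make headers_install` command line."""
    return [
        "make", "-C", kernel_src,
        f"ARCH={arch}",
        # The kernel installs the headers into INSTALL_HDR_PATH/include.
        f"INSTALL_HDR_PATH={root.rstrip('/')}/usr",
        "headers_install",
    ]


cmd = headers_install_command("/new/root", "x86")
print(shlex.join(cmd))
```

Because the result is just a set of header files, the same command could be run with host tools or with tools inside the new ROOT.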
Glibc is a strange beast. Software written in C comes in two types: hosted C and unhosted C. Hosted C is software that uses a C library; unhosted C is basically anything that doesn't use the standard C library. Things like the kernel use unhosted C since there is no libc available to them. The glibc package is interesting because it provides libc itself, so it is built in an unhosted environment; it would be rather silly to link libc to libc, if that's even possible. However, glibc also supplies programs like nscd as well as libraries for name resolution. So the libc portion of glibc doesn't have any build environment dependencies besides the headers, but the extra glibc portions depend on the compiler's object files that provide the connection to a hosted environment. I am unsure at the moment whether glibc does any magic to address this or whether glibc needs to be built at least twice to remove the influence of the host environment.
Binutils supplies the assemblers and linkers that gcc uses. It depends on gcc to be compiled though. It also depends on glibc to provide libc.
GCC is interesting in that it has bootstrapping built in, though this bootstrapping is unavailable when cross-compiling. It's unclear whether the whole thing is rebuilt during that bootstrapping or just a certain part. While gcc supports many languages, for the purpose of bootstrapping a stage, handling C is all that matters.
 Bootstrap Order
The goal of the bootstrap ordering is to keep build dependencies such as include files and linked libraries from being sought in the host's ROOT.
- Glibc (depends on headers)
- Binutils (depends on glibc)
- GMP (depends on glibc)
- MPFR (depends on gmp and glibc)
- MPC (depends on glibc, gmp, and mpfr)
- C-only GCC (depends on binutils, gmp, mpfr, mpc, and glibc)
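The ordering above can be checked mechanically. The Python sketch below walks a proposed build order and reports the first package whose dependency has not been built yet. Two assumptions are made here: the "headers" that glibc depends on are treated as a linux-headers step at the front of the order, and the GCC entry is read as depending on gmp, mpfr, and mpc. The function name is made up for the example.

```python
def first_violation(order, deps):
    """Return (package, missing_dep) for the first package built too early, else None."""
    built = set()
    for pkg in order:
        for dep in deps.get(pkg, ()):
            if dep not in built:
                return (pkg, dep)
        built.add(pkg)
    return None


# Dependencies as given in the list; linux-headers is an assumed first step.
deps = {
    "glibc": ["linux-headers"],
    "binutils": ["glibc"],
    "gmp": ["glibc"],
    "mpfr": ["gmp", "glibc"],
    "mpc": ["glibc", "gmp", "mpfr"],
    "gcc": ["binutils", "gmp", "mpfr", "mpc", "glibc"],
}
order = ["linux-headers", "glibc", "binutils", "gmp", "mpfr", "mpc", "gcc"]

print(first_violation(order, deps))
# Swapping glibc and binutils breaks the order.
print(first_violation(["linux-headers", "binutils", "glibc"], deps))
```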
It is assumed that the host's tools work properly. The bootstrap toolchain ordering does not attempt to work around host tool issues. The host tools will need to be fixed if there are issues as a result of them in this phase. It does not scale well to try to do something like building binutils then glibc using the built binutils, then rebuilding binutils against glibc. As long as the host tools are working well enough, the influence of host tools is removed once the packages are rebuilt within the chroot environment.
 Further Thinking
In addition to the core toolchain, the tools necessary for operating Portage in a chroot environment need to be included. Consideration should also be given to running the test suites, since it is critical to ensure that the core toolchain, Portage and its dependencies, and anything necessary for the chroot environment are working properly.