Making the Distribution, Part 2
From Enoch to Gentoo, via minor setbacks and corporate run-ins
First steps to Enoch
In my previous article, I gave you the low-down on my days with the Stampede development team and why I left (to get away from lower-level politically-minded, project-controlling "freaks"). Because of the interference from these meddlesome by-standers, I figured it would be easier to put together my own Linux distribution than to continue improving Stampede under such dirty conditions! Fortunately I took with me a considerable amount of experience based on my (may I say substantial?) work for Stampede, including maintaining several of their packages, designing the initialization scripts, and leading slpv6 (the next-generation package management project).
The distribution I began working on, code-named Enoch, was going to be blazingly fast because it would completely automate the package creation and upgrading process. I have to admit that this was in large part because I was a one-member team and couldn't afford to spend my time on repetitive work that my development box could do for me automatically. And since I was designing a complete distribution from scratch (rather than "spinning off" from someone like RedHat), I had my work cut out for me and needed all the free time I could scrounge up.
After getting my basic Enoch system up and running, I headed back to irc.openprojects.net and started my own channel called #enoch. From there I gradually assembled a team of about ten developers. In those early days we all hung out on IRC and worked on the distribution in our spare time. As we communally and cooperatively hacked away at it, finding and fixing new bugs, Enoch became more functional and professional every day.
The first roadblock
One inevitable day, Enoch hit its first roadblock. After adding XFree86, glib, and gtk+, I decided to get xmms (an X11/gtk+-based MP3/CD player app) working. I figured it was time to celebrate with some music! But after installing xmms, I tried to start it... and X locked up! At first I thought xmms locked up because I had used insane compiler optimizations ("-O6 -mpentiumpro", in case you were wondering). My first fix, recompiling xmms with standard optimizations, didn't solve the problem. So I started looking elsewhere. After spending a full week of development time trying to track down the problem, I got an e-mail from an Enoch user, Omegadan, who was also experiencing xmms lockups.
We corresponded for a while, and after many hours of testing we determined that the problem was a POSIX threads-related issue. For some reason, a pthread_mutex_trylock() call did not return the way it should. As the creator of a distribution, I really didn't want to encounter these types of bugs. I counted on the developers to release perfect sources so I could focus on enhancing the Linux experience rather than getting buggy sources to work. Of course I soon learned that this was an unrealistic expectation, and that problems like this will always pop up from time to time.
As it turned out, the problem wasn't with xmms, gtk+, or glib. And it wasn't an issue with XFree86 3.3.5 not being thread-safe and locking up. Surprisingly, we found the bug in the Linux POSIX threads implementation itself, part of the GNU C library (glibc) version 2.1.2. I was shocked at the time to find that such a critical part of Linux had such a major bug. (And we used a release version of glibc in Enoch, not a prerelease or CVS version!)
So how did we track down the problem? Actually, we were never able to come up with a fix ourselves, but at one point I stumbled across a couple of e-mails on the glibc developer mailing list from another person who had the same problem. The glibc developer who replied posted a patch that solved the thread problem for us. But I was curious why RedHat 6 (which also used glibc 2.1.2) didn't suffer from this problem, since the patch had only just been posted and RedHat 6 had been available for some time. To find out, I downloaded RedHat's glibc SRPM (source RPM) and took a look at their patches.
RedHat had their own homegrown glibc patch that solved the pthread_mutex_trylock() issue. Apparently they experienced the same problem and created their own custom fix. Too bad they didn't send this patch "upstream" to the glibc developers so it could be shared with the rest of the world. But who knows, maybe RedHat sent the patch upstream and for some reason the glibc developers didn't accept it. Or maybe the thread bug was triggered by a specific combination of compiler and binutils versions, and RedHat never ran into it (although they did have a thread patch in their SRPM). I suppose we'll never know exactly what happened. But I did learn that RedHat SRPMs contain a lot of private bug fixes and tweaks that never seem to make it upstream to the original developers. I'm going to rant about this for a little while.
When you put together a Linux distribution it's really important that any bug fixes you create are sent upstream to the original developers. As I see it, this is one of the many ways that distribution creators contribute to Linux. We're the guys who actually get all these different programs working as a unified whole. We should send our fixes upstream as we unify so that other users and distributions can benefit from our discoveries. If you decide to keep bug fixes to yourself, you're not helping anyone; you're just ensuring that a lot of people will waste time fixing the same problem over and over again. This kind of policy goes against the whole open source ethic and stunts the growth of Linux development. Maybe I should say that it "bugs" us all.
It's unfortunate that some distributions (ahem) aren't as good (RedHat) as others (Debian) about sharing their work with the community.
During the time we were trying to fix the glibc threads problem, I e-mailed Ulrich Drepper (one of the guys at Cygnus who is heavily involved with glibc development). I mentioned the POSIX thread problem we were having, and that Enoch was using pgcc for optimum performance. And he responded with something like this (I'm paraphrasing here): "Our own compiler included with the CodeFusion product has an excellent x86 backend that produces executables far faster than those generated with pgcc." Obviously, I was very interested in testing out this mystery "turbo" compiler the Cygnus guys had created.
I thereupon requested a demo copy of Cygnus CodeFusion 1.0 so that I could test it out, and Omegadan and I were amazed to find that this compiler was everything that Ulrich claimed and then some. The x86 backend increased the performance of some of the CPU-intensive executables (like bzip2) by close to 90%! All applications seemed to benefit from at least a 10% real-world performance increase, and all we did was swap out compilers. Enoch even booted 30-40% faster. The performance gains were far, far greater than what we gained by switching from gcc to pgcc. Obviously, after experiencing it for ourselves, we wanted to use this compiler for Enoch. Fortunately, the sources were included on the CodeFusion CD and were released under the GPL, so we were fully permitted to use this compiler... or so we thought.
Let the freakiness begin
I sent an e-mail to the marketing manager at Cygnus to let them know our intentions, expecting a "yeah, go for it, thanks for using our compiler" response. Instead the reply was that although we were (technically) allowed to use the Cygnus compiler, we were strongly urged not to use or include the compiler sources with Enoch. I responded by asking why they had released the source under the GPL, if that was the case. It's my guess that if they had a choice, they wouldn't have used the GPL, but because they derived their compiler from egcs (released under the GPL), they had no choice.
This is a good example of a situation where the GPL prevented a company from creating a proprietary product based on open sources. My educated guess is that Cygnus was afraid that if we used their compiler we would undermine their boxed product sales, which would be especially strange because none of their marketing materials (nor the InfoWorld review) mentioned the new compiler included with CodeFusion. CodeFusion was marketed solely as a "development IDE" product, not as a compiler.
In an attempt to put some of their paranoia to rest, I offered to endorse CodeFusion and place the endorsement on our Web site with a link to help spur CodeFusion sales. Personally I didn't think that a "turbo" Enoch would negatively affect their sales, since CodeFusion was marketed as an IDE. But I tried nevertheless to make them happy. The IDE component of CodeFusion was a commercial product, and we had no desire or intention (or right) to distribute it with Enoch.
I e-mailed my (generous?) offer to Cygnus and received another strange response. They wanted authority over all of our "marketing materials" (apparently, this also included the content of our Web site!). Another shocker. The Cygnus marketing team seemed to have no grasp of how the Linux community or the GPL worked, so I decided to cut off communication with Cygnus for the indefinite future. In the meantime, we created a private "turbo" and public "non-turbo" version of Enoch, leaving the final decision for later.
But after several months they integrated the CodeFusion x86 backend into gcc 2.95.2. Now everyone could benefit from the nice new backend, not just the people who knew about the "secret GPL compiler" included on the CodeFusion CD. So we decided to go ahead and use gcc rather than the CodeFusion compiler. In addition to being more stable, gcc 2.95.2 also allowed us to avoid Cygnus, which by this time had been purchased by RedHat for a ridiculous sum of money. (Note: the new x86 backend in gcc 2.95.2 is what gave newer Linux distributions the significant speed boost that we all got to experience. It also gave FreeBSD 4.0 a nice speed boost over 3.3.6. Notice the difference?)
On the soapbox
Thanks to this and other experiences, I've learned a lot about for-profit open source companies. There's absolutely nothing bad about being a for-profit open source company. Nor is there anything morally wrong with producing proprietary closed-source software, if that's what you'd like to do. But it doesn't make any sense for open source companies to subvert or refuse to cooperate with the rest of the open source world, either by not supporting the GPL or by any other means. This is a practical point that clearly makes business sense.
Open source companies should realize that the free exchange of ideas and code is what they profit from. By opposing things like the standard GPL practices, they undermine the environment they rely upon to prosper and grow. If open source is the soil from which your business has sprouted, it makes sense to keep the soil healthy.
I understand that there's a temptation to keep at least some information secret for short-term financial gain. Advanced code or special techniques provide a coveted competitive advantage, which could potentially result in increased sales and profit. But if the goal is to be the sole provider of a product, the product should be commercial rather than open source. Open source does not allow for exclusive access to the inner workings of anything. That's what it means.
Back to Enoch
Now, I'll step down from my soapbox and continue my story.
As Enoch became more and more refined, we decided that a name change was in order, and "Gentoo Linux" was born. By this time we had released a couple of versions of Enoch (now Gentoo), and were racing to get to Gentoo Linux version 1.0. Around this time I also decided to upgrade my old Celeron 300 box (overclocked and rock-solid at 450 MHz) to a brand-new Abit BP6 (a dual Celeron board that had just hit the market). I sold my old box and put my dual Celeron 366 system together. After overclocking the processors to something on the order of 500 MHz, I was cruising. But I noticed that my new machine wasn't very stable.
Obviously my first reaction was to go back down to 2x366 MHz. But now I experienced an even stranger problem. As long as my machine kept the CPUs chugging away, the machine didn't lock up. But if I left the machine idle overnight, there was a good probability that the system would lock up completely. Yes, an idle bug -- argh! After some research, I found several other Linux users with the same problem on this particular motherboard. A chip on the BP6 (was it the PCI controller?) seemed to be flaky or out of spec, which caused Linux to lock up at idle.
I was more than a wee bit upset, and because I couldn't afford to order more PC parts, Gentoo development effectively halted. I became more and more pessimistic about Linux and decided to switch over to FreeBSD. Yes, FreeBSD. And that's where I'll end this installment -- see you in Part 3. :)
Read the next article in this series: Making the Distribution, Part 3