Difference between pages "The Gentoo.org Redesign, Part 2" and "The Gentoo.org Redesign, Part 4"

{{Article
{{Article
|Subtitle=The Documentation System
|Subtitle=The Final Touch of XML
|Summary=Have you ever woken up in the morning to the realization that your personal development Web site isn't really that great? If so, you're in good company. In this series, Daniel Robbins shares his experiences as he redesigns the www.gentoo.org Web site in March 2001 using technologies like XML, XSLT, and Python. Along the way, you may find some excellent approaches to use in your next Web site redesign. In this, the second installment, Daniel shows off the new documentation system and sets up a daily CVS-log mailing list.
|Summary=Have you ever woken up one morning and suddenly realized that your cute little personal development Web site isn't really that great? If so, you're in good company. In this series, Daniel Robbins shares his experiences as he redesigns the Gentoo Linux Web site in March 2001, using technologies like XML, XSLT, and Python. This article: Daniel completes the conversion to XML/XSLT, fixes a host of Netscape 4.x browser compatibility bugs, and adds an auto-generated XML Changelog to the site.
|Author=Drobbins
|Author=Drobbins
|Previous in Series=The Gentoo.org Redesign, Part 1
|Previous in Series=The Gentoo.org Redesign, Part 3
|Next in Series=The Gentoo.org Redesign, Part 3
}}
}}
If you've read the first installment of my series on the gentoo.org redesign, then you know that I'm the Chief Architect of Gentoo Linux, making me responsible for the Gentoo Linux Web site. And right now, the site leaves a lot to be desired. Yes, it does look somewhat attractive, but when you look beyond the cute graphics you will see that it really doesn't serve the needs of its primary target audience: Gentoo Linux developers, users, and potential users.
=== A new look, but... ===


Last time, I used a user-centric design approach to create a set of priorities for the site, and then used these priorities to create an action plan for revamping gentoo.org. Two things were at the top of the priority list: new developer documentation and a new mailing list to communicate to developers changes made to our CVS repository. While adding the new CVS mailing list was relatively easy (though, as you will see, it was more difficult than I thought), the new developer documentation required a lot of planning and work.
At the end of the previous article, the Gentoo Linux Web site had a completely new look, but there are still some things that aren't quite complete. In this article, the final installment in this series, I finally put those finishing touches on the site, resulting in a fully-functional, refined, and modular XML-based site that's ready for the future. Here's what was missing from the site since the last article:


Not only did I need to create some actual documentation (a task that I had been ignoring for too long), but I also had to choose an official XML syntax that our new master documentation would use. You see, until a few weeks ago, I was creating the documentation in raw HTML. This was definitely a naughty thing to do, because it mixed content (the actual information) with presentation (the display-related HTML tags). And what did I end up with? An inflexible mess, that's what. It was hard to edit the actual documentation and extremely difficult to make site-wide HTML improvements.
=== Loose ends ===


In this article, I'll proudly demonstrate the site's new flexible XML documentation solution. But first, I'll recap my experiences in adding the CVS log mailing list to our site.
First, while the site has a completely new look, only the documentation portion of the site is XML-based. The main "category" pages are still in raw HTML and need to be converted to an XML/XSLT solution to make things more maintainable and expandable.


=== Adding the CVS log mailing list ===
Also, my developers have found several problems with the raw HTML itself. The site looks particularly bad when viewed under Netscape 4.77 -- obviously, this is a problem. Also, there are a number of other minor rendering problems that appear in more modern browsers, the most annoying of which is a thin vertical black line that does not extend completely down the entire page, ruining the illusion that the main content area is being spoken by our flying-saucer guy. Also, our documentation pages don't completely match the more refined look of our new main category pages -- clearly something worth updating.


The goal of the CVS log mailing list is to inform developers of new commits made to our CVS repository. Since I already had the mailman mailing list manager (see Resources) installed, I thought that creating this new list would be easy. First, I would simply create the mailing list, then add the proper "hook" to the CVS repository so that e-mails would be automatically generated and sent out, describing the changes to our sources as they happened.
=== The goal ===


I first started researching a special file in my repository's CVSROOT called "loginfo." Theoretically, by modifying this file, I could instruct CVS to execute a script when any commit (and thus, modification) was made to the repository. So I created a special loginfo script and plugged it into my existing repository. And it did indeed send out e-mails to the new "gentoo-cvs" mailing list whenever modifications were made to our sources.
Here's the plan for the final rework of the Gentoo Linux site. First, we'll totally rework the main page HTML, keeping the same overall look, but making the page more browser-compatible. At the same time, we'll add a few presentation-related refinements suggested by our visitors, and also fix browser compatibility problems with our existing "guide" documentation system.


Unfortunately, this solution wasn't all I'd hoped it would be. First of all, it generated lots of e-mail messages -- one for each modified file -- and secondly, the messages were cryptic and sometimes even empty! I quickly removed my loginfo script and put the gentoo-cvs mailing list project on hold. It was clear that CVS's loginfo hook wasn't appropriate for my needs, and I had a hard time tracking down any loginfo-related documentation that could help me solve my problem.
Next, we'll completely move the site over to XML and XSLT. By the end of this article, any change made to the site will be made by modifying XML or XSLT rather than directly editing HTML, which will now be generated automatically with the help of xsltproc. This will make the site a whole lot easier to maintain. Because Gentoo Linux is a community-developed project, this will, in turn, allow our developers (and me) to maintain and improve the site as needed. I'm really excited about this since it will save us a bunch of time and ensure that our visitors are greeted with up-to-date content.


=== cvs2cl.pl ===
=== Compatibility issues ===


Several weeks later I started looking for an alternative to loginfo. This time I did the smart thing and headed over to http://freshmeat.net. There I quickly found just what I was looking for: the incredibly wonderful cvs2cl.pl perl script available from http://red-bean.com (see Resources). Instead of using the loginfo hook, cvs2cl.pl uses the cvs log command to connect directly to the repository and extract the relevant log information. Also, rather than spitting out relatively cryptic CVS log messages, it does a great job of reformatting everything into a readable ChangeLog format:
While Netscape 4.x is still a very widely used browser, it is difficult for me to decide exactly how many hoops to jump through in order to make the site look better when viewed through this browser. Should I merely ensure that the site is readable (without any major glitches) or should I do everything I can to make sure the site looks absolutely perfect under Netscape 4.x, even if that means using less or no CSS and adding strange compatibility hacks to the existing HTML?


{{file|desc=Output generated by cvs2cl.pl|body=
In the end, I decide to make several major changes to the HTML so that the site will still look quite good under Netscape 4.x without focusing too much on minor bug-related table spacing and font-rendering issues. Here are some of the changes made to the site's HTML to get everything 4.x compatible. (The Gentoo Linux development team has submitted several of these fixes.)
2001-04-09 20:58  drobbins
 
      * app-doc/gentoo-web/files/xml/dev.xml: new fixes
First, Netscape 4.x has a bug that causes CSS background colors of block elements to be displayed incorrectly. For example, here's how a particular portion of a guide document is supposed to be rendered:
2001-04-09 20:47  drobbins
 
      * app-doc/gentoo-web/: gentoo-web-1.0.ebuild,
[[File:L-redesign-07.gif|center|frame| A sample guide document in IE5]]
      files/pyhtml/index.pyhtml, files/xml/gentoo-howto.xml: new gentoo-howto
 
      fixes
And, here is how Netscape 4.x renders this same portion when background colors are specified using CSS:
2001-04-09 20:03  drobbins
 
      * app-doc/gentoo-web/files/xml/dev.xml: typo fix
[[File:L-redesign-08.gif|center|frame|A sample guide document in Netscape 4.7; some fixes are needed]]
2001-04-09 20:02  drobbins
 
      * app-doc/gentoo-web/files/pyhtml/index.pyhtml: little update
This is ugly. To fix it, existing block-level elements, such as this one...
 
{{file|desc=sample paragraph|body=
<p class="note">This paragraph doesn't look so good in 4.x</p>
}}
}}


cvs2cl.pl can also be instructed to generate output in XML format, and in my next article I'll take advantage of this by incorporating an up-to-date ChangeLog into the new developer section of our site.
...were replaced with tables, as follows:
 
{{file|desc=sample table|body=
<table width="100%" border="0" cellpadding="0" cellspacing="0">
        <tr>
                <td bgcolor="#ddffff"><p class="note">
                This looks a whole lot better in 4.x</p></td>
        </tr>
</table>
}}


=== The cvslog.sh script ===
This hack fixes the background-rendering problem. However, this "fix" also requires color information to be included in the HTML, which undermines the benefits of using CSS in the first place. This is an unfortunate situation, especially for fans of CSS like myself, but is required for Netscape 4.x compatibility.


Here's the script I now use to generate the daily ChangeLog e-mails. First, it changes the current working directory to the location of my checked-out CVS repository. Then, it creates $yesterday and $today shell variables that contain the appropriate dates in RFC 822 format. Notice that both date variables have the time set to 00:00 (midnight). These variables are, in turn, used to create a $cvsdate variable that is then passed to cvs2cl.pl to specify the date range that I'm interested in -- the span of time from yesterday at midnight to today at midnight. Thus, the $cvsdate variable contains a datespec that tells cvs2cl.pl to log only the changes made yesterday.
=== Rebuilding the HTML ===


In addition, I also created a $nicedate variable (used in the mail subject line) and use the mutt mailer (in mailx compatibility mode [see Resources]) to send the e-mail to the gentoo-cvs mailing list:
Now it's time to deal with the black vertical line that doesn't always extend all the way to the bottom of the screen. I have been unable to find a solution to this problem that works in both a 4.x and 5.x browser; every 5.x version has triggered bugs in Netscape 4.x, and every 4.x-compatible version looks horrible in a 5.x browser. So, I decide to simply remove the black line entirely. Finally, the site works in all popular browsers. Next, I will create a guide-like syntax for the main pages.


{{file|name=cvslog.sh|body=
=== Approaching the XML ===
#!/bin/bash
 
cd /usr/portage
Instead of implementing a completely new tagset for the main page, I think it would be a good idea to try to use as many of the "guide" XML documentation tags as possible (see part 2 of this series for more information on the guide XML format). So, I hack away at some new XSL, using my guide XSL as a template for my work. After an hour or two, I have a fully-functional set of XSL transformations for turning a guide-like syntax into an HTML main page. Revision 2 of the new main page looks like this:
cvs -q update -dP
 
yesterday=`date -d "1 day ago 00:00" -R`
[[File:L-redesign-09.gif|center|frame|The new main page revision]]
today=`date -d "00:00" -R`
 
cvsdate=-d\'${yesterday}\<${today}\'
Now that the main page is using a new XML/XSLT backend, I direct my attention to the "guide" system's HTML output. Not only do I need to fix a host of Netscape 4.7 compatibility bugs, but I also need to further update the generated HTML and graphics so that they will match those of my newly-revised main page. Then the idea strikes me: Why not simply tweak my new main page XML/XSL just a little bit so that it can also generate HTML for my documentation? After all, I have just added support for nearly every "guide" XML tag, so that they can also be used for main page content.
nicedate=`date -d yesterday +"%d %b %Y %Z (%z)"`
 
/home/drobbins/gentoo/cvs2cl.pl -f /home/drobbins/gentoo/cvslog.txt -l "${cvsdate}"  
This solution turns out to be really easy to implement. I just tweak the new XSLT file so that it will remove the left-hand "link bar" and perform a few other minor changes to the output HTML when it processes documentation pages. Since most of the XSLT is still the same, I can use a single set of master XSLT templates for both the guide documentation and the category pages:
mutt -x gentoo-cvs -s "cvs log for $nicedate" <\
/home/drobbins/gentoo/cvslog.txt
}}


Using cron, I run this script every night at midnight. Thanks to cvs2cl.pl, my developers now get accurate and readable daily CVS updates.
[[File:L-redesign-10.gif|center|frame|How the new XSL works]]


=== The documentation project ===
Not only do I now have a single set of XSLT templates to maintain, but because both flavors of output HTML are based on the same master document, they now share the same CSS stylesheet. This means that there is no need to "synchronize the look" between two disparate sets of stylesheets and output HTML. And as you can see, the new documentation HTML is a perfect match for the new main page:


Now, for the Gentoo Linux documentation project. Our new documentation system involves two groups of people or target audiences: the documentation creators and the documentation readers. The creators need a well-designed XML syntax that doesn't get in their way; the readers, who couldn't care less about the XML, want generated HTML documentation that is both functional and attractive. The implementation challenge is to put together a complete system that addresses the needs of both audiences. Oh, and I suppose there is a third "audience" -- me, the webmaster and the person designing the new system. Since I'm going to be interacting with the new doc system whenever the site is upgraded, I need it to be reliable and flexible.
[[File:L-redesign-11.gif|center|frame|The new documentation pages perfectly match the new main page]]


=== The Web-ready HTML ===
=== The XML implementation ===


First, let's talk a bit about the Web-ready HTML that'll be generated from my master XML files. To make great, readable documentation, I'll need to have support for the proper XML tags. For example, the ability to insert notes, important messages, and warnings into the body of the document (and have them prominently displayed in the resultant HTML) is a must. Also, I must be able to insert blocks of code, and it would be great if actual user input could somehow be offset from program output. I could even add tags that highlight the source code comments in an alternate color so that the code blocks are more readable.
The actual implementation is quite easy; my existing guide XML syntax requires that every document be part of a single master <guide> element. To add support for main category pages, I create a new master element: <mainpage>. To create a main category page, I place everything inside a <mainpage> element instead of a <guide> element, and the XSLT makes the appropriate changes to the output. Besides this, the only major change required is the addition of an optional <sidebar> element that's used to specify the contents of the floating table on a main category page. The existing <guide> XSLT template looks something like this:


The documents should have a table of contents (with hyperlinks to the appropriate chapters), a synopsis, a revision date, version, and an authors list at the top of the document. And, of course, every document should have a header at the extreme top of the page containing a small Gentoo Linux logo. Clicking on this logo should bring you back to the main Gentoo Linux page. Last but not least, every document should have a footer that contains copyright information, along with a contact e-mail address.
{{file|lang=xml|desc=XSLT template|body=
<xsl:template match="/guide">
        <html>
        <head>
                guide header goes here
        </head>
        <body>
                top part of guide body HTML content goes here
<!--next, we insert our content-->
                <xsl:apply-templates select="chapter" />
                bottom part of guide body HTML content goes here
        </body>
        </html>
</xsl:template>
}}


=== The spiffy new logo ===
If you're not too familiar with XSLT, this template tells an XSLT processor to replace the {{c|<guide> </guide>}} tags with the shell of an HTML document, as well as recursively applying templates to any <chapter> elements (opening/closing tag pairs) inside the <guide> element and inserting the resultant output into the middle of the HTML shell.


This was a hefty list of requirements, and I decided to focus on the most entertaining part first, the new Gentoo Linux logo that would appear in the upper-left corner of every Gentoo Linux document. I used the "g" from the "gentoo" graphic (created using the excellent and free Blender 3D program) on our main page as the basis for the new smaller logo. I tweaked the extrusion settings a bit and then added a chrome environment map. Finally, I positioned the lights and camera just so, and the new logo was complete. After importing it into Xara X (see Resources) and adding some text, this was the result:
So, to add support for the main category pages, I need to specify that a different HTML shell should be used if everything happens to be enclosed in a single <mainpage> element. To do this, I add a new template, as follows:


[[File:L-redesign-02.gif|frame|class=img-responsive|caption=The new Gentoo Linux logo]]
{{file|lang=xml|desc=The new template|body=
<xsl:template match="/mainpage">
        <html>
        <head>
                mainpage header goes here
        </head>
        <body>
                top part of mainpage body HTML content goes here
<!--next, we insert our content-->
                <xsl:apply-templates select="chapter" />
                bottom part of mainpage body HTML content goes here
        </body>
        </html>
</xsl:template>
}}


I used this new logo as inspiration for the rest of the HTML color scheme, using a purplish theme throughout. I made heavy use of cascading style sheets (CSS) to control font attributes and spacing. Once I had a decent HTML prototype in place, I started focusing on the guts of the new documentation -- the new XML syntax. I wanted the syntax to be as simple as possible, so I created just enough XML tags to allow for the proper organization of the document, but no more. Then I started working on the XSLT to transform the XML into the target HTML.
Because nearly every other XML element (from <chapter> all the way on down) produces identical HTML output for both guide and main category pages, almost every other XSLT template can be shared for both types of pages. Thus, we can get along just fine with a single XSLT file that specifies two "HTML shells" and a common set of XML-to-HTML XSLT templates. As always, code reuse is definitely a good thing.
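
As a rough sketch of what "a single XSLT file for both kinds of page" can look like in day-to-day use, the following bash functions regenerate every HTML page from its XML source with one stylesheet. The directory names (xml, htdocs), the XSL, XML_DIR, and OUT_DIR variables, and the DRY_RUN switch are assumptions made for illustration; only the guide-main.xsl name and the use of xsltproc come from this series.

```shell
#!/bin/bash
# Sketch only: the paths and the DRY_RUN switch below are hypothetical.
XSL=${XSL:-xsl/guide-main.xsl}
XML_DIR=${XML_DIR:-xml}
OUT_DIR=${OUT_DIR:-htdocs}

# Render one XML source file to HTML, or just print the command
# that would be run when DRY_RUN is set to "yes".
render_page() {
        local src=$1
        local out="${OUT_DIR}/$(basename "${src%.xml}").html"
        if [ "${DRY_RUN:-no}" = "yes" ]
        then
                echo "xsltproc -o ${out} ${XSL} ${src}"
        else
                xsltproc -o "${out}" "${XSL}" "${src}"
        fi
}

# Render every XML file in $XML_DIR, guide and mainpage documents alike.
rebuild_site() {
        local src
        for src in "${XML_DIR}"/*.xml
        do
                # skip the unexpanded glob when the directory has no XML files
                [ -e "${src}" ] || continue
                render_page "${src}"
        done
}
```

Running rebuild_site with DRY_RUN=yes prints the commands without touching any files. Because the stylesheet picks its HTML shell from the root element, the loop does not need to know which files are guides and which are main category pages.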


=== The result! ===
=== The Changelog page ===


After much tweaking and a good amount of feedback from one of my developers, the new documentation system reached the point where it was ready for use. I immediately began work on our first new development guide, "The Gentoo Linux Documentation Guide" (xml-guide.html), which contains a complete description of the new XML format. Not only did this allow other developers to begin work on the new-style documentation, but it also served as an excellent example of the new documentation system in action. Be sure to read this guide to get a complete understanding of our new XML syntax.
You'll remember that in Part 2 of this series I mentioned that the cvs2cl.pl CVS Changelog generation script could produce XML output and that I wanted to eventually use this feature as the basis for a daily CVS Changelog page that would appear on the new Web site. Now, with the new XML backend in place, adding the new Changelog page is a piece of cake. Here's an enhanced version of the cvslog.sh script that also takes care of handling the XML-to-HTML conversion:


=== DocBook vs. Guide ===
{{file|lang=bash|desc=Enhanced version of cvslog.sh script|body=
#!/bin/bash
#various paths
HOMEDIR=/home/drobbins
CVSDIR=${HOMEDIR}/gentoo/gentoo-x86
OUTLOG=${HOMEDIR}/gentoo/xmlcvslog.txt
OUTMAIL=${HOMEDIR}/gentoo/cvslog.txt
WEBDIR=/usr/local/httpd/htdocs
XSLTP=/opt/gnome/bin/xsltproc
TMPFILE=${HOMEDIR}/gentoo/xmlcvslog.tmp
USER=drobbins
#if $CVSMAIL is undefined, set it to "yes"
if [ -z "$CVSMAIL" ]
then
        export CVSMAIL="yes"
fi
#the main script
cd $CVSDIR
cvs -q update -dP
yesterday=`date -d "1 day ago 00:00" -R`
today=`date -d "00:00" -R`
cvsdate=-d\'${yesterday}\<${today}\'
nicedate=`date -d yesterday +"%d %b %Y %Z (%z)"`
#generate cvs2cl.pl XML output
/usr/bin/cvs2cl.pl --xml -f $OUTLOG -l "${cvsdate}"
#use sed to remove "xmlns=" from cvs2cl.pl output
/usr/bin/sed -e 's/xmlns=".*"//' $OUTLOG > ${OUTLOG}.2
#convert cvs2cl.pl XML output to guide format using $XSLTP
$XSLTP ${WEBDIR}/xsl/cvs.xsl ${OUTLOG}.2 > $TMPFILE
#convert guide XML output to HTML format using $XSLTP
$XSLTP ${WEBDIR}/xsl/guide-main.xsl $TMPFILE > ${WEBDIR}/index-changelog.html
#fix perms
chmod 0644 ${WEBDIR}/index-changelog.html
#automatically send cvs mail if $CVSMAIL is set to "yes"
if [ "$CVSMAIL" = "yes" ]
then
        /usr/bin/cvs2cl.pl -f ${OUTMAIL} -l "${cvsdate}"
        mutt -x gentoo-cvs -s "cvs log for $nicedate" < ${OUTMAIL}
fi
}}


If you're working on your own documentation solution, you may also want to consider the DocBook XML and SGML formats (see Resources). DocBook is well-suited for large-scale technical documentation and book projects, is very flexible, and has many (maybe too many) features. In addition, there are a number of existing packages that can be used to convert DocBook XML/SGML to man pages, texinfo files, Postscript, PDF, and, of course, HTML formats.
While this script may look significantly more complicated than the earlier version, it really only contains four or five key additional lines; the rest of the additions are either comments or environment variable definitions.


I didn't choose DocBook because a lightweight XML syntax worked best for Gentoo's needs. Right now, our XML guide syntax has around 20 tags and about 10 attributes. The limited tagset makes guide XML easy to transform into other formats such as HTML, and also ensures a certain level of consistency throughout our entire documentation set, since the format is so simple. Because I have my own XML format, I'll be able to extend the format with new tags as needed. I like having that level of control. I view XML as a technology that should be used by people to structure their data in ways that they find most helpful. In other words, the ability to define our own elements and attributes is a precious thing, and I should take full advantage of it. After all, it's the defining feature of XML.
Here's how the new XML-related parts of the cvslog.sh script work. First, we call cvs2cl.pl and instruct it to generate an XML-based Changelog listing all the files that were modified yesterday. Then, this XML output is run through sed to remove an unneeded xmlns= attribute. Next, we hand this slightly tweaked XML over to xsltproc and tell it to apply the transformations found in cvs.xsl, which turn cvs2cl.pl's XML output into a proper guide XML document. Finally, we again use xsltproc to convert this guide XML document into Web-ready HTML, which is redirected to a file in our Web server's htdocs directory. The generated HTML Changelog page is complete, and this is the result:


Of course, creating your own XML syntax is not always the best solution, especially when data interchange is important to you. Amid all the XML hype, one thing that is often overlooked is that conversion to and from different XML formats can be extremely difficult. In many cases, the two formats won't be 100% compatible, and you'll have the unpleasant choice of either throwing away data and/or metadata, intentionally avoiding use of certain elements or attributes, or creating a "super-format" that will accommodate the data and metadata from both XML formats. In the documentation world, DocBook is a pretty good choice as a "super-format" because it's so flexible; it can easily accommodate documentation imported from a variety of sources.
[[File:L-redesign-12.gif|center|frame|The automatically generated Changelog page]]
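
A note on the sed step in cvslog.sh. One likely reason the xmlns= attribute has to go: in XSLT 1.0, an unprefixed pattern such as /changelog matches only elements that are in no namespace, so a default namespace declaration on the root element would keep the templates in cvs.xsl from matching. The pattern used in the script is greedy, though, and would also swallow any other quoted attribute that follows on the same line. The sketch below compares it with a narrower pattern; the sample root line, its namespace URI, and the extra generator attribute are made up for the demonstration and are not taken from real cvs2cl.pl output.

```shell
#!/bin/bash
# Hypothetical root line; the namespace URI and the extra attribute are made up.
sample='<changelog xmlns="http://example.org/ns" generator="demo">'

# Pattern from cvslog.sh: .* is greedy and runs to the last double quote on the line.
strip_greedy() {
        sed -e 's/xmlns=".*"//'
}

# Narrower pattern: [^"]* stops at the first closing quote,
# so attributes after xmlns survive.
strip_xmlns() {
        sed -e 's/ xmlns="[^"]*"//'
}

echo "$sample" | strip_greedy   # prints: <changelog >
echo "$sample" | strip_xmlns    # prints: <changelog generator="demo">
```

If the root element only ever carries the one attribute, both patterns give the same result, so the original line is fine as long as the cvs2cl.pl output keeps that shape.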


However, DocBook's richness and flexibility can also create problems. For example, there may be hundreds of tags that you may never need, and supporting all these tags in your XSLT can make conversion to other formats more difficult. So, while DocBook is a great container for documentation converted from other formats, your own minimal XML syntax will almost always be easier to convert to other formats.
You might be surprised at the simplicity of the XSLT contained in cvs.xsl. In it, we specify three templates for <changelog>, <entry>, and <file>. We also make reference to a few other tags in the source XML, including <date>, <author>, and <msg> (which cvs2cl.pl uses to specify the CVS committer's comments). cvs.xsl does quite a bit considering that it is only around 35 lines long:


The most important thing is to carefully evaluate any potential solution while keeping the needs of your target audience(s) in mind.
{{file|lang=xml|desc=The cvs.xsl|body=
<?xml version='1.0' encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="iso-8859-1" method="xml" indent="yes"/>
<xsl:template match="/changelog">
        <mainpage id="changelog">
        <title>Gentoo Linux Development Changelog for <xsl:value-of select="entry/date"/></title>
        <author title="script">cvs-xml.xsl</author>
        <standout>
                <title>About the Development Changelog</title>
                <body>
                        This page contains a daily Changelog, listing all modifications made to our
                        CVS tree on <xsl:value-of select="entry/date"/> (yesterday).
                </body>
        </standout>
        <version>1.0.0</version>
        <date><xsl:value-of select="entry/date"/></date>
        <chapter>
                <xsl:apply-templates select="entry"/>
        </chapter>
        </mainpage>
</xsl:template>
<xsl:template match="entry">
        <section>
                <title>Files modified by <xsl:value-of select="author"/> at
                                        <xsl:value-of select="time"/>
                </title>
                <body>
                        <note><xsl:value-of select="msg"/></note>
                        <ul>
                                <xsl:apply-templates select="file"/>
                        </ul>
                </body>
        </section>
</xsl:template>
<xsl:template match="file">
        <li><path><xsl:value-of select="name"/></path>, <xsl:value-of select="revision"/></li>
</xsl:template>
</xsl:stylesheet>
}}
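
To try cvs.xsl without a CVS repository at hand, a small hand-written input file is enough. The sketch below writes one using only the elements that the stylesheet reads (changelog, entry, date, time, author, file, name, revision, and msg). The entry is modeled on the first item of the sample cvs2cl.pl output shown in Part 2; the revision number is invented, the file is assumed to already have its xmlns= attribute stripped, and real cvs2cl.pl output may contain additional elements.

```shell
#!/bin/bash
# Write a minimal, hand-made changelog document to the file named in $1.
# The revision number is a placeholder, not real repository data.
write_sample_changelog() {
        cat > "$1" <<'EOF'
<?xml version="1.0" encoding="iso-8859-1"?>
<changelog>
<entry>
<date>2001-04-09</date>
<time>20:58</time>
<author>drobbins</author>
<file>
<name>app-doc/gentoo-web/files/xml/dev.xml</name>
<revision>1.2</revision>
</file>
<msg>new fixes</msg>
</entry>
</changelog>
EOF
}

sample=$(mktemp)
write_sample_changelog "$sample"

# Run the transformation only if xsltproc and a local copy of cvs.xsl are present.
if command -v xsltproc >/dev/null 2>&1 && [ -f cvs.xsl ]
then
        xsltproc cvs.xsl "$sample"
fi
```

With this input, the entry template should produce a single section for the one commit, with the commit message in a note and the file path and revision as a list item.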


=== Wrapping it up ===
=== Project complete! ===


With the new doc system in place, I converted all our docs to the new format and posted the new docs on our existing site. In addition, I created a link to the gentoo-cvs mailing list subscription page. The key point here is that I integrated these features into the existing site so that users could benefit from the improvements right away.
Since the beginning of the Gentoo Linux Web site redesign, we've created a user-centric action plan; designed a new XML-based documentation system, a new logo, and a new look for the site; converted all remaining parts to XML; and added a new XML-based Changelog page. Phew! I hope that you've enjoyed following my progress, and have found ample ideas and inspiration along the way. I've received several requests for more information and code related to the redesign, so I've set up a special Gentoo Linux XML Projects page that contains the most recent XML, XSLT, scripts, and documentation used for www.gentoo.org. In addition to visiting the Projects page, be sure to check out the valuable resources listed below.
{{ArticleFooter}}
{{ArticleFooter}}

Latest revision as of 01:47, January 2, 2015

The Final Touch of XML

Have you ever woken up one morning and suddenly realized that your cute little personal development Web site isn't really that great? If so, you're in good company. In this series, Daniel Robbins shares his experiences as he redesigns the Gentoo Linux Web site in March 2001, using technologies like XML, XSLT, and Python. This article: Daniel completes the conversion to XML/XSLT, fixes a host of Netscape 4.x browser compatibility bugs, and adds an auto-generated XML Changelog to the site.

A new look, but...

At the end of the previous article, the Gentoo Linux Web site had a completely new look, but there are still some things that aren't quite complete. In this article, the final installment in this series, I finally put those finishing touches on the site, resulting in a fully-functional, refined, and modular XML-based site that's ready for the future. Here's what was missing from the site since the last article:

Loose ends

First, while the site has a completely new look, only the documentation portion of the site is XML-based. The main "category" pages are still in raw HTML and need to be converted to an XML/XSLT solution to make things more maintainable and expandable.

Also, my developers have found several problems with the raw HTML itself. The site looks particularly bad when viewed under Netscape 4.77 -- obviously, this is a problem. Also, there are a number of other minor rendering problems that appear in more modern browsers, the most annoying of which is a thin vertical black line that does not extend completely down the entire page, ruining the illusion that the main content area is being spoken by our flying-saucer guy. Also, our documentation pages don't completely match the more refined look of our new main category pages -- clearly something worth updating.

The goal

Here's the plan for the final rework of the Gentoo Linux site. First, we'll totally rework the main page HTML, keeping the same overall look, but making the page more browser-compatible. At the same time, we'll add a few presentation-related refinements suggested by our visitors, and also fix browser compatibility problems with our existing "guide" documentation system.

Next, we'll completely move the site over to XML and XSLT. By the end of this article, any change made to the site will be made by modifying XML or XSLT rather than directly editing HTML, which will now be generated automatically with the help of xsltproc. This will make the site a whole lot easier to maintain. Because Gentoo Linux is a community-developed project, this will, in turn, allow our developers (and me) to maintain and improve the site as needed. I'm really excited about this since it will save us a bunch of time and ensure that our visitors are greeted with up-to-date content.

Compatibility issues

While Netscape 4.x is still a very widely used browser, it is difficult for me to decide exactly how many hoops to jump through in order to make the site look better when viewed through this browser. Should I merely ensure that the site is readable (without any major glitches) or should I do everything I can to make sure the site looks absolutely perfect under Netscape 4.x, even if that means using less or no CSS and adding strange compatibility hacks to the existing HTML?

In the end, I decide to make several major changes to the HTML so that the site will still look quite good under Netscape 4.x without focusing too much on minor bug-related table spacing and font-rendering issues. Here are some of the changes made to the site's HTML to get everything 4.x compatible. (The Gentoo Linux development team has submitted several of these fixes.)

First, Netscape 4.x has a bug that causes CSS background colors of block elements to be displayed incorrectly. For example, here's how a particular portion of a guide document is supposed to be rendered:

A sample guide document in IE5

And, here is how Netscape 4.x renders this same portion when background colors are specified using CSS:

A sample guide document in Netscape 4.7; some fixes are needed

This is ugly. To fix it, existing block-level elements, such as this one...

    - sample paragraph
<p class="note">This paragraph doesn't look so good in 4.x</p>

...were replaced with tables, as follows:

    - sample table
<table width="100%" border="0" cellpadding="0" cellspacing="0">
        <tr>
                <td bgcolor="#ddffff"><p class="note">
                This looks a whole lot better in 4.x</p></td>
        </tr>
</table>

This hack fixes the background-rendering problem. However, this "fix" also requires color information to be included in the HTML, which undermines the benefits of using CSS in the first place. This is an unfortunate situation, especially for fans of CSS like myself, but is required for Netscape 4.x compatibility.

Rebuilding the HTML

Now it's time to deal with the black vertical line that doesn't always extend all the way to the bottom of the screen. I have been unable to find a solution to this problem that works in both 4.x and 5.x browsers; every version that works in a 5.x browser has triggered bugs in Netscape 4.x, and every 4.x-compatible version looks horrible in a 5.x browser. So, I decide to simply remove the black line entirely. Finally, the site works in all popular browsers. Next, I need to create a guide-like syntax for the main pages.

Approaching the XML

Instead of implementing a completely new tagset for the main page, I think it would be a good idea to reuse as many of the "guide" XML documentation tags as possible (see Part 2 of this series for more information on the guide XML format). So, I hack away at some new XSL, using my guide XSL as a template for my work. After an hour or two, I have a fully-functional set of XSL transformations for turning a guide-like syntax into an HTML main page. Revision 2 of the new main page looks like this:

The new main page revision

Now that the main page is using a new XML/XSLT backend, I direct my attention to the "guide" system's HTML output. Not only do I need to fix a host of Netscape 4.7 compatibility bugs, but I also need to further update the generated HTML and graphics so that they will match those of my newly-revised main page. Then the idea strikes me: Why not simply tweak my new main page XML/XSL just a little bit so that it can also generate HTML for my documentation? After all, I have just added support for nearly every "guide" XML tag, so that they can also be used for main page content.

This solution turns out to be really easy to implement. I just tweak the new XSLT file so that it will remove the left-hand "link bar" and perform a few other minor changes to the output HTML when it processes documentation pages. Since most of the XSLT is still the same, I can use a single set of master XSLT templates for both the guide documentation and the category pages:

How the new XSL works

Not only do I now have a single set of XSLT templates to maintain, but because both flavors of output HTML are based on the same master document, they now share the same CSS stylesheet. This means that there is no need to "synchronize the look" between two disparate sets of stylesheets and output HTML. And as you can see, the new documentation HTML is a perfect match for the new main page:

The new documentation pages perfectly match the new main page

The XML implementation

The actual implementation is quite easy; my existing guide XML syntax requires that every document be part of a single master <guide> element. To add support for main category pages, I create a new master element: <mainpage>. To create a main category page, I place everything inside a <mainpage> element instead of a <guide> element, and the XSLT makes the appropriate changes to the output. Besides this, the only major change required is the addition of an optional <sidebar> element that's used to specify the contents of the floating table on a main category page. The existing <guide> XSLT template looks something like this:

    (xml source code) - XSLT template
<xsl:template match="/guide">
        <html>
        <head>
                guide header goes here
        </head>
        <body>
                top part of guide body HTML content goes here
                <xsl:apply-templates select="chapter" />
                bottom part of guide body HTML content goes here
        </body>
        </html>
</xsl:template>

If you're not too familiar with XSLT, this template tells an XSLT processor to replace the <guide> </guide> tags with the shell of an HTML document, to recursively apply templates to any <chapter> elements (opening/closing tag pairs) inside the <guide> element, and to insert the resulting output into the middle of the HTML shell.

So, to add support for the main category pages, I need to specify that a different HTML shell should be used if everything happens to be enclosed in a single <mainpage> element. To do this, I add a new template, as follows:

    (xml source code) - The new template
<xsl:template match="/mainpage">
        <html>
        <head>
                mainpage header goes here
        </head>
        <body>
                top part of mainpage body HTML content goes here
                <xsl:apply-templates select="chapter" />
                bottom part of mainpage body HTML content goes here
        </body>
        </html>
</xsl:template>

Because nearly every other XML element (from <chapter> all the way on down) produces identical HTML output for both guide and main category pages, almost every other XSLT template can be shared for both types of pages. Thus, we can get along just fine with a single XSLT file that specifies two "HTML shells" and a common set of XML-to-HTML XSLT templates. As always, code reuse is definitely a good thing.
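The "two shells, one shared set of templates" arrangement can be tried out on a small scale. The sketch below is an illustration only, not the real guide-main.xsl: the file names, the `class` attributes, and the `<h2>` output are invented, and the stylesheet keeps just a single shared `<chapter>` template. It writes a tiny stylesheet and two sample documents to a temporary directory, then renders both with the same stylesheet if xsltproc happens to be installed.

```shell
#!/bin/bash
# Sketch only: file names and generated HTML are illustrative,
# not the ones used on the real site.
workdir=$(mktemp -d)

# One stylesheet, two "HTML shells", one shared <chapter> template.
cat > "$workdir/shells.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/guide">
        <html><body class="guide"><xsl:apply-templates select="chapter"/></body></html>
</xsl:template>
<xsl:template match="/mainpage">
        <html><body class="mainpage"><xsl:apply-templates select="chapter"/></body></html>
</xsl:template>
<xsl:template match="chapter">
        <h2><xsl:value-of select="title"/></h2>
</xsl:template>
</xsl:stylesheet>
EOF

# Two sample documents that differ only in their root element.
printf '%s\n' '<guide><chapter><title>Install</title></chapter></guide>' > "$workdir/doc.xml"
printf '%s\n' '<mainpage><chapter><title>News</title></chapter></mainpage>' > "$workdir/main.xml"

# Render both documents with the same stylesheet, if xsltproc is installed.
if command -v xsltproc >/dev/null 2>&1
then
        xsltproc "$workdir/shells.xsl" "$workdir/doc.xml" > "$workdir/doc.html"
        xsltproc "$workdir/shells.xsl" "$workdir/main.xml" > "$workdir/main.html"
fi
```

Only the root element of each input decides which shell is used; the `chapter` template runs unchanged for both, which is the code reuse described above.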

The Changelog page

You'll remember that in Part 2 of this series I mentioned that the cvs2cl.pl CVS Changelog generation script could produce XML output and that I wanted to eventually use this feature as the basis for a daily CVS Changelog page that would appear on the new Web site. Now, with the new XML backend in place, adding the new Changelog page is a piece of cake. Here's an enhanced version of the cvslog.sh script that also takes care of handling the XML-to-HTML conversion:

    (bash source code) - Enhanced version of cvslog.sh script
#!/bin/bash
#various paths
HOMEDIR=/home/drobbins
CVSDIR=${HOMEDIR}/gentoo/gentoo-x86
OUTLOG=${HOMEDIR}/gentoo/xmlcvslog.txt
OUTMAIL=${HOMEDIR}/gentoo/cvslog.txt
WEBDIR=/usr/local/httpd/htdocs
XSLTP=/opt/gnome/bin/xsltproc
TMPFILE=${HOMEDIR}/gentoo/xmlcvslog.tmp
USER=drobbins
#if $CVSMAIL is undefined, set it to "yes"
if [ -z "$CVSMAIL" ]
then
        export CVSMAIL="yes"
fi
#the main script
cd $CVSDIR 
cvs -q update -dP
yesterday=`date -d "1 day ago 00:00" -R`
today=`date -d "00:00" -R`
cvsdate=-d\'${yesterday}\<${today}\'
nicedate=`date -d yesterday +"%d %b %Y %Z (%z)"`
#generate cvs2cl.pl XML output
/usr/bin/cvs2cl.pl --xml -f $OUTLOG -l "${cvsdate}" 
#use sed to remove "xmlns=" from cvs2cl.pl output
/usr/bin/sed -e 's/xmlns=".*"//' $OUTLOG > ${OUTLOG}.2
#convert cvs2cl.pl XML output to guide format using $XSLTP
$XSLTP ${WEBDIR}/xsl/cvs.xsl ${OUTLOG}.2 > $TMPFILE
#convert guide XML output to HTML format using $XSLTP
$XSLTP ${WEBDIR}/xsl/guide-main.xsl $TMPFILE > ${WEBDIR}/index-changelog.html
#fix perms
chmod 0644 ${WEBDIR}/index-changelog.html
#automatically send cvs mail if $CVSMAIL is set to "yes"
if [ "$CVSMAIL" = "yes" ]
then
        /usr/bin/cvs2cl.pl -f ${OUTMAIL} -l "${cvsdate}" 
        mutt -x gentoo-cvs -s "cvs log for $nicedate" < ${OUTMAIL}
fi

While this script may look significantly more complicated than the earlier version, it really only contains four or five key additional lines; the rest of the additions are either comments or environment variable definitions.

Here's how the new XML-related parts of the cvslog.sh script work. First, we call cvs2cl.pl and instruct it to generate an XML-based Changelog containing all the files that were modified yesterday. Then, this XML output is run through sed to remove an unneeded xmlns= attribute from the XML. Next, we hand this slightly tweaked XML over to xsltproc and tell it to apply the processing found in cvs.xsl; these instructions transform the XML output from cvs2cl.pl into a proper guide XML document. Finally, we again use xsltproc to convert this guide XML document into Web-ready HTML, which is written to our Web server's htdocs directory. The generated HTML Changelog page is complete, and this is the result:

The automatically generated Changelog page
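One detail of the sed step in cvslog.sh is worth a closer look. The expression `s/xmlns=".*"//` is greedy: `.*` runs to the last double quote on the line, so any attributes that follow xmlns= on the same line would be removed along with it. If that ever became a problem, a narrower pattern could be used instead. The following is a minimal sketch under that assumption; the `strip_xmlns` helper name, the sample element, and its namespace URI are invented for illustration and are not real cvs2cl.pl output.

```shell
#!/bin/bash
# Hypothetical helper: remove only the xmlns="..." declaration from a stream.
strip_xmlns() {
        # [^"]* stops at the first closing quote, so attributes that
        # follow xmlns= on the same line are preserved.
        sed -e 's/ xmlns="[^"]*"//'
}

# Example: feed a one-line sample through the filter.
printf '%s\n' '<changelog xmlns="http://example.invalid/ns" version="1">' | strip_xmlns
```

In the script, the same expression could stand in for the existing sed invocation without changing the rest of the pipeline.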

You might be surprised at the simplicity of the XSLT contained in cvs.xsl. In it, we specify three templates for <changelog>, <entry>, and <file>. We also make reference to a few other tags in the source XML, including <date>, <author>, and <msg> (which cvs2cl.pl uses to specify the CVS committer's comments). cvs.xsl does quite a bit considering that it is only around 35 lines long:

    (xml source code) - The cvs.xsl
<?xml version='1.0' encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="iso-8859-1" method="xml" indent="yes"/>
<xsl:template match="/changelog">
        <mainpage id="changelog">
        <title>Gentoo Linux Development Changelog for <xsl:value-of select="entry/date"/></title>
        <author title="script">cvs-xml.xsl</author>
        <standout>
                <title>About the Development Changelog</title>
                <body>
                        This page contains a daily Changelog, listing all modifications made to our
                        CVS tree on <xsl:value-of select="entry/date"/> (yesterday).
                </body>
        </standout>
        <version>1.0.0</version>
        <date><xsl:value-of select="entry/date"/></date>
        <chapter>
                <xsl:apply-templates select="entry"/>
        </chapter>
        </mainpage>
</xsl:template>
<xsl:template match="entry">
        <section>
                <title>Files modified by <xsl:value-of select="author"/> at 
                                        <xsl:value-of select="time"/>
                </title>
                <body>
                        <note><xsl:value-of select="msg"/></note>
                        <ul>
                                <xsl:apply-templates select="file"/>
                        </ul>
                </body>
        </section>
</xsl:template>
<xsl:template match="file">
        <li><path><xsl:value-of select="name"/></path>, <xsl:value-of select="revision"/></li>
</xsl:template>
</xsl:stylesheet>

Project complete!

Since the beginning of the Gentoo Linux Web site redesign, we've created a user-centric action plan; designed a new XML-based documentation system, a new logo, and a new look for the site; converted all remaining parts of the site to XML; and added a new XML-based Changelog page. Phew! I hope that you've enjoyed following my progress, and have found ample ideas and inspiration along the way. I've received several requests for more information and code related to the redesign, so I've set up a special Gentoo Linux XML Projects page that contains the most recent XML, XSLT, scripts, and documentation used for www.gentoo.org. In addition to visiting the Projects page, be sure to check out the valuable resources listed below.

