Structure
Discussion
Anyone who has spent even a short while trying to learn how best to present documents on the web will know that there is a huge amount of information about how to write HTML code and how to program the various other technologies with which we enfold our data. There is precious little, however, about how to exploit the inherent power of HTML: how to add meaning to the structure of our text while ensuring that it is readable, malleable and relatively future-proof.
In recent years the few sane voices in the wilderness (here I cite "Yucca", Zeldman, et al.) have been joined by many more who have decided to leave behind the world of code-bodging and take the straight road to a brighter horizon. (I count myself amongst the ex-bodgers—it felt wrong at the time, but I knew no better and only found out as the 21st Century dawned that there is a Better Way.)
Yet there is something missing: where is the guidance on how to structure the page, and how to order the content so that it is (yes) valid code, but also well structured for the reader and the writer as well as the machine? [1]
The intersection
The content of a web page which we see on the screen (or hear or feel) sits at the intersection of several very different environments. There is the text itself, including the structure and order which the author wished to impose on it; there is the code which interweaves the text, amplifying the meaning and structure (or at least so it should); there is the web of connections which is the World Wide Web, linking the text via the code to the text of a myriad other documents; there is the technology which allows the author to create the document and the (sometimes very different) technology which allows the reader to consume it; there are the standards which support the several computer languages used in the code of the document; there are the conventions which have applied to texts for centuries prior to the development of hypertext.
Standard parts
Over the centuries, the structure of reading matter has evolved into a fairly standard set of parts. There is no reason why a web page should vary much from this standard set, though there are some peculiarities which make a few small changes advisable.
The standard parts are broadly:
- contents
- introduction
- acknowledgements
- foreword
- summary
- abstract
- text
- conclusion
- footnotes
- bibliography
- notes
Each of these has a specific role to play in the structure of the document, which roles we will discuss shortly.
Headings
Valid XHTML allows headings to be included in pages in any order of level which pleases the author. An <h1></h1> may follow an <h4></h4> with impunity.
But this does nothing to make our lives easier—in fact for anyone wishing to be ordered and organized for the sake of their readers it means that we have to come up with our own rules to follow.
This is especially important if we want to do any automatic processing of HTML documents, such as to generate well-structured contents lists. Such processing requires uniformity and order to a degree not actually required by the XHTML standards.
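As a rough illustration of the kind of automatic processing meant here, the sketch below builds an indented contents list from the headings of a page. It uses Python's standard `html.parser` module; the names `HeadingCollector` and `contents_list`, and the sample markup, are invented for this example and are not part of any existing tool.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every heading element in a page."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs, in document order
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text fragments of the heading currently open

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None


def contents_list(markup):
    """Return an indented plain-text contents list, leaving out the h1 title."""
    collector = HeadingCollector()
    collector.feed(markup)
    lines = []
    for level, text in collector.headings:
        if level >= 2:
            # Indentation is taken directly from the heading level.
            lines.append("  " * (level - 2) + "- " + text)
    return "\n".join(lines)


# Hypothetical sample page, used only to exercise the sketch.
sample = """
<h1>Structure</h1>
<h2>Discussion</h2>
<h3>The intersection</h3>
<h2>Standard parts</h2>
<h2>Headings</h2>
"""
print(contents_list(sample))
```

Because the indentation comes straight from the heading level, a page whose headings jump about (an h4 directly after an h1, say) would produce a contents list whose nesting misleads the reader, which is why this sort of processing wants more order than validity alone guarantees.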
The rules I suggest for the structure of headings include the following [rough draft]:
- Each page should include only one level 1 heading (H1)
  - This should be assumed to be the title of the document.
- Subtitles should not be a separate heading
  - We handle subtitles by enclosing them in a <span> of class "subtitle".
- There should be no heading level directly below H1 except H2
  - This means that all chapter or section headings are the same level. If they need to be differentiated then a class or id can be employed.
- Headings outside the main flow start again at H2
  - If something like a box or table is outside the flow, no matter where it appears in the hierarchy, we revert to level 2 and step down from there if required.
- Never skip a level downwards
  - Any heading within the flow which is below another heading in the hierarchy should be only one level down.
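Two of these rules lend themselves to a mechanical check: the single H1, and never skipping a level downwards (which also covers the rule that only H2 may come directly below H1). The sketch below is one possible way to test a page against them. The function names are invented for illustration, the regular expression is a shortcut that a real tool would replace with a proper parser, and treating a page with no H1 at all as a violation is my own reading of the first rule. The subtitle rule and the rule for headings outside the main flow are not checked, since they depend on knowing what each element is for.

```python
import re


def heading_levels(markup):
    """Return the heading levels (1 to 6) in document order.

    A regular expression is enough for a sketch; it does not understand
    comments or scripts, so a real tool would use an HTML parser instead.
    """
    return [int(n) for n in re.findall(r"<h([1-6])\b", markup, flags=re.IGNORECASE)]


def check_heading_rules(levels):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Rule: each page should include only one level 1 heading.
    # Assumption: a page with no h1 at all is also reported.
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")

    # Rule: never skip a level downwards. Each heading may be at most one
    # level deeper than the heading before it; moving back up is always fine.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current}, skipping a level")

    return problems


# Hypothetical pages, used only to exercise the sketch.
tidy = "<h1>Title</h1><h2>Chapter</h2><h3>Section</h3><h2>Chapter</h2>"
untidy = "<h4>Aside</h4><h1>Title</h1><h3>Detail</h3>"

print(check_heading_rules(heading_levels(tidy)))
print(check_heading_rules(heading_levels(untidy)))
```

Note that the second page is perfectly valid XHTML: the check reports it only because it breaks the stricter house rules proposed above.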
