Tags, elements and attributes
Tag basics
To apply structure to text on a web page the text is enclosed in tags to create structural elements. The tags are invisible to the viewer of the page but they do the vital job of telling the user's browser software what to do with the content. In order that the page behaves properly the tags are required to comply with the proper syntax of the mark-up language being used, such as HTML or XHTML.
A tag consists of the element name enclosed in angle brackets (< and >), such as <p> (an opening paragraph tag). All tags must be closed, usually with a closing tag which is the same as the opening tag but with a slash after the first bracket: </p>.
An element usually has three parts: an opening tag followed by the content and then the closing tag.
In XHTML some elements do not contain any text and are single tags. These are closed by including a space and a slash before the closing bracket: <br />. (Another way to close single tags is to use a separate closing tag, such as <br></br>. However, this can cause problems for older browsers and, although it is "legal" code, it should be avoided.)
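As a rough sketch of single tags in use, the fragment below shows the line break alongside two other elements with no text content, the horizontal rule and the image. The address text, file name and alt text are invented for illustration.

```html
<p>
First line of an address<br />
Second line of an address
</p>
<hr />
<p>
<img src="picture.jpg" alt="A placeholder picture" />
</p>
```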
There are several classes of tags used in HTML and XML, and many varieties of tag—in the case of pure XML an infinite variety is allowed, but although we have half an eye on XML this variety won't concern us for now.
The most common element is the paragraph:
<p>
This block of text is a paragraph enclosed in paragraph tags.
</p>
It starts with an opening tag <p>, then come the contents of the paragraph, then there's the tag which closes the paragraph: </p>. Closing tags always have the "/" character straight after the opening bracket. Together these make a paragraph element.
Within a paragraph, other text could be wrapped in tags too:
<p>
This block of text is a <em>paragraph</em> enclosed in <em>paragraph tags</em>, with some words given emphasis.
</p>
Which would render like this:
This block of text is a paragraph enclosed in paragraph tags, with some words given emphasis.
Tags define how the browser will treat the contents between the tags, and it will treat everything the same until it encounters another tag. In the first example above, the browser would display all the text between the <p></p> tags the same, but in the second example the opening <em> tag prompts the browser to change how it treats the following text (visual browsers render it as italic). Once the browser meets the closing </em> tag it goes back to displaying the text in the usual way for that paragraph.
Tag everything
The first rule of tags is that everything gets tagged. The rules of HTML do allow text to appear on the web page whether it is tagged or not, but our principles of good code say that we must not preclude future uses of the content, and untagged content wouldn't work well with XML—so we tag everything.
What this means is that everything has a tag "wrapper". With the exception of white space, there is not even a punctuation mark outside a block-level tag.
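To illustrate, here is a hypothetical fragment before and after tagging. The wording is invented for the example, and only the second version follows the rule.

```html
<!-- Not like this: the sentence has no block-level wrapper of its own -->
<div>
<h2>Opening hours</h2>
We are open every weekday.
</div>

<!-- Like this: every piece of text sits inside a block-level tag -->
<div>
<h2>Opening hours</h2>
<p>We are open every weekday.</p>
</div>
```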
Tags are lowercase
The second rule of tags is that all tags are lowercase. There are no exceptions. XHTML is case-sensitive, so uppercase tags are not valid XHTML mark-up.
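A short sketch of the difference, using an invented sentence:

```html
<!-- Avoid: uppercase tag names, as often seen in older HTML pages -->
<P>Some text with <EM>emphasis</EM>.</P>

<!-- Use: lowercase tag names -->
<p>Some text with <em>emphasis</em>.</p>
```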
Tag varieties
Tags break down into two main classes, and then there are some special cases (like lists and tables). These classes are inline elements and block elements. These are sometimes referred to as inline tags and block-level tags (or just block tags).
- Block-level tags
  The paragraph tag <p></p> is a block-level tag, as are heading tags (such as <h1></h1>), and the <div></div> which is often used to group together paragraphs, headings and pretty much any other items which need to be treated the same way. Block-level tags usually create an item which starts on a new line, such as a new paragraph or heading.
  Some block-level tags are allowed to contain other block-level tags, for instance <div></div> can contain as many paragraphs, headings and other <div></div> tags as are needed. But paragraphs and headings can never contain block tags.
- Inline tags
  Inline tags are used within block tags to add function, such as the anchor tag. Inline tags can never contain block tags but they can (and often do) contain other inline tags. When they do so the tags must be nested correctly.
Phrase elements are another kind of inline tag, used to strengthen or emphasise text, or add depth in other ways.
Block-level elements build the broad structure supporting the text, and inline elements add detail and function.
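The containment rules for block and inline tags might be sketched like this. The headings, text and link target are placeholders, and the forbidden case is shown inside a comment so that the fragment itself stays valid.

```html
<div>
  <h1>A heading</h1>
  <p>A paragraph containing <a href="index.html">a link with <em>emphasis</em> inside it</a>.</p>
  <div>
    <p>A paragraph inside a nested div.</p>
  </div>
</div>

<!-- Not allowed: a block-level tag inside a paragraph -->
<!-- <p>Some text <div>more text</div></p> -->
```

Here the outer div holds a heading, a paragraph and another div, while the inline anchor and emphasis tags appear only inside a paragraph.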
Nesting
The third rule of tags is that they are always nested within each other—enclosed between another element's opening and closing tags. This gives the properties from two different tags to the same item of content. In a paragraph where some text is given emphasis, the <em></em> tag is nested inside the paragraph tag.
A tag which isn't properly nested is overlapped. While most browsers tolerate overlapped tags, overlapping is another example of code which is not allowed in XHTML.
Inline tags can be nested too:
<p>
Text can be <em>emphasised</em> or <strong>strong</strong>, or it can be both <em><strong>emphasised and strong</strong></em>.
</p>
Text can be emphasised or strong, or it can be both emphasised and strong.
In this example it wouldn't matter which of the nested tags were inside the other, as long as they are nested properly. The mark-up could be either:
<em><strong>emphasised and strong</strong></em>
or:
<strong><em>emphasised and strong</em></strong>
but not:
<strong><em>emphasised and strong</strong></em>
Attributes
Tags can have attributes, although most tags can be used without attributes. Examples of tags which always have attributes are the inline tags for anchor (<a>) and image (<img />).
An anchor element might look like this:
<a href="index.html">Home page</a>
An image element might look like this (notice that this is a single tag so in XHTML it must close itself):
<img src="picture.jpg" />
The attribute of the anchor tag is the href="index.html" part. It consists of an attribute name (href) and an attribute value (index.html) joined together with an equals sign (=). The attribute name is always lowercase and the attribute value is always enclosed in double quotes. The value can be uppercase or lowercase and can contain spaces. Characters with a special meaning in mark-up, such as the ampersand (&), and any non-US-ASCII characters in the value need to be encoded with character entities:
<img src="picture.jpg" alt="Punch &amp; Judy" />
Attributes are only ever applied to the opening tag of an element; the closing tag must remain plain.
(see also the separate discussion regarding anchors)
Useful attributes are name, id, title and class. The name attribute can be used to create an anchor to which a link can be addressed, and similarly id can act as a target for a link, although it is less well supported by browsers. On any page a given name or id value can be used by only one element, and the value must start with a letter (A-Z or a-z). Because each id is unique on the page, it can be used to identify a particular instance of an element, which can be very useful. In this respect id is more powerful than name, but name is better supported as a link target, so both should be used.
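One way this might look in practice is sketched below. The value "rules" and the link text are made up for the example; the target carries both attributes with the same value.

```html
<!-- The target: name and id share the same value -->
<h2><a name="rules" id="rules">Some rules about tags</a></h2>

<!-- The link: the # prefix addresses a target on the same page -->
<p><a href="#rules">Jump to the rules</a></p>
```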
The title attribute can be used with many tags in order to provide a "tooltip" pop-up explanation. For instance the <abbr></abbr> (abbreviation) tag can be given a title attribute with the full text of the expanded abbreviation. A link anchor can also be given a title attribute, which shows as a tooltip or in the status bar of the browser, or is read aloud by a screen reader.
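A minimal sketch of both uses follows. The sentence, the link target and the title texts are placeholders chosen for the example.

```html
<p>
This page is written in <abbr title="Extensible HyperText Markup Language">XHTML</abbr>.
</p>
<p>
<a href="index.html" title="Return to the home page">Home</a>
</p>
```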
Class is used to identify a particular kind of item and the same class may be used more than once on the same page. Note that there are particular rules about using class attributes which are too long to go into here. Those classes which are to be used on this site are (will be) listed separately.
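As a simple illustration, the fragment below applies one class to two paragraphs. The class name "note" is hypothetical and is not necessarily one of the classes used on this site.

```html
<p class="note">A first note.</p>
<p>An ordinary paragraph.</p>
<p class="note">A second note, sharing the same class.</p>
```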
Some rules about tags
- Nothing (except white space) should be outside a block-level tag.
- Tag names are always lowercase.
- Every opening tag must have a closing tag.
- Single-tag elements must be formed correctly.
- Tags must be nested and not overlap.
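Taken together, the rules might produce a fragment like the following. It is only a sketch with invented content, not a complete page.

```html
<div>
<h1>Page heading</h1>
<p>
Everything is tagged, tag names are lowercase,<br />
and <strong><em>nested</em></strong> tags do not overlap.
</p>
</div>
```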
