HTML and XHTML Tutorial
What is an HTML File?
- HTML stands for Hyper Text
Markup Language
- An HTML file is a text file
containing small markup tags
- The markup tags tell the Web
browser how to display the page
- An HTML file must have an htm
or html file extension
- An HTML file can be created
using a simple text editor
XHTML is a stricter and cleaner
version of HTML.
What Is XHTML?
- XHTML stands for EXtensible
HyperText Markup Language
- XHTML is intended to replace HTML
- XHTML is almost identical
to HTML 4.01
- XHTML is a stricter and
cleaner version of HTML
- XHTML is HTML defined as an XML
application
- XHTML is a W3C Recommendation
XHTML - Why?
XHTML is a combination of HTML and
XML (EXtensible Markup Language).
XHTML consists of all the elements
in HTML 4.01 combined with the syntax of XML.
The following HTML code will work fine if you view it in a browser, even if it does not follow the HTML rules:
<html>
<head>
<title>This is bad HTML</title>
<body>
<h1>Bad HTML
</body>
XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.
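To make "well-formed" concrete, here is a minimal sketch that hands markup to Python's standard XML parser and reports whether it is accepted. The function name `is_well_formed` and the sample strings are illustrative assumptions, not part of the original tutorial. The sketch checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup without errors."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The "bad HTML" from above: <head>, <h1> and <html> are never closed.
bad = "<html><head><title>This is bad HTML</title><body><h1>Bad HTML</body>"

# A hypothetical corrected version with every element closed.
good = (
    "<html><head><title>This is good markup</title></head>"
    "<body><h1>Good markup</h1></body></html>"
)

print(is_well_formed(bad))   # False
print(is_well_formed(good))  # True
```

A browser would still render the first string, which is exactly the leniency that an XML parser does not offer.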
XML was designed to describe data and HTML was designed to display data.
Today's market consists of different browser technologies: some browsers run on desktop computers, and some run on mobile phones and handhelds. The latter do not have the resources or power to interpret a "bad" markup language.
Therefore, by combining HTML and XML and their strengths, we got a markup language that is useful now and in the future: XHTML.
XHTML pages can be read by all XML-enabled devices. While waiting for the rest of the world to upgrade to browsers that support XML, XHTML gives you the opportunity to write "well-formed" documents now that work in all browsers and are backward compatible.
How To Get Ready For XHTML
XHTML is not very different from the HTML 4.01 standard, so bringing your code up to the 4.01 standard is a good start. Our complete HTML 4.01 reference can help you with that.
In addition, you should start NOW to write your HTML code in lowercase letters, and NEVER skip ending tags (like </p>).
Happy coding!
The Most Important Differences:
- XHTML elements must be properly
nested
- XHTML elements must always be closed
- XHTML elements must be in lowercase
- XHTML documents must have one
root element
XHTML Elements Must Be Properly Nested
In HTML, some elements can be improperly nested within each other, like this:
<b><i>This text is bold and italic</b></i>
In XHTML, all elements must be properly nested within each other, like this:
<b><i>This text is bold and italic</i></b>
Note: A common mistake with nested lists is to forget that the inside list must be within <li> and </li> tags.
This is wrong:
<ul>
  <li>Coffee</li>
  <li>Tea
    <ul>
      <li>Black tea</li>
      <li>Green tea</li>
    </ul>
  <li>Milk</li>
</ul>
This is correct:
<ul>
  <li>Coffee</li>
  <li>Tea
    <ul>
      <li>Black tea</li>
      <li>Green tea</li>
    </ul>
  </li>
  <li>Milk</li>
</ul>
Notice that we have inserted a </li> tag after the </ul> tag in the "correct" code example.
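The nesting rule can be sketched as a small stack-based check: every start tag is pushed, and every end tag must match the most recently opened element. This is an illustrative sketch, not a real parser. The name `first_nesting_error` and the simplified tag pattern are assumptions, and comments, CDATA and DOCTYPE declarations are ignored.

```python
import re

# Matches a start tag, end tag, or self-closing tag and captures its name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")


def first_nesting_error(markup):
    """Return a description of the first nesting problem, or None."""
    open_tags = []
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <br /> style tags open and close at once
        if not closing:
            open_tags.append(name)
        elif not open_tags:
            return f"found </{name}> with no open element"
        elif open_tags[-1] != name:
            return f"found </{name}> but expected </{open_tags[-1]}>"
        else:
            open_tags.pop()
    if open_tags:
        return f"<{open_tags[-1]}> is never closed"
    return None


wrong_list = (
    "<ul><li>Coffee</li><li>Tea<ul><li>Black tea</li>"
    "<li>Green tea</li></ul><li>Milk</li></ul>"
)
correct_list = (
    "<ul><li>Coffee</li><li>Tea<ul><li>Black tea</li>"
    "<li>Green tea</li></ul></li><li>Milk</li></ul>"
)

print(first_nesting_error("<b><i>bold and italic</b></i>"))
print(first_nesting_error(wrong_list))
print(first_nesting_error(correct_list))
```

For the "wrong" list, the check only fails at the final </ul>, because the "Tea" item is still open at that point.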
XHTML Elements Must Always Be Closed
Non-empty elements must have an end tag.
This is wrong:
<p>This is a paragraph
<p>This is another paragraph
This is correct:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Empty Elements Must Also Be Closed
Empty elements must either have an end tag or the start tag must end with />.
This is wrong:
A break: <br>
A horizontal rule: <hr>
An image: <img src="happy.gif" alt="Happy face">
This is correct:
A break: <br />
A horizontal rule: <hr />
An image: <img src="happy.gif" alt="Happy face" />
XHTML Elements Must Be In Lower Case
The XHTML specification requires tag names and attribute names to be lowercase.
This is wrong:
<BODY>
<P>This is a paragraph</P>
</BODY>
This is correct:
<body>
<p>This is a paragraph</p>
</body>
XHTML Documents Must Have One Root Element
All XHTML elements must be nested within the <html> root element. All other elements can have sub (children) elements. Sub elements must be in pairs and correctly nested within their parent element. The basic document structure is:
<html>
<head> ... </head>
<body> ... </body>
</html>
Mandatory XHTML Elements
All XHTML documents must have a DOCTYPE declaration. The html, head and body elements must be present, and the title must be present inside the head element.
This is a minimum XHTML document template:
<!DOCTYPE Doctype goes here>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title goes here</title>
</head>
<body>
</body>
</html>
Note: The DOCTYPE declaration is not a part of the XHTML document itself. It is not an XHTML element, and it should not have a closing tag.
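As a rough illustration of the mandatory parts listed above, the following sketch parses a document with Python's standard XML parser and reports which required pieces are missing. The function name `missing_mandatory_parts` and the sample documents are assumptions made for this example. It checks structure only and does not validate against a DTD, and markup that is not well-formed raises a parse error instead of returning a list.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def missing_mandatory_parts(document: str) -> list:
    """List the mandatory XHTML parts that the document lacks."""
    problems = []
    if not document.lstrip().startswith("<!DOCTYPE"):
        problems.append("DOCTYPE")
    root = ET.fromstring(document)  # raises ET.ParseError if not well-formed
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("html root element in the XHTML namespace")
        return problems
    ns = {"x": XHTML_NS}
    checks = (("x:head", "head"), ("x:head/x:title", "title"), ("x:body", "body"))
    for path, label in checks:
        if root.find(path, ns) is None:
            problems.append(label)
    return problems


DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
complete = (
    DOCTYPE
    + '<html xmlns="http://www.w3.org/1999/xhtml">'
    + "<head><title>Title goes here</title></head><body></body></html>"
)
no_title = (
    DOCTYPE
    + '<html xmlns="http://www.w3.org/1999/xhtml">'
    + "<head></head><body></body></html>"
)

print(missing_mandatory_parts(complete))  # []
print(missing_mandatory_parts(no_title))  # ['title']
```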
The XHTML standard defines three
Document Type Definitions.
The most common is the XHTML
Transitional.
<!DOCTYPE> Is Mandatory
An XHTML document consists of three main parts:
- the DOCTYPE
- the Head
- the Body
<!DOCTYPE ...>
<html>
<head>
<title>... </title>
</head>
<body> ... </body>
</html>
The DOCTYPE declaration should always be the first line in an XHTML document.
An XHTML Example
This is a simple (minimal) XHTML document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>
The DOCTYPE declaration defines the document type:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The rest of the document looks like HTML:
<html>
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>
A public identifier is a document processing construct in SGML and XML.
In HTML and XML, a public identifier is meant to be universally unique within its application scope. It typically occurs in a Document Type Declaration.
A public identifier is meant to identify a document type that may span more than one application. A system identifier is meant for a document type that is used exclusively in one application.
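To show where the two identifiers sit in a declaration, here is a small sketch that pulls the public and system identifiers out of a DOCTYPE string with a regular expression. The pattern and the name `parse_doctype` are assumptions for illustration. The pattern only handles the `PUBLIC "..." "..."` form with double-quoted identifiers.

```python
import re

# Root element name, then the quoted public and system identifiers.
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>'
)


def parse_doctype(declaration):
    """Return the parts of a PUBLIC DOCTYPE declaration, or None."""
    match = DOCTYPE_PATTERN.search(declaration)
    if match is None:
        return None
    root, public_id, system_id = match.groups()
    return {"root": root, "public": public_id, "system": system_id}


declaration = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)
parts = parse_doctype(declaration)
print(parts["public"])  # -//W3C//DTD XHTML 1.0 Transitional//EN
print(parts["system"])  # http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
```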
In the following Document Type Declaration, the public identifier is -//W3C//DTD XHTML 1.0 Transitional//EN:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The 3 Document Type Definitions
- DTD specifies the syntax of a
web page in SGML.
- DTD is used by SGML
applications, such as HTML, to specify rules that apply to the markup of
documents of a particular type, including a set of element and entity
declarations.
- XHTML is specified in an XML document type definition, or DTD.
- An XHTML DTD describes, in precise, computer-readable language, the allowed syntax and grammar of XHTML markup.
XHTML 1.0 defines three DTDs:
- STRICT
- TRANSITIONAL
- FRAMESET
XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Use this when you need to take advantage of HTML's presentational features and when you want to support browsers that don't understand Cascading Style Sheets.
XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Use this when you want to use HTML Frames to partition the browser window into two or more frames.
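If pages are generated by a script, the three declarations can be kept in one lookup table so that the choice of DTD is made in a single place. The sketch below is one possible arrangement. The names `XHTML_DTDS` and `doctype_for` are hypothetical, while the identifiers themselves are copied from the declarations above.

```python
# Public and system identifiers for the three XHTML 1.0 DTDs.
XHTML_DTDS = {
    "strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
}


def doctype_for(flavor):
    """Build the DOCTYPE declaration for 'strict', 'transitional' or 'frameset'."""
    public_id, system_id = XHTML_DTDS[flavor.lower()]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'


print(doctype_for("transitional"))
```

An unknown flavor raises a `KeyError`, which seems preferable to silently emitting a declaration that no validator recognizes.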
How W3Schools Was Converted To XHTML
W3Schools was converted from HTML to XHTML over the weekend of 18 and 19 December 1999 by Hege Refsnes and Ståle Refsnes.
To convert a Web site from HTML to XHTML, you should be familiar with the XHTML syntax rules of the previous chapters. The following steps were executed (in the order listed below):
A DOCTYPE Definition Was Added
The following DOCTYPE declaration was added as the first line of every page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Note that we used the transitional DTD. We could have chosen the strict DTD, but found it a little too "strict", and a little too hard to conform to.
A Note About The DOCTYPE
Your pages must have a DOCTYPE declaration if you want them to validate as correct XHTML.
Be aware, however, that newer browsers (like Internet Explorer 6) might treat your document differently depending on the <!DOCTYPE> declaration. If the browser reads a document with a DOCTYPE, it might treat the document as "correct", so malformed XHTML might fail or display differently than it would without a DOCTYPE.
Lower Case Tag And Attribute Names
Since XHTML is case sensitive, and since XHTML only accepts lowercase tag and attribute names, a general search and replace function was executed to replace all uppercase tags with lowercase tags. The same was done for attribute names. We have always tried to use lowercase names on our website, so the replace function did not produce many real substitutions.
All Attributes Were Quoted
Since the W3C XHTML 1.0 Recommendation states that all attribute values must be quoted, every page on the site was checked to see that attribute values were properly quoted. This was a time-consuming job, and we will surely never again forget to put quotes around our attribute values.
Empty Tags: <hr>, <br> and <img>
Unclosed empty tags are not allowed in XHTML. The <hr> and <br> tags should be replaced with <hr /> and <br />.
This produced a problem with Netscape, which misinterpreted the <br/> tag. We don't know why, but changing it to <br /> worked fine. After that discovery, a general search and replace function was executed to swap the tags.
A few other tags (like the <img> tag) suffered from the same problem. We decided not to close the <img> tags with </img>, but with /> at the end of the tag. This was done manually.
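A search and replace of this kind can be sketched with a regular expression. The pattern below is an assumption for illustration, not the tool the authors used. It covers only lowercase `br`, `hr` and `img` tags, and it always writes the space before `/>` that worked in Netscape.

```python
import re

# A br, hr or img tag, with optional attributes and an optional existing slash.
EMPTY_TAG = re.compile(r"<(br|hr|img)\b([^>]*?)\s*/?>")


def close_empty_tags(markup):
    """Rewrite <br>, <br/> and similar tags in the '<br />' form."""
    return EMPTY_TAG.sub(r"<\1\2 />", markup)


before = 'A break: <br> A rule: <hr> <img src="happy.gif" alt="Happy face">'
print(close_empty_tags(before))
# A break: <br /> A rule: <hr /> <img src="happy.gif" alt="Happy face" />
```

Tags that are already closed pass through unchanged, so the replacement is safe to run more than once.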
The Web Site Was Validated
After that, all pages were validated against the official W3C DTD using the W3C XHTML Validator. A few more errors were found and edited manually. The most common error was missing </li> tags in lists.
Should we have used a conversion tool? Well, we could have used TIDY.
Dave Raggett's HTML TIDY is a free utility for cleaning up HTML code. It also works well on the hard-to-read markup generated by specialized HTML editors and conversion tools, and it can help you identify where you need to pay further attention to making your pages more accessible to people with disabilities.
The reason why we didn't use Tidy? We knew about XHTML when we started writing this web site. We knew that we had to use lowercase tag names and that we had to quote our attributes. So when the time came to do the conversion, we simply had to test our pages against the W3C XHTML validator and correct the few mistakes. And we have learned a lot about writing "tidy" HTML code.
An XHTML document is validated
against a Document Type Definition.
Validate XHTML With A DTD
An XHTML document is validated against a Document Type Definition (DTD). Before an XHTML file can be properly validated, a correct DOCTYPE declaration referencing the DTD must be added as the first line of the file.
The Strict DTD includes elements and attributes that have not been deprecated or do not appear in framesets:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The Frameset DTD includes everything in the transitional DTD plus frames as well:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
This is a simple XHTML document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>