A simple HTML document is illustrated in Figure 4-1.
<TITLE>The World-Wide Web</TITLE>
<H1>About The World-Wide Web</H1>
<P>The World-Wide Web is a <EM>distributed multimedia hypertext</EM> system.</P>

Figure 4-1 A Simple HTML Document.
Structural elements in the document are identified by start and end markup tags. For example, the <TITLE> and </TITLE> tags specify the title of the document, which is often displayed by a client. The <H1> and </H1> tags define a first-level heading. Clients will normally display headings differently from the body text: for example, a graphical client could display a heading using a larger or different font, whereas a text-based client could display it as centred text or in all capitals.
Figure 4-1 also illustrates the <EM> container. Text held in the container (which is delimited by the <EM> start tag and the </EM> end tag) will be emphasised in some way. A graphical browser could render the emphasised text in italics, whereas a browser with audio output for the visually impaired could render the emphasis by a change in the tone of voice.
Figure 4-1 also shows the paragraph container. It is important to understand that the <P> tag opens a paragraph container and is no longer a paragraph separator (as many people mistakenly believe). If the </P> end tag is omitted, the next <P> tag implies it. Future versions of HTML are expected to allow paragraph attributes to be specified: for example <P ALIGN=CENTER>.
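The implied end tag rule can be made concrete with a short Python sketch, assuming Python 3 and its standard html.parser module. The names ParagraphCollector and paragraphs are hypothetical, chosen for this illustration; the sketch simply collects the text of each paragraph container, treating a new <P>, an explicit </P> or the end of the document as the end of the open paragraph.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each paragraph container.

    A new <P> start tag closes any paragraph that is still open,
    i.e. it acts as an implied </P>.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text pieces of the open paragraph, or None

    def _close_paragraph(self):
        if self._current is not None:
            self.paragraphs.append("".join(self._current).strip())
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":  # html.parser reports tag names in lower case
            self._close_paragraph()  # implied </P>
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()  # explicit </P>

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()
        self._close_paragraph()  # the end of the document also ends a paragraph


def paragraphs(markup):
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    return collector.paragraphs


print(paragraphs("<P>First paragraph.<P>Second, with <EM>emphasis</EM>.</P>"))
# ['First paragraph.', 'Second, with emphasis.']
```

Both paragraphs are recovered even though the first one has no </P>: the second <P> closes it.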
Although browsers will display the HTML document shown in Figure 4-1, for reasons of performance and upwards compatibility it is strongly recommended that HTML documents contain additional elements including the <HTML>, <HEAD> and <BODY> tags, as shown in Figure 4-2.
<HTML>
<HEAD>
<TITLE>The World-Wide Web</TITLE>
</HEAD>
<BODY>
<H1>About The World-Wide Web</H1>
<P>Information about the World-Wide Web is available
<A HREF="http://info.cern.ch/hypertext/WWW/TheProject.html">at CERN</A>.</P>
</BODY>
</HTML>

Figure 4-2 A Simple HTML Document.
The <HTML> container defines the extent of the HTML document. Within it there are two other containers: <HEAD> and <BODY>. The <HEAD> container provides information about the document itself. This can include the title of the document (as illustrated), copyright information, keywords and expiry dates (for use by caching software). It is important to use the <HEAD> container since, for example, an automatic indexing program which wishes to index the titles of HTML documents need only parse the information contained in the <HEAD> container. If the <HEAD> container is not present, the entire document may have to be parsed, which places unnecessary extra load on the server.
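The benefit of the <HEAD> container for indexing software can be sketched in Python, assuming the document text is read in chunks (the chunk size of 64 characters and the names HeadTitleParser and index_title are hypothetical). The parser records the <TITLE> text, and the loop stops reading as soon as the <HEAD> container has ended, so most of the body is never read.

```python
from html.parser import HTMLParser


class HeadTitleParser(HTMLParser):
    """Records the <TITLE> text and notes when the <HEAD> container ends."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.head_finished = False
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self.head_finished = True  # no </HEAD> seen, but the body has started

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.title = "".join(self._parts).strip()
        elif tag == "head":
            self.head_finished = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)


def index_title(markup, chunk_size=64):
    """Return (title, characters read), stopping once the head has ended."""
    parser = HeadTitleParser()
    read = 0
    while read < len(markup) and not parser.head_finished:
        parser.feed(markup[read:read + chunk_size])
        read = min(read + chunk_size, len(markup))
    return parser.title, read


figure_4_2 = (
    '<HTML> <HEAD> <TITLE>The World-Wide Web</TITLE> </HEAD> <BODY> '
    '<H1>About The World-Wide Web</H1> <P>Information about the World-Wide Web '
    'is available <A HREF="http://info.cern.ch/hypertext/WWW/TheProject.html">'
    'at CERN</A>.</P> </BODY> </HTML>'
)
title, read = index_title(figure_4_2)
print(title, read, len(figure_4_2))
```

For Figure 4-2 the title is found in the first chunk, well before the end of the document.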
Figure 4-2 also illustrates the use of the anchor <A> container, which is used to provide hypertext links. In the example, the text "at CERN", which is contained between the <A> and </A> tags, will be highlighted in some way by the browser. Selecting this highlighted phrase will cause the client to send a request for http://info.cern.ch/hypertext/WWW/TheProject.html. This request will use the HTTP protocol and will be sent to the server running on the host info.cern.ch.
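The first steps the client takes when the link is selected can be sketched in Python using the standard urllib.parse.urlsplit function. The helper name plan_request is hypothetical, and the sketch only builds an HTTP/1.0-style request line; a real client would also open a connection to the host and send request headers.

```python
from urllib.parse import urlsplit


def plan_request(url):
    """Split an http URL into the host, port and request line a client would use."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http URLs")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    port = parts.port or 80  # 80 is the default port for http
    return parts.hostname, port, f"GET {path} HTTP/1.0"


host, port, request_line = plan_request(
    "http://info.cern.ch/hypertext/WWW/TheProject.html")
print(host, port, request_line)
# info.cern.ch 80 GET /hypertext/WWW/TheProject.html HTTP/1.0
```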
HTML Authoring Tools
Initially, information providers on the World-Wide Web used standard
editors such as vi and emacs to create HTML documents. As the Web grew in
popularity, authoring tools were developed to assist information providers.
This section describes the following authoring tools, which are available for
the Microsoft Windows environment: HTML Assistant,
HTML Hyperedit, HTMLEd and InContext Spider.
HTML Assistant
HTML Assistant is a simple authoring tool which can be used to create
and edit HTML documents. A list of Frequently Asked Questions about HTML Assistant is
available at the URL
http://cs.dal.ca/ftp/htmlasst/htmlafaq.html
HTML Assistant is available at the URL
ftp://ftp.cica.indiana.edu/pub/pc/win3/misc
In the UK it is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/html-assistant
HTML Hyperedit
HTML Hyperedit (which was developed using the Toolbook authoring system)
not only provides an environment for producing HTML documents, but also
contains a tutorial which gives an introduction to HTML. HTML Hyperedit is
available at the URL
ftp://info.curtin.edu.au/pub/internet/mswindows/hyperedit
In the UK it is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/win-htmledit
HTMLEd
HTMLEd is a simple authoring tool which can be used to create HTML
documents. In the UK it is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/
InContext Spider
Further information about InContext Spider is available at the URL
http://www.incontext.ca/
Figure 4-7 HotDog
Word Processing Tools
HTML Assistant and HTML Hyperedit are self-contained authoring tools.
Another approach is to develop authoring tools which work within a word
processing environment. These tools are normally implemented as macros for
popular word processing packages, such as Word For Windows or WordPerfect.
This section describes three tools which have been developed for use within
Word For Windows: the GT_HTML, CU_HTML and ANT_HTML
macros.
Word processing tools have the advantage that they provide a consistent environment for existing users of word processors. However, they also have disadvantages. Because they are normally implemented as macros, they can be very slow, especially when used with large or complicated documents. There is also a danger that HTML markup embedded as hidden text could cause conflicts with other word processing tools if, for example, the document is shared with other users.
GT_HTML
One of the first word processing macros which could be used to create
HTML documents was the GT_HTML macro. This macro, written for Word For
Windows, was developed at the Georgia Tech Research Institute. In the UK
the software is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/macros/ms-winword
CU_HTML
CU_HTML is a template designed to work within Word For Windows. The
template was written by Anton Lam
(mailto:anton-lam@cuhk.hk).
The software is
available at the URL
ftp://ftp.cuhk.hk/pub/www/windows/util
ANT_HTML
ANT_HTML is a template designed to work within Word For Windows 6.0.
The template was written by Jill Swift
(mailto:jswift@freenet.fsu.edu).
The software is available at the URL
ftp://ftp.einet.net/einet/pc/ANT_HTML.ZIP
Figure 4-10 The ANT_HTML Macro.
Figure 4-11 Internet Assistant
Browser Editing Tools
Another approach to editing HTML documents is provided by browsers which
are integrated with editing tools. The Arena browser enables an external
editor to be invoked to edit the displayed HTML document. Figure 4-12
illustrates the Arena browser used in conjunction with the Emacs editor.
Figure 4-12 Editing A Document From Arena.
HTML Document Conversion Tools
Authoring tools are normally used to create new HTML documents.
Document conversion tools, on the other hand, can be used to convert existing
documents to HTML format.
LaTeX2html
One of the first sophisticated document conversion tools to be developed
was the LaTeX2html conversion program. This program was written by Nikos
Drakos, Computer Based Learning Unit, University of Leeds. It set the
standard for document converters, providing a wide range of features,
including:
Figure 4-13 A Document Converted Using LaTeX2html.
LaTeX2html is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/translators/latex2html
Further information is available at the URL
http://cbl.leeds.ac.uk/nikos/doc/www94/www94.html
RTFtohtml
The RTFtohtml conversion program enables RTF files (which can be
produced by word processing packages such as Word For Windows) to be converted
to HTML. The program was written by Chris Hector (Cray), based on RTF parsing
software developed by Paul DuBois.
RTFtohtml is available as a command line tool for a number of Unix platforms. In addition, an Apple Macintosh implementation is available. A beta version of an MS-DOS implementation was announced in November 1994.
An extension of the RTFtohtml program is known as RTFtoweb. This provides a number of additional features, including the creation of hypertext links at user-defined section breaks. Figure 4-14 illustrates a document, "Exploring The World-Wide Web Using Mosaic For Windows", which is available at the URL http://www.leeds.ac.uk/ucs/docs/tut50/tut50.html
Figure 4-14 Document Converted Using RTFtoweb.
Note that in Figure 4-14 the document has been split automatically into a number of files, and a hypertext table of contents has been generated. Chevrons (<< and >>), which are also generated automatically, can be used to move to the previous or next section.
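The splitting shown in Figure 4-14 can be approximated by the following Python sketch. It is not RTFtoweb's implementation: it assumes the document has already been divided into (heading, body) pairs, and the function name split_into_pages and the generated file names (contents.html, section1.html and so on) are hypothetical. Each page receives << and >> links to its neighbours, and a contents page lists every section.

```python
from html import escape


def split_into_pages(title, sections, prefix="section"):
    """Turn (heading, body_html) pairs into linked pages plus a contents page.

    Returns a dict mapping (hypothetical) file names to HTML text.
    """
    names = [f"{prefix}{n}.html" for n in range(1, len(sections) + 1)]
    safe_title = escape(title)

    # Hypertext table of contents: one link per section.
    toc = "".join(
        f'<LI><A HREF="{name}">{escape(heading)}</A></LI>\n'
        for name, (heading, _) in zip(names, sections))
    pages = {
        "contents.html":
            f"<HTML><HEAD><TITLE>{safe_title}</TITLE></HEAD>\n"
            f"<BODY><H1>{safe_title}</H1>\n<UL>\n{toc}</UL>\n</BODY></HTML>\n"
    }

    for i, (heading, body) in enumerate(sections):
        nav = []
        if i > 0:  # "<<" link back to the previous section
            nav.append(f'<A HREF="{names[i - 1]}">&lt;&lt;</A>')
        nav.append('<A HREF="contents.html">Contents</A>')
        if i + 1 < len(sections):  # ">>" link on to the next section
            nav.append(f'<A HREF="{names[i + 1]}">&gt;&gt;</A>')
        pages[names[i]] = (
            f"<HTML><HEAD><TITLE>{escape(heading)}</TITLE></HEAD>\n"
            f"<BODY><H1>{escape(heading)}</H1>\n{body}\n"
            f"<P>{' '.join(nav)}</P>\n</BODY></HTML>\n")
    return pages


pages = split_into_pages(
    "Exploring The World-Wide Web",
    [("Introduction", "<P>What the Web is.</P>"),
     ("Using Mosaic", "<P>Starting the browser.</P>")])
print(sorted(pages))
# ['contents.html', 'section1.html', 'section2.html']
```

The chevrons are written as &lt;&lt; and &gt;&gt; so that the generated pages remain valid HTML.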
Further information about RTFtohtml is available at the URL
ftp://ftp.cray.com/src/WWWstuff/RTF/rtftohtml_overview.html
The software is available at the URL
ftp://ftp.cray.com/src/WWWstuff/RTF/latest/
In the UK it is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/translators/rtftohtml
RTFtoweb is available at the URL
ftp://ftp.rrzn.uni-hannover.de/pub/unix-local/misc/rtftoweb/html/rtftoweb.html
HTML Quality Tools
The HTML specification states that "HTML parsers should be liberal
except when verifying code. HTML generators should generate strictly
conforming HTML." Put simply, this means that browsers should be capable of
displaying documents which contain invalid HTML, but HTML authoring tools and
document converters should generate HTML which conforms strictly to the
standard.
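The generator half of this rule can be illustrated with a minimal Python sketch: every element is written with an explicit end tag, and character data is escaped with the standard html.escape function so that &, < and > in the text cannot be mistaken for markup. The helpers text and element are hypothetical, and the sketch handles only container elements (it would wrongly produce an end tag for an empty element such as <BR>).

```python
from html import escape


def text(s):
    """Escape character data so that &, < and > cannot be mistaken for markup."""
    return escape(s, quote=False)


def element(tag, content="", **attributes):
    """Build one container element with an explicit end tag.

    `content` must already be markup, e.g. the result of text() or element().
    """
    name = tag.upper()
    attrs = "".join(
        f' {key.upper()}="{escape(value, quote=True)}"'
        for key, value in attributes.items())
    return f"<{name}{attrs}>{content}</{name}>"


link = element("A", text("at CERN"),
               href="http://info.cern.ch/hypertext/WWW/TheProject.html")
para = element("P", text("Information about the World-Wide Web is available ") + link + ".")
print(para)
# <P>Information about the World-Wide Web is available <A HREF="http://info.cern.ch/hypertext/WWW/TheProject.html">at CERN</A>.</P>
```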
A number of tools are available which can validate HTML documents. Some popular ones are described below.
HoTMetaL
HoTMetaL is an HTML authoring tool and validator. It will provide
feedback if it encounters invalid HTML, as illustrated in Figure 4-15.
HoTMetaL is available for the X Window System and Microsoft Windows platforms. Two versions of the software are available: a public domain version and a licensed version. HoTMetaL Pro, the licensed version, can be used to import and validate an existing document. The public domain version will report an error and refuse to load a document which contains invalid HTML.
HoTMetaL is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/Mosaic/html/hotmetal
Weblint
A tool called weblint can be used to check HTML documents for invalid
markup. This software is available from the URL
ftp://ftp.khoros.unm.edu/pub/perl/www/weblint-1.000.tar.gz
In the UK it is available at the URL
ftp://src.doc.ic.ac.uk/packages/WWW/tools/weblint
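The kind of check weblint performs can be illustrated with a small Python sketch. It is not weblint's code (weblint is a Perl script with many more checks): it assumes a simplified list of elements whose end tags may be omitted, reports container elements that are never closed or are closed out of order, and warns if the document has no <TITLE>. The names NestingChecker and check_html are hypothetical.

```python
from html.parser import HTMLParser

# Simplified assumption: elements with no end tag, or whose end tag may be omitted.
NO_END_TAG_NEEDED = {"br", "hr", "img", "input", "meta", "link", "p", "li", "dt", "dd"}


class NestingChecker(HTMLParser):
    """Reports container elements that are unclosed or closed in the wrong order."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (tag, line number)
        self.problems = []
        self.has_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True
        if tag not in NO_END_TAG_NEEDED:
            self.open_elements.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in NO_END_TAG_NEEDED:
            return
        line = self.getpos()[0]
        if not any(open_tag == tag for open_tag, _ in self.open_elements):
            self.problems.append(f"line {line}: unexpected </{tag.upper()}>")
            return
        # Pop back to the matching start tag, reporting anything left open inside it.
        while True:
            open_tag, open_line = self.open_elements.pop()
            if open_tag == tag:
                break
            self.problems.append(
                f"line {open_line}: <{open_tag.upper()}> not closed before </{tag.upper()}>")


def check_html(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    for tag, line in checker.open_elements:
        checker.problems.append(f"line {line}: <{tag.upper()}> never closed")
    if not checker.has_title:
        checker.problems.append("document has no <TITLE>")
    return checker.problems


for problem in check_html("<TITLE>Test</TITLE>\n<H1>Heading</H2>\n<P>Some <EM>text</P>"):
    print(problem)
```

Applied to this example, the checker reports the stray </H2>, the unclosed <H1> and the unclosed <EM>; the omitted </P> is not reported, since its end tag is optional.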
SGMLS
sgmls is a tool which can be used to validate SGML documents. It is
available at the URL
ftp://sgml1.ex.ac.uk/pub/SGML/sgmls/
The sgmls software is used in a
number of HTML validation services, such as those described below. Information
on installing sgmls and also psgml (an SGML mode for emacs) is available
at the URL
http://web.nexor.co.uk/users/mak/doc/html/sgml-lib/html-sgml.html
HTML Validation Service
An HTML validation service is available at the URL
http://www.hal.com/~markg/WebTechs/validation-form.html
This service makes use of HTML forms and a CGI script which runs an HTML
validation program. The service can be used to check HTML syntax by entering the HTML
markup to be checked. It can also be used to check an existing HTML document
by entering the URL of the document.
Figure 4-16 HTML Validation Service.
A variation on this service is available at the URL http://www.cc.gatech.edu/grads/j/Kipp.Jones/HaLidation/validation-form.html
These services make use of the sgmls validation program.
The software can be installed on your local Unix system. It is available at the URL ftp://ftp.hal.com/pub/CGI/check-html.tar.Z
HTML Check Toolkit
The HTML Check Toolkit is another HTML validation program. The software
can be installed using a WWW browser. The installation service, illustrated
in Figure 4-17, is based on the EIT Webmaster Starter's Kit. HTML Check Toolkit is
available at the URL
http://www.hal.com/~markg/HaLSoft/html-check/
Figure 4-17 Installing The Check_HTML Script.
Review of HTML Tools
Before choosing HTML authoring tools, document converters or quality
tools for institutional use, the following issues should be considered:
Support: Who wrote the software? Was it an experienced software developer, or a student as part of a computer project? Will the software continue to be developed and supported?
Quality: Does the software produce valid HTML?
Functionality: What facilities does the software provide?
Other issues: If the software is based on a word processing package, what happens if the document needs to be used with another word processor?
Writing Style
Writing styles for WWW documents are still developing. However, a
number of guidelines can be given: