• 4 HTML
  • 5 Graphics
  • 6 Searching And Indexing
  • 7 WWW Servers
  • 8 Extending WWW
  • 9 Utilities
  • 10 Legal and Ethical Issues
  • 11 CWISes And WWW
  • 12 Teaching And Learning On WWW
  • 13 Collaboration On WWW
  • 14 Libraries And WWW
  • 15 Future Developments
  • Appendix 1 Mailing Lists

    1 Introduction

    Aim of this Document

    This handbook, Running A World-Wide Web Service, has been funded by the Advisory Group On Computer Graphics (AGOCG) through the Support Initiative for Multimedia Applications (SIMA) to provide support for UK academic institutions who wish to run a World-Wide Web service. The objectives of the document are to provide the reader with:

    The handbook also gives a number of examples of the use of WWW in the following areas:

    The handbook also provides pointers to a variety of sources of further information including:

    Target Audience

    This document is intended primarily for the UK academic community. It should be suitable for computing service or administrative staff responsible for managing a World-Wide Web service, and for academic staff who wish to run a departmental service.

    2 About The World-Wide Web

    History

    The World-Wide Web (which is often referred to as W3, the Web or, as used in this document, WWW) is a distributed multimedia hypertext system. What is meant by this?

    Distributed: information on WWW may be located on computer systems around the world.

    Multimedia: the information held on WWW can include text, graphics, sound and even video.

    Hypertext: access to the information is available using hypertext techniques, which typically involve using a mouse to select highlighted phrases or images. Once a phrase or image is selected it can result in information being retrieved from around the world.

    The World-Wide Web was initially developed by Tim Berners-Lee and Robert Cailliau of CERN Laboratories, Geneva to provide an infrastructure for particle physicists throughout Europe to share information. Since the physicists were located in various organisations and used a variety of computer systems and applications software (including various word processing and text markup programs for producing reports) the World-Wide Web was developed using the client-server architecture, which ensured cross-platform portability.

    Client-Server Architecture

    The World-Wide Web is based on the client-server architecture, which is illustrated in Figure 2-1.


    Figure 2-1 WWW Client-Server Architecture.

    The end user accesses the World-Wide Web using a browser client, typically on a desktop machine such as a PC, Macintosh or Unix workstation. The client displays hypertext links in some distinctive manner, such as by underlining them. The user selects a link by clicking a mouse button in a graphical client, by typing the number following the link in a simple text-based client or, in browsers for disabled users, by using speech or foot pedals, for example. The client then sends a request over the network, which could be a local network, a national network such as JANET, or the global network known as the Internet. The request is sent to a World-Wide Web server, which typically runs on a powerful computer system. The server retrieves the requested file and delivers it to the client.

    Once the client has started to retrieve the file it can display it on the local machine. If the client cannot display the file (many clients, for example, cannot view video clips) the client can pass the file on to an external viewer which can process the file.
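    The decision described above can be sketched in Python. This is only an illustration: the table of types, the viewer names (video-viewer and sound-player) and the fallback of saving the file to disk are assumptions made for the example and are not taken from any particular browser.

```python
# Content types this hypothetical client can display itself.
INTERNAL_TYPES = {"text/html", "text/plain", "image/gif"}

# Content types handed to an external viewer (viewer names are made up).
EXTERNAL_VIEWERS = {
    "video/mpeg": "video-viewer",
    "audio/basic": "sound-player",
}


def choose_handler(content_type):
    """Decide what the client does with a file of the given content type."""
    if content_type in INTERNAL_TYPES:
        return "display in browser"
    viewer = EXTERNAL_VIEWERS.get(content_type)
    if viewer is not None:
        return "pass to " + viewer
    # Assumed fallback when no viewer is configured for the type.
    return "offer to save to disk"


for content_type in ("text/html", "video/mpeg", "application/x-unknown"):
    print(content_type, "->", choose_handler(content_type))
```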

    This is a very simple overview of the WWW client-server architecture. Many other features are available: for example the server could send a message to the client, saying that the user is not authorised to access the file. However an understanding of this model will help you to see how the WWW can develop.
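    The exchange between client and server can be sketched in a few lines of Python. This is a minimal illustration and does not describe any particular client: the function names build_request, parse_status_line and describe are hypothetical, a simple HTTP/1.0 request is assumed, and a canned reply stands in for a real server.

```python
def build_request(path):
    """Return the text of a simple HTTP/1.0 GET request for the given path."""
    return "GET " + path + " HTTP/1.0\r\n\r\n"


def parse_status_line(response):
    """Split the first line of a server response into (version, code, reason)."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def describe(code):
    """Say what the client does next for a given status code."""
    if code == 200:
        return "display the file (or pass it to an external viewer)"
    if code == 401:
        return "tell the user that they are not authorised to access the file"
    return "report the problem to the user"


request = build_request("/hypertext/WWW/TheProject.html")

# A canned reply standing in for a real server (an assumption for illustration).
reply = ("HTTP/1.0 401 Unauthorized\r\n"
         "Content-Type: text/html\r\n"
         "\r\n"
         "<P>No access.</P>")

version, code, reason = parse_status_line(reply)
print(request.strip())
print(code, reason, "->", describe(code))
```

    A real client would send the request over a network connection and read the reply from it. The strings are kept in memory here so that the sketch can be run without a network.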

    Early Browsers

    One of the first browsers to be developed was the CERN command line browser. This can be accessed by using the command:

    telnet telnet.w3.org

    from a computer system which runs the telnet software. An example of use of the CERN command line browser is illustrated below.

    
    telnet telnet.w3.org
    
                     Welcome to the World-Wide Web
    THE WORLD-WIDE WEB
     
    This is just one of many access points to the web, the universe of
    information available over networks. To follow references, just type the
    number then hit the return (enter) key.
     
    The features you have by connecting to this telnet server are very
    primitive compared to the features you have when you run a W3 "client"
    program on your own computer. If you possibly can, please pick up a client
    for your platform to reduce the load on this service and  experience the
    web in its full splendor.
     
    For more information, select by number:
     
    A list of available W3 client programs[1]
    Everything about the W3 project[2]
    Places to start exploring[3]
    The First International WWW Conference[4]
     
    This telnet service is provided by the WWW team at the European Particle
    Physics Laboratory known as CERN[5]
    [End]
    1-5, Up, Quit, or Help:
    
    Figure 2-2 The CERN Command Line Browser.

    Notice that in the CERN command line browser in order to select a hypertext link you need to type the number which follows the link.
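    The numbering of links can be sketched using the HTML parser in the Python standard library. This is an illustration of the idea, not the code used by the CERN browser: the class name NumberedLinkRenderer, the function render and the link target clients.html are all hypothetical.

```python
from html.parser import HTMLParser


class NumberedLinkRenderer(HTMLParser):
    """Render HTML as plain text, appending [n] after each hypertext link."""

    def __init__(self):
        super().__init__()
        self.text = []        # pieces of rendered text
        self.targets = []     # targets[n - 1] is the URL for reference n
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self.targets.append(href)
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.text.append("[%d]" % len(self.targets))
            self.in_link = False

    def handle_data(self, data):
        self.text.append(data)


def render(html):
    """Return the rendered text and the list of link targets."""
    parser = NumberedLinkRenderer()
    parser.feed(html)
    parser.close()
    return "".join(parser.text), parser.targets


page = '<A HREF="clients.html">A list of available W3 client programs</A>'
text, targets = render(page)
print(text)         # A list of available W3 client programs[1]
print(targets[0])   # clients.html
```

    When the user types a number, the client looks up the corresponding entry in the list of targets and requests that URL.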

    The CERN command line browser is a very simple client. The first WWW browser was developed by Tim Berners-Lee, the father of the World-Wide Web, for the NeXT system. However the NeXT hardware was not a commercial success and is no longer manufactured. One of the earliest graphical browsers was the Viola client, which was developed for the X Windows environment. Viola is illustrated in Figure 2-3.


    Figure 2-3 The Viola Client.

    Notice that in the Viola client the hypertext links are identified by the use of underlining.

    Growth In Popularity

    As shown in Figure 2-4 use of the WWW has grown tremendously since 1993. This chart, which compares the growth of WWW with a simpler distributed information system known as Gopher, is available at the URL ftp://ftp.isoc.org/isoc/charts/networks-gifs (the term URL will be explained later in this chapter). Much of this growth in popularity was due to the release of browsers for the X Windows, PC and Macintosh environments by the National Center For Supercomputing Applications (NCSA) at the University of Illinois.


    Figure 2-4 Growth In Popularity of WWW.

    Since CERN's remit was research in particle physics the WWW development team realised that they needed to involve other organisations in WWW development work. The involvement of NCSA in the WWW development programme resulted in NCSA Mosaic For X, which was released in early 1993. An illustration of a pre-release version of Mosaic For X is shown in Figure 2-5.


    Figure 2-5 A Pre-release Version Of NCSA Mosaic For X.

    As can be seen from Figure 2-5 NCSA Mosaic For X provides access to a number of types of resources, including WAIS, Gopher, FTP, Usenet, Hytelnet, TeXinfo, X.500 and Whois resources. NCSA Mosaic was developed by a group of programmers at NCSA led by Marc Andreessen. NCSA Mosaic For X became such a success because:

    In November 1993 NCSA released versions of Mosaic for the Microsoft Windows and Apple Macintosh environments. These browsers, which are freely available to the academic community, provided access to WWW for people who did not have access to Unix and X Windows systems.

    Examples of Usage

    A number of examples of how the World-Wide Web is currently being used are given below. These are just a few examples of the many thousands of WWW services which are currently available.

    Publishing Research Information

    Figure 2-6 illustrates how CERN (the European Particle Physics Laboratory) makes its technical papers available on the World-Wide Web. The URL for the paper illustrated is http://www1.cern.ch/ALICE/ENGINEERING/engineering.html


    Figure 2-6 Example of Scientific Information Held At CERN.

    Campus Wide Information Systems

    The Honolulu Community College Campus Wide Information System (CWIS) was the first multimedia CWIS on the World-Wide Web. The URL for this CWIS is http://www.hcc.hawaii.edu/


    Figure 2-7 The Honolulu Community College CWIS.

    Teaching Applications

    The Globewide Network Academy (GNA) won a Best of the Web 1994 award for the Introduction to Object Oriented Programming Using C++ distributed teaching application. The URL for this application is http://uu-gna.mit.edu:8001/uu-gna/text/cc/index.html


    Figure 2-8 A Distributed Teaching Application.

    Publicity

    The School of Computer Studies at the University of Leeds was one of the first departments to use the multimedia capabilities of WWW to market its courses to potential students. The URL for this application is http://agora.leeds.ac.uk/WWW/MSc/MSc_text/leeds.html


    Figure 2-9 University of Leeds Prospectus Information.

    Virtual Libraries

    Many virtual libraries, art galleries and exhibitions are available on the World-Wide Web. One of the first was the Vatican exhibition. The URL for this virtual exhibition is http://sunsite.unc.edu/expo/vatican.exhibit/Vatican.exhibit.html


    Figure 2-10 The Vatican Exhibition.

    Commercialisation Of WWW

    The World-Wide Web is increasingly being used by commercial companies. For example the URL for the Pizza Hut ordering service is http://www.pizzahut.com/


    Figure 2-11 A Commercial Application On WWW.

    Government Use Of WWW

    The World-Wide Web is also being used by governmental agencies. For example the URL for the CCTA Government Information Service is http://www.open.gov.uk/


    Figure 2-12 The CCTA Government Information Service.

    Terminology

    The following terms are used in this document:

    Browser: An interactive program which is used to access information held on the World-Wide Web.

    Client: Often used as a synonym for browser. A client is the software which normally runs on the local desktop machine (such as a PC, Apple Macintosh or Unix workstation). The client sends requests to the server software.

    Server: Software which is used to deliver information to a client. Note that this term can also refer to the computer system on which the server software is running.

    URL: Uniform Resource Locator. Can be regarded as the address of a file on the World-Wide Web. It includes the protocol (rules) for retrieving the file, the domain (name) of the computer system on which the server software runs and the file name to be retrieved. For example the URL http://www.w3.org/hypertext/WWW/TheProject.html uses the http protocol to retrieve the file TheProject.html in the directory /hypertext/WWW from the computer called www.w3.org

    HTML: Hypertext Markup Language. The native language for documents held on the World-Wide Web. HTML is an SGML (Standard Generalised Markup Language) application.

    HTTP: Hypertext Transfer Protocol. The protocol (set of rules) used to define the communications between the client and WWW server software.

    Note that these terms are, for reasons of clarity, in some cases over-simplified.
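    The parts of a URL described above can be separated using the urlsplit function from the Python standard library. This is a minimal sketch: the function name describe_url and the names used for the parts are chosen for illustration only.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break a URL into the parts described in the terminology list."""
    parts = urlsplit(url)
    # Everything before the last "/" is the directory, the rest is the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,       # None when the URL does not give a port
        "directory": directory,
        "file": filename,
    }


info = describe_url("http://www.w3.org/hypertext/WWW/TheProject.html")
for name, value in info.items():
    print(name, "=", value)
```

    For the example URL this gives the protocol http, the host www.w3.org, the directory /hypertext/WWW and the file TheProject.html. A URL such as http://uu-gna.mit.edu:8001/uu-gna/text/cc/index.html also carries a port number (8001), which the sketch reports separately.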


    3 World-Wide Web Browsers

    In order to access the World-Wide Web you will need to use a browser (or client). A wide range of clients are available for many different platforms: although the Mosaic client is very popular you should not think that Mosaic is the World-Wide Web.

    Publicly Available Telnet Browsers

    A number of browsers are publicly available which can be accessed using the telnet protocol. These include:

    These browsers can be accessed by giving the command telnet address (for example telnet dir.mcc.ac.uk). In some cases you will be logged in automatically; in other cases you must enter a username (which is often lynx).

    An example of the use of the telnet browser at the Radcliffe Science Library at Oxford University is illustrated in Figure 3-1.

    
    telnet rsl.ox.ac.uk 
    
           Radcliffe Science Library & Bodleian Library WWW Server (p1 of 6)
     
         RADCLIFFE SCIENCE LIBRARY & BODLEIAN LIBRARY WWW SERVER
     
                UNIVERSITY OF OXFORD
     
       [IMAGE]
     
    Welcome! At present this WWW server is still feeling its way. This
    page is intended primarily as a starting point for Oxford users
    wishing to explore Internet services and information sources. From
    this home page you can also access some of our Local WWW applications
    which are for the most part still under development. For newcomers to
    the Web, one good introduction is Entering the World-Wide-Web: A Guide
    to Cyberspace by Kevin Hughes. Another is CERN's WWW FAQ (list of
    Frequently Asked Questions).
     
    Apologies to our regular Lynx users. We have phased out the old Lynx
    opening page and you will now commence with this one. If you would
    like to voice your opinions or your feelings, please feel free to use
    the comments form below.
     
    ________________________________________________________________
    -- press space for more, use arrow keys to move, '?' for help, 'q' to quit
    
    Figure 3-1 The Client At Radcliffe Science Library.

    It should be noted that the organisations running these publicly available clients do not guarantee to provide the service on a long term basis.

    Email Readers

    For users who do not have full Internet connectivity it is possible to retrieve files from WWW using electronic mail.

    To use the service at the email address webmail@www.ucc.ie send the message GO url where url is the URL of the file you require. Note that the turnaround time of this server seems to average about one week, and that the server handles only http URLs (it does not handle FTP URLs, for example).

    To use the service at the email address agora@w3.mail.org send the message send url, where url is the URL of the file you require.
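    A request to either service is simply a one-line mail message. The following Python sketch composes such a message but does not send it. The function name make_request and the sender address user@example.ac.uk are hypothetical, and it is an assumption that the services need no particular subject line.

```python
from email.message import EmailMessage


def make_request(service_address, command, url, sender):
    """Compose a mail message asking a WWW-by-email service for a file."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = service_address
    # The body is the command followed by the URL, for example "GO url".
    message.set_content(command + " " + url)
    return message


message = make_request(
    "webmail@www.ucc.ie",
    "GO",
    "http://www.w3.org/hypertext/WWW/TheProject.html",
    "user@example.ac.uk",   # placeholder sender address
)
print(message)
```

    For the second service the command would be send rather than GO. The message could then be passed to whatever mail software is available locally.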

    Text-Based Browsers

    The browser illustrated in Figure 3-1 is a text-based browser (which is sometimes referred to as a command-line browser). Text-based clients run on a text-based operating system environment (e.g. DOS rather than Microsoft Windows, or Unix rather than X Windows). Command line clients place fewer demands on the local computer system, but do not provide the ease-of-use or range of functionality provided by graphical clients.

    Lynx

    The most widely-used text-based browser is probably Lynx. Lynx was developed at the University of Kansas, originally for Unix. An example of the Unix implementation is illustrated in Figure 3-1.

    Lynx has been ported to the MS DOS environment. DosLynx, as the implementation is known, will run on a PC with 512 K of RAM, running MS DOS 3 or later. It provides access to the World-Wide Web from an entry level PC which has the appropriate networking capability. DosLynx is illustrated in Figure 3-2.


    Figure 3-2 DosLynx.

    Availability

    The Lynx browser software is available at the URL ftp://ftp2.cc.ukans.edu/pub/WWW/ In the UK it is also available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/lynx

    The DosLynx software is available at the URL ftp://ftp2.cc.ukans.edu/pub/WWW/DosLynx/

    Details of the system requirements for DosLynx are available at the URL ftp://ftp2.cc.ukans.edu/pub/WWW/DosLynx/readme.htm A Listserv mailing list exists at the address DosLynx-Dev@ukanaix.cc.ukans.edu for the distribution of DosLynx related information, updates and development discussions. To subscribe send an email request to listserv@ukanaix.cc.ukans.edu to be added to the list. All new releases will be announced on this list.

    NCSA Browsers

    The NCSA Mosaic browser is available for the X Windows, Microsoft Windows and Apple Macintosh environments.

    NCSA Mosaic For X

    Although it was not the first graphical browser, NCSA Mosaic For X helped to popularise the Web. At the time of writing version 2.4 is available, and a beta version of 2.5 (which includes support for a number of new features, including hierarchical hotlists) is also available.


    Figure 3-3 NCSA Mosaic For X.

    NCSA Mosaic For Windows and the Macintosh

    If NCSA Mosaic For X helped to popularise the Web, NCSA Mosaic For Windows and for the Macintosh made it available to a much larger number of people.


    Figure 3-4 NCSA Mosaic For Windows and the Macintosh.

    Availability

    The NCSA Mosaic browser software for the X, Microsoft Windows and Apple Macintosh platforms is available at the URL ftp://ftp.ncsa.uiuc.edu/pub/Web/ In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/Mosaic/

    Further information about NCSA Mosaic For Windows is available at the URL http://www.ncsa.uiuc.edu/SDG/Software/WinMosaic/HomePage.html Further information about NCSA Mosaic For the Macintosh is available at the URL http://www.ncsa.uiuc.edu/SDG/Software/MacMosaic/ Further information about NCSA Mosaic For X is available at the URL http://www.ncsa.uiuc.edu/SDG/Software/XMosaic/

    Cello Browser

    Cello was one of the first WWW browsers to be developed for the Microsoft Windows environment. It was written by Thomas R Bruce of the Legal Information Institute, Cornell University.


    Figure 3-5 The Cello Browser.

    Availability

    The Cello browser software for Microsoft Windows is available at the URL ftp://ftp.law.cornell.edu/pub/LII/Cello/

    EINet Browsers

    EINet have developed the WinWeb and MacWeb browsers for the PC and Apple Macintosh platforms.


    Figure 3-6 The WinWeb and MacWeb Browsers.

    Availability

    The EINet browser software for the X, Microsoft Windows and Apple Macintosh environments is available at the URL ftp://ftp.einet.net/einet/

    Netscape Browsers

    Netscape Communications Corporation (MCOM) was set up by Jim Clark, founder of Silicon Graphics. MCOM recruited the developers of NCSA Mosaic to develop a WWW browser. A beta version of Netscape was released in October 1994. It generated a tremendous amount of interest because of its speed and functionality. However it also caused concern, since it included extensions to the HTML standard which had not been part of the HTML standardisation process.


    Figure 3-7 The Netscape Browser for Windows and the Macintosh.

    Availability

    The Netscape browser for the X, Microsoft Windows and Apple Macintosh environments is available at the URL ftp://ftp.mcom.com/ In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/Netscape/ Further information is available from the URL http://home.mcom.com/home/welcome.html

    Air Mosaic Browsers

    Air Mosaic is another commercial browser which is based on the NCSA Mosaic source code.


    Figure 3-8 The Air Mosaic Browser For Windows.

    Availability

    An evaluation copy of the Air Mosaic browser software for the X, Microsoft Windows and Apple Macintosh environments is available at the URL ftp://ftp.spry.com/demo/ Further information is available at the URL http://www.spry.com/

    GWHIS Browsers

    GWHIS is a commercial WWW browser marketed by Quadralay. GWHIS (Global-Wide Help and Information System) consists of a WWW browser, an application program interface (API) for integrating GWHIS into applications and a search engine.


    Figure 3-9 The GWHIS Browser For X Windows.

    Availability

    An evaluation copy of the GWHIS browser software for the X, Microsoft Windows and Apple Macintosh environments is available at the URL ftp://ftp.quadralay.com/pub/gwhis Further information is available at the URL http://www.quadralay.com/

    Emissary Browser

    New browsers which are being developed have increased functionality, such as providing integrated email, Usenet and file management capabilities in addition to WWW access. Emissary is an example of such a browser.


    Figure 3-10 The Emissary Browser.

    Availability

    Further information is available at the URL http://www.twg.com/emissary/emissnews.html

    Other Browsers

    Many other browsers are available or are currently being developed. Some of the browsers are aimed at the business community. Of particular interest to the academic community are the Internet browsers which are being developed by Microsoft (for inclusion with Windows 95), IBM, Apple and Novell.

    Microsoft's Internet Explorer will be available for the Windows 95 platform.


    Figure 3-11 The Internet Explorer Browser.

    Future Developments

    A browser known as Arena is currently being developed which will handle HTML 3. HTML 3 is a new version of HTML which contains a number of facilities which are not available in the current version (HTML 2), including table handling and mathematical formulae.


    Figure 3-12 The Arena Browser.

    Availability

    Arena is currently a beta program. It can be obtained from the URL ftp://ftp.w3o.org/ In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/www/arena

    HotJava

    The HotJava browser (developed by Sun) represents a new generation of browser technology. The HotJava browser is capable of downloading and executing programs written using an object-oriented language called Java, as well as rendering HTML documents.


    Figure 3-13 The HotJava Browser.

    Browser Validation

    A validation suite for testing the functionality of browsers is being developed. Further information is available at the URL http://www.w3.org/hypertext/WWW/Test/

    Figure 3-14 illustrates a browser validation service which is available at the URL http://www.uark.edu/~wrg/


    Figure 3-14 A Browser Validation Service.

    Further Information

    A list of browsers is available at the URL http://www.w3.org/hypertext/WWW/Clients.html

    Another list, which includes a brief summary of known bugs, is available at the URL http://www.hotwired.com/browsers.html

    A third list is available at the URL http://www.charm.net/~web/Vlib/Users/Clients.html

    A comparison of browsers is available at the URL http://www.osf.org/~kiniry/projects/web/browser_comparison.html

    A browser tuneup, which helps to identify the particular quirks and oddities of browsers and may be useful for developers and consultants, is available at the URL http://www.eit.com/goodies/tuneup/

    A review of browsers is available at the URL http://www.cnet.com/Central/Features/Browser/

    Conclusions

    Which is the best browser? There is no longer a simple answer to this. The growth in the number of browsers, the different licensing arrangements and the different areas they address are making it difficult to adopt an institutional policy on choosing a browser. At the time of writing the Netscape browser looks very attractive. However it was developed primarily to address the needs of commercial users, many of whom requested greater control over the appearance of HTML pages in order to reflect a corporate identity. Will Netscape, however, be as quick to support mathematical equations, which will be of interest to most academic institutes? Will it be the best browser for providing control over external applications - an area which is likely to be of interest to academics who wish to develop distributed teaching materials?

    Perhaps the only conclusion to be made at this point is that academic institutions should avoid being locked in to a particular browser.


    4 HTML

    About HTML

    Native documents on the World-Wide Web are written in HTML, the HyperText Markup Language. HTML defines the structural elements in a document (such as headers, citations and addresses), layout information (such as bold and italics) and the use of inline graphics, together with the ability to provide hypertext links.

    A simple HTML document is illustrated in Figure 4-1.

    
    <TITLE>The World-Wide Web</TITLE>
    <H1>About The World-Wide Web</H1>
    <P>The World-Wide Web is a <EM>distributed multimedia
    hypertext</EM> system.</P>
    
    Figure 4-1 A Simple HTML Document.

    Structural elements in the document are identified by start and end markup tags. For example the <TITLE> and </TITLE> tags are used to specify the title of the document, which is often displayed by a client. The <H1> and </H1> tags are used to define the first level heading. Clients will normally display headers differently from the body text: for example, a graphical client could display the header using a larger or different font, whereas a text-based client could display a header as centred text or in all capitals.

    Figure 4-1 also illustrates the <EM> container. Text held in the container (which is defined by the <EM> start tag and the </EM> end tag) will be emphasised in some way. A graphical browser could render the emphasised text by displaying it in italics, whereas a browser with audio capabilities for the visually impaired could render the emphasis by a change in the tone of the voice output.

    Figure 4-1 also shows the paragraph container. It is important to understand that the <P> tag is part of a paragraph container and is no longer a paragraph separator (as many people mistakenly believe). If the </P> tag is not used the existence of the next <P> tag will imply a </P>. In future versions of HTML it will be possible to specify paragraph attributes: for example <P ALIGN=CENTER>.
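    The rule that a new <P> tag implies the end of the previous paragraph can be sketched in Python. The class name ParagraphCloser and the function close_paragraphs are hypothetical. The sketch deals only with the case described above (a following <P> tag or the end of the document) and ignores the other elements which can also end a paragraph.

```python
from html.parser import HTMLParser


class ParagraphCloser(HTMLParser):
    """Rewrite HTML so that every <P> start tag has an explicit </P> end tag."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.in_paragraph:
                # A new <P> implies the end of the previous paragraph.
                self.out.append("</P>")
            self.in_paragraph = True
        # Copy the start tag through exactly as it was written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.in_paragraph:
                return            # ignore a stray </P>
            self.in_paragraph = False
        self.out.append("</%s>" % tag.upper())

    def handle_data(self, data):
        self.out.append(data)


def close_paragraphs(html):
    """Return the HTML with the implied </P> end tags written out."""
    parser = ParagraphCloser()
    parser.feed(html)
    parser.close()
    if parser.in_paragraph:
        parser.out.append("</P>")   # the end of the document closes the last one
    return "".join(parser.out)


print(close_paragraphs("<P>First paragraph.<P>Second paragraph."))
# <P>First paragraph.</P><P>Second paragraph.</P>
```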

    Although browsers will display the HTML document shown in Figure 4-1, for reasons of performance and upwards compatibility it is strongly recommended that HTML documents contain additional elements including the <HTML>, <HEAD> and <BODY> tags, as shown in Figure 4-2.

    
    <HTML>
    <HEAD>
    <TITLE>The World-Wide Web</TITLE>
    </HEAD>
    <BODY>
    <H1>About The World-Wide Web</H1>
    <P>Information about the World-Wide Web is available 
    <A HREF="http://www.w3.org/hypertext/WWW/TheProject.html"> at
    CERN</A>.</P>
    </BODY>
    </HTML>
    
    Figure 4-2 A Simple HTML Document.

    The <HTML> container is used to define the extent of the HTML document. Within the HTML document there are two other containers: <HEAD> and <BODY>. The <HEAD> container provides information about the document itself. This can include the title of the document (as illustrated), copyright information, keywords and expiry dates (for use by caching software). It is important to make use of the <HEAD> tag since, for example, an automatic indexing program which wishes to index the title of HTML documents can parse only the information contained in the <HEAD> container. If the <HEAD> container is not present the entire document may have to be parsed, which will place unnecessary extra load on the server.
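    The benefit to an indexing program can be sketched in Python. The sketch reads a document one line at a time and stops as soon as the end of the <HEAD> container is seen, so the body is never parsed. The names HeadTitleReader and index_title are hypothetical, and the document is held in a list of lines to keep the example self-contained.

```python
from html.parser import HTMLParser


class HeadTitleReader(HTMLParser):
    """Collect the document title and notice when the head has ended."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.in_title = False
        self.head_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "body":
            self.head_done = True   # no </HEAD> was seen, but the head is over

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "head":
            self.head_done = True

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def index_title(lines):
    """Return (title, lines_read), reading no further than the end of the head."""
    reader = HeadTitleReader()
    lines_read = 0
    for line in lines:
        reader.feed(line)
        lines_read += 1
        if reader.head_done:
            break
    return "".join(reader.title_parts).strip(), lines_read


document = [
    "<HTML>\n",
    "<HEAD>\n",
    "<TITLE>The World-Wide Web</TITLE>\n",
    "</HEAD>\n",
    "<BODY>\n",
    "<H1>About The World-Wide Web</H1>\n",
    "<P>A long document body would follow here.</P>\n",
    "</BODY>\n",
    "</HTML>\n",
]

title, lines_read = index_title(document)
print(title)        # The World-Wide Web
print(lines_read)   # 4
```

    Only four of the nine lines are read. If a document has neither a </HEAD> tag nor a <BODY> tag, the loop runs to the end of the document, which is the extra work the recommendation is intended to avoid.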

    Figure 4-2 also illustrates the use of the anchor <A> container. This tag is used to provide hypertext links. In the example the text at CERN which is contained between the <A> and </A> tags will be highlighted in some way by the browser. Selecting this highlighted phrase will cause the client to send a request for http://www.w3.org/hypertext/WWW/TheProject.html This request will use the http protocol and will be sent to the server running on the system at www.w3.org

    HTML Authoring Tools

    Initially information providers on the World-Wide Web used standard editors such as vi and emacs to create HTML documents. As WWW grew in popularity authoring tools were developed to assist information providers. This section describes the following authoring tools which are available for the Microsoft Windows environment: HTML Assistant, HTML Hyperedit, HTMLEd, InContext Spider and HotDog.

    HTML Assistant

    HTML Assistant is a simple authoring tool which can be used to create and edit HTML documents. A list of Frequently Asked Questions about HTML Assistant is available at the URL http://cs.dal.ca/ftp/htmlasst/htmlafaq.html HTML Assistant is available at the URL ftp://ftp.cica.indiana.edu/pub/pc/win3/misc In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/html-assistant


    Figure 4-3 HTML Assistant.

    HTML Hyperedit

    HTML Hyperedit (which was developed using the Toolbook authoring system) not only provides an environment for producing HTML documents, but also contains a tutorial which gives an introduction to HTML. HTML Hyperedit is available at the URL ftp://info.curtin.edu.au/pub/internet/mswindows/hyperedit In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/win-htmledit


    Figure 4-4 HTML Hyperedit.

    HTMLEd

    HTMLEd is a simple authoring tool which can be used to create HTML documents. In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/ms-windows/


    Figure 4-5 HTMLEd.

    InContext Spider

    InContext Spider is a more sophisticated HTML authoring tool for Microsoft Windows which provides support for HTML 3 features, such as tables.


    Figure 4-6 InContext Spider.

    Further information about InContext Spider is available at the URL http://www.incontext.ca/

    HotDog

    HotDog is another sophisticated authoring tool for Microsoft Windows which provides support for HTML 3 features, such as tables and forms, as well as Netscape's HTML extensions.


    Figure 4-7 HotDog.

    Word Processing Tools

    HTML Assistant and HTML Hyperedit are self-contained authoring tools. Another approach is to develop authoring tools which work within a word processing environment. These tools are normally implemented as macros for popular word processing packages, such as Word For Windows or WordPerfect. This section describes three tools which have been developed for use within Word For Windows: the GT_HTML, CU_HTML and ANT_HTML macros.

    Word processing tools have the advantage that they provide a consistent environment for existing users of word processors. However they do have their disadvantages. Because they are normally implemented as macros, they can be very slow, especially when used with large or complicated documents. There is also a danger that HTML markup which is embedded as hidden text could cause conflicts with other word processing tools if, for example, the word processed document was used by other users.

    GT_HTML

    One of the first word processing macros which could be used to create HTML documents was the GT_HTML macro. This macro, written for Word For Windows, was developed at the Georgia Tech Research Institute. In the UK the software is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/editing/macros/ms-winword


    Figure 4-8 The GT_HTML Macro.

    CU_HTML

    CU_HTML is a template designed to work within Word For Windows. The template was written by Anton Lam (mailto:anton-lam@cuhk.hk). The software is available at the URL ftp://ftp.cuhk.hk/pub/www/windows/util


    Figure 4-9 The CU_HTML Macro.

    ANT_HTML

    ANT_HTML is a template designed to work within Word For Windows 6.0. The template was written by Jill Swift (mailto:jswift@freenet.fsu.edu). The software is available at the URL ftp://ftp.einet.net/einet/pc/ANT_HTML.ZIP


    Figure 4-10 The ANT_HTML Macro.

    Internet Assistant

    Internet Assistant is the name of Microsoft's template designed to work within Word For Windows. The software is available at the URL http://www.microsoft.com/


    Figure 4-11 Internet Assistant.

    Browser Editing Tools

    Another approach to editing HTML documents is provided by browsers which are integrated with editing tools. The Arena browser enables an external editor to be invoked to edit the displayed HTML document. Figure 4-12 illustrates the Arena browser used in conjunction with the Emacs editor.


    Figure 4-12 Editing A Document From Arena.

    HTML Document Conversion Tools

    Authoring tools are normally used to create new HTML documents. Document conversion tools, on the other hand, can be used to convert existing documents to HTML format.

    LaTeX2html

    One of the first sophisticated document conversion tools to be developed was the LaTeX2html conversion program. This program was written by Nikos Drakos, Computer Based Learning Unit, University of Leeds. It set the standard for document converters, providing a wide range of features, including:

    Figure 4-13 illustrates a document which has been converted by the LaTeX2html conversion program.


    Figure 4-13 A Document Converted Using LaTeX2html.

    LaTeX2html is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/translators/latex2html Further information is available at the URL http://cbl.leeds.ac.uk/nikos/doc/www94/www94.html

    RTFtohtml

    The RTFtohtml conversion program enables RTF files (which can be produced by word processing packages such as Word For Windows) to be converted to HTML. The program was written by Chris Hector (Cray) based on RTF parsing software developed by Paul DuBois.

    RTFtohtml is available as a command line tool for a number of Unix platforms. In addition an Apple Macintosh implementation is available. A beta version of an MSDOS implementation was announced in November 1994.

    An extension of the RTFtohtml program is known as RTFtoweb. This provides a number of additional features, including creation of hypertext links at user defined section breaks. Figure 4-14 illustrates a document on Exploring The World-Wide Web Using Mosaic For Windows which is available at the URL http://www.leeds.ac.uk/ucs/docs/tut50/tut50.html


    Figure 4-14 Document Converted Using RTFtoweb.

    Note that in Figure 4-14 the document has been split automatically into a number of files, and a hypertext table of contents has been generated. Chevrons (>> and <<), which can be used to move to the next or previous section, are also generated automatically.

    Further information about RTFtohtml is available at the URL ftp://ftp.cray.com/src/WWWstuff/RTF/rtftohtml_overview.html The software is available at the URL ftp://ftp.cray.com/src/WWWstuff/RTF/latest/ In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/translators/rtftohtml RTFtoweb is available at the URL ftp://ftp.rrzn.uni-hannover.de/pub/unix-local/misc/rtftoweb/html/rtftoweb.html

    HTML Quality Tools

    The HTML specification states that "HTML parsers should be liberal except when verifying code. HTML generators should generate strictly conforming HTML." Put simply this means that browsers should be capable of displaying documents which contain invalid HTML, but HTML authoring tools and document converters should generate HTML which conforms strictly to the standard.

    A number of HTML validation tools are available which can validate HTML documents. A number of popular tools are described below.

    HoTMetaL

    HoTMetaL is an HTML authoring tool and validator. It will provide feedback if it encounters invalid HTML, as illustrated in Figure 4-15.


    Figure 4-15 HoTMetaL.

    HoTMetaL is available for the X and Microsoft Windows platforms. Two versions of the software are available: a public domain version and a licensed version. HoTMetaL Pro, the licensed version, can be used to import and validate an existing document. The public domain version will give an error and refuse to load a document which contains invalid HTML.

    HoTMetaL is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/Mosaic/html/hotmetal

    Weblint

    A tool called weblint can be used to check for invalid HTML documents. This software is available from the URL ftp://ftp.khoros.unm.edu/pub/perl/www/weblint-1.000.tar.gz In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/tools/weblint
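
    To give a flavour of the kind of check which such tools perform, the following is a minimal sketch in Python which reports container tags that are never closed or are closed out of order. It is not the method used by weblint or by any of the other tools described in this section; the names TagBalanceChecker and check_html, and the list of ignored tags, are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Tags whose end tag is absent or optional. This sketch simply ignores them
# (an illustrative list, not taken from the HTML specification).
IGNORED = {"img", "br", "hr", "p", "li", "dt", "dd", "meta", "link", "base", "isindex"}


class TagBalanceChecker(HTMLParser):
    """Collects messages about container tags which are not properly closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of container tags which are currently open
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        # HTMLParser converts tag names to lower case before calling this.
        if tag not in IGNORED:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in IGNORED:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append("unexpected </%s>" % tag)


def check_html(markup):
    """Return a list of problems found in the markup (empty if none)."""
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    for tag in checker.open_tags:
        checker.problems.append("<%s> is never closed" % tag)
    return checker.problems


if __name__ == "__main__":
    print(check_html("<H1>Title<P><A HREF='x.html'>link"))
```

    A real validator checks a document against the HTML DTD, which covers far more than tag balance (for example, which elements may appear inside which others).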

    SGMLS

    sgmls is a tool which can be used to validate SGML documents. It is available at the URL ftp://sgml1.ex.ac.uk/pub/SGML/sgmls/ The sgmls software is used in a number of HTML validation services, such as those described below. Information on installing sgmls and also pgmls (an SGML mode for emacs) is available at the URL http://web.nexor.co.uk/users/mak/doc/html/sgml-lib/html-sgml.html

    HTML Validation Service

    An HTML validation service is available at the URL http://www.hal.com/~markg/WebTechs/validation-form.html This service makes use of HTML forms and a CGI script which runs an HTML validation program. The service can be used to check HTML syntax by entering the HTML markup to be checked. It can also be used to check an existing HTML document by entering the URL of the document.


    Figure 4-16 HTML Validation Service.

    A variation on this service is available at the URL http://www.cc.gatech.edu/grads/j/Kipp.Jones/HaLidation/validation-form.html

    These services make use of the sgmls validation program.

    The software can be installed on your local Unix system. It is available at the URL ftp://ftp.hal.com/pub/CGI/check-html.tar.Z

    HTML Check Toolkit

    The HTML Check Toolkit is another HTML validation program. The software can be installed using a WWW browser. The installation service, illustrated below, is based on the EIT Webmaster Starter's Kit. HTML Check Toolkit is available at the URL http://www.hal.com/~markg/HaLSoft/html-check/


    Figure 4-17 Installing The Check_HTML Script.

    Review of HTML Tools

    Before choosing HTML authoring tools, document converters or quality tools for institutional use the following issues should be considered:

    Support: Who wrote the software - an experienced software developer or a student as part of a computer project? Will the software continue to be developed and supported?

    Quality: Does the software produce valid HTML?

    Functionality: What facilities does the software provide?

    Other Issues: If the software is based on a word processing package, what happens if the word processed document needs to be used by another word processor?

    Writing Style

    Writing styles for WWW documents are still developing. However there are a number of guidelines which can be provided:

    Finding Out More About HTML

    This document does not provide an in-depth tutorial on HTML. Many WWW resources are available which give details on writing HTML. Some of these are listed below:


    5 Graphics

    The World-Wide Web is, of course, a graphical system. This section describes how graphical objects can be incorporated in an HTML document, how external graphical files can be used and how to create and use interactive maps. The section also considers the performance aspects of using graphics.

    HTML Graphical Tags

    Inline images are defined in an HTML document using the <IMG> tag. For example:

    <IMG SRC="portrait.gif">

    The full syntax for the <IMG> tag is:

    <IMG SRC="source file" ALT="textual description" ALIGN="option">

    The SRC attribute is used to specify the URL of the graphical file. At the time of writing graphical files should normally be in GIF format, although support for other graphical file formats may be available in certain browsers. The SRC attribute is mandatory.

    The ALT attribute is used to specify text which should be displayed by a browser which cannot display graphics, or a browser which has the display of inline images option switched off. Use of the ALT attribute is highly recommended.

    The ALIGN attribute can take the values TOP, MIDDLE or BOTTOM. It is used to define whether the top, middle or bottom of the graphic should be aligned with the text. Use of the ALIGN attribute is optional.
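
    Since use of the ALT attribute is recommended, it can be helpful to check existing documents for images which lack one. The following Python sketch lists the SRC value of every <IMG> tag which has no ALT text. The names ImgAltChecker and images_without_alt are hypothetical and are used for this illustration only.

```python
from html.parser import HTMLParser


class ImgAltChecker(HTMLParser):
    """Records the SRC of each <IMG> tag which has no (or an empty) ALT attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive in lower case.
        if tag != "img":
            return
        attributes = dict(attrs)
        if not attributes.get("alt"):
            self.missing_alt.append(attributes.get("src"))


def images_without_alt(markup):
    """Return the SRC values of images in the markup which lack ALT text."""
    checker = ImgAltChecker()
    checker.feed(markup)
    checker.close()
    return checker.missing_alt


if __name__ == "__main__":
    print(images_without_alt('<IMG SRC="portrait.gif">'))
```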

    Using External Viewers

    You can use the <A> anchor tag to refer to a graphical file. When the link is selected the graphical file is normally passed to a graphical viewer (such as xv or LVIEW) for displaying.

    One common use of the <A> tag is to provide a link to a large colour graphic from a small thumbnail image. For example:

    <A HREF="full-image.jpeg"><IMG SRC="thumbnail.gif" ALT="Portrait of John Smith"></A>

    It is also possible to use this technique to provide links from thumbnail images to video clips. For example:

    <A HREF="fluidflow.mpeg"><IMG SRC="fluidflow.thumb.gif" ALT="Video clip of fluid flow"></A>

    Active Maps

    An active map (also sometimes referred to as a clickable image) is an inline image in an HTML document. An area of the image can be selected, usually by clicking with a mouse. The coordinates of the point that has been selected are sent to a program which can then process the information. An active map can be used to provide a graphical menu, in which selecting a menu option will retrieve a specified HTML document. Active maps can also be used in developing teaching and learning software - for example a medical student could be asked to click on an area of an xray which shows a cancerous growth. If an incorrect area is selected an HTML document giving further information can be displayed.

    An active map can be specified as shown in the HTML document below.

    Please select an area of the xray showing cancerous growths.
    <A HREF="cgi-bin/htimage/xray.config">
    <IMG SRC="xray.gif" ISMAP></A>
    Figure 5-1 HTML Document Containing Markup For An Active Map.

    The file xray.config will contain the coordinates of regions in the image, as illustrated below.

    default error.html
    rectangle (100,100) (500,500) cancer.html
    circle (50,50) 25 homepage.html
    Figure 5-2 Configuration File For Active Map.

    When the user clicks on an area of the image the coordinates are sent to the cgi-bin/htimage CGI program. The name of the configuration file for the image (in this case xray.config) is also sent to this program. The htimage program will then retrieve the HTML document specified in the configuration file. If, for example, the user has clicked in a circle defined by the centre at position 50,50 with a radius of 25, the file homepage.html will be sent to the browser. If the user has clicked in a rectangle with vertices at the position 100,100 and 500,500 the file cancer.html will be sent to the browser.
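
    The lookup described above can be sketched in Python as follows. This is an illustration of the idea only and is not the htimage program itself: the function names parse_config and resolve_click are hypothetical, only the default, rectangle and circle entries shown in Figure 5-2 are handled, and it is assumed that regions are tested in the order in which they appear in the file.

```python
import re

# A point is written in the configuration file as "(x,y)".
POINT = re.compile(r"\((\d+),(\d+)\)")

XRAY_CONFIG = """\
default error.html
rectangle (100,100) (500,500) cancer.html
circle (50,50) 25 homepage.html
"""


def parse_config(text):
    """Return (default_document, regions) read from configuration text."""
    default = None
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        if kind == "default":
            default = fields[1]
        elif kind == "rectangle":
            x1, y1 = map(int, POINT.fullmatch(fields[1]).groups())
            x2, y2 = map(int, POINT.fullmatch(fields[2]).groups())
            regions.append(("rectangle", (x1, y1, x2, y2), fields[3]))
        elif kind == "circle":
            cx, cy = map(int, POINT.fullmatch(fields[1]).groups())
            regions.append(("circle", (cx, cy, int(fields[2])), fields[3]))
    return default, regions


def resolve_click(x, y, default, regions):
    """Return the document for the first region containing (x, y), else the default."""
    for kind, shape, document in regions:
        if kind == "rectangle":
            x1, y1, x2, y2 = shape
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return document
        elif kind == "circle":
            cx, cy, radius = shape
            # Inside the circle if the squared distance does not exceed radius squared.
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                return document
    return default


if __name__ == "__main__":
    default, regions = parse_config(XRAY_CONFIG)
    print(resolve_click(300, 300, default, regions))
```

    With the configuration in Figure 5-2, a click at 300,300 falls within the rectangle, a click at 60,60 falls within the circle, and a click at 90,10 falls in neither region and so returns the default document.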

    Mapedit

    Mapedit is an editor for creating image map files. Image map files are a feature of NCSA and CERN servers; they enable you to turn a GIF image into a clickable map by designating areas using polygons and circles within the GIF and specifying a destination URL for each area. The software is not public domain. Commercial users must pay a licence fee; non-profit and educational users are asked to send the author a postcard. The software is available from the URL ftp://sunsite.unc.edu/pub/packages/infosystems/WWW/tools/mapedit In the UK it is available at the URL ftp://src.doc.ic.ac.uk/packages/WWW/mapedit Mapedit was written by Thomas Boutell (mailto:boutell@netcom.com).


    Figure 5-3 MapEdit.

    Graphical Tools

    Paintshop Pro

    Paintshop Pro is an example of a Microsoft Windows tool which can be used to manipulate graphics files for use on WWW. Paintshop Pro can be used to convert file formats, to reduce colour depth and to convert colours.


    Figure 5-4 Paintshop Pro.

    The image being manipulated by Paintshop Pro contains information for 256 colours (as shown in the bottom left of the screen). The colour depth of the image should be reduced to decrease the size of the file to an appropriate level (e.g. a line drawing should not contain 256 colours), and thus reduce the network traffic when the image is retrieved on WWW.
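
    The effect of colour depth on size can be estimated with some simple arithmetic. The Python sketch below calculates the size of the raw pixel data for an image with a given number of colours. It is a rough guide only: GIF files are compressed, so actual file sizes will differ, and the function name uncompressed_bytes and the 640 by 480 image size are assumptions made for this illustration.

```python
def uncompressed_bytes(width, height, colours):
    """Size in bytes of the raw pixel data for a palette of the given size."""
    # Number of bits needed to number every colour in the palette (at least 1).
    bits_per_pixel = max(1, (colours - 1).bit_length())
    # Round up to a whole number of bytes.
    return (width * height * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    for colours in (256, 16, 2):
        print(colours, "colours:", uncompressed_bytes(640, 480, colours), "bytes")
```

    For a 640 by 480 image this gives 307200 bytes at 256 colours, 153600 bytes at 16 colours and 38400 bytes for a two-colour line drawing.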

    Other Graphical Tools

    The GIF Construction Set for Windows is a powerful collection of tools to work with multiple-block GIF files. It will allow you to assemble GIF files containing image blocks, plain text blocks, comment blocks and control blocks. It includes facilities to manage palettes and merge multiple GIF files together. It will make the extensions of the GIF specification work for you. Among its other functions, GIF Construction Set for Windows can:

    The software is available from the URL http://www.north.net/alchemy/gifcon.html

    GIF Transparentifier services are available at the URLs http://www.galcit.caltech.edu/~ta/tgif/tgif.html and http://www.vrl.com/Imaging/

    Imagizer can generate high-quality thumbnail images, among other things, on-the-fly. It is available for SunOS, Solaris, and HPUX, and soon for Windows and NT. Further details are available at the URL http://pc.inrird.com/imagizer.html

    San Diego Supercomputer Center's imtools package converts many file formats, including GIFs.

    ImageMagick is a multi-purpose raster converter and manipulation package. The convert program handles many file formats including GIF. The software is available at the URL ftp://ftp.x.org/contrib

    MAP_MARKER is a tool for generating clickable image maps. Further information is available from the URL http://www.dl.ac.uk/CBMT/mapmarker/HOME.html

    Appropriate Use Of Graphics

    Novice information providers may be tempted to fill their HTML documents with inline graphical images. More experienced computer users will remember the large numbers of poorly designed paper documents which were produced once desktop publishing packages became widely used.

    Before making use of graphics you should consider the following points:

    Look at the following URL. See how long it takes for the information to be delivered. Note that if you retry the URL it is likely to be quicker if it is cached by your client or by your server (if your server supports caching).

    http://www.leeds.ac.uk/ucs/people/BKelly/uniras94/uk_logos.html


    Figure 5-5 UK University Logos.

    This page contains pointers to logos on UK university institutional WWW servers. Details of the numbers of colours and the file size are also provided.

    Further Information

    A tutorial on imagemaps is available at the URL http://wintermute.ncsa.uiuc.edu:8080/map-tutorial/image-maps.html

    A good example of use of graphics is the Xerox Parc Map viewer which is available at the URL http://pubweb.parc.xerox.com/map

    Information on transparent and interlaced GIFs, including pointers to useful graphical tools, is available at the URL http://dragon.jpl.nasa.gov/~adam/transparent.html


    6 Searching And Indexing

    The tremendous growth in the numbers and extent of information services on WWW has made net-surfing an ineffective way of finding useful information. Fortunately sophisticated indexing tools are being developed. Figure 6-1 shows a page which contains pointers to a number of searching tools.


    Figure 6-1 A Collection Of WWW Search Engines.

    A collection of WWW search engines is available at the URL http://cui.unige.ch/meta-index.html Some of the main searching tools are listed below:

    CUI WWW Catalog
    http://cuiwww.unige.ch/cgi-bin/w3catalog
    Yahoo
    http://www.yahoo.com/
    Globewide Network Academy
    http://uu-gna.mit.edu:8001/cgi-bin/meta/
    EINet's Galaxy
    http://galaxy.einet.net/
    Aliweb
    http://web.nexor.co.uk/public/aliweb/aliweb.html
    Lycos
    http://fuzine.mt.cs.colorado.edu/mlm/lycos-all.html
    World-Wide Web Worm
    http://www.cs.colorado.edu/home/mcbryan/WWWW.html
    WebCrawler
    http://webcrawler.cs.washington.edu/WebCrawler/WebQuery.html
    RBSE URL
    http://rbse.jcs.nasa.gov/eichmann/urlsearch.html
    Nikos
    http://www.rns.com/cgi-bin/nomad
    Jumpstation Robot
    http://www.stir.ac.uk/jsbin/js
    World-Wide Web Wanderer
    http://www.mit.edu:8001/cgi/wandex

    Robots, Spiders and Worms

    During 1993 many WWW users discovered resources by net-surfing: going to one WWW server, exploring what was available, and then following links to other WWW servers. A number of software developers produced software which automated this process, so that a program went from server to server, indexing information, such as contents of the <TITLE> tag or the contents of server home pages. Such programs became known as robots or spiders; one robot was called WWWW, the World-Wide Web Worm.
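
    The indexing step carried out by such a program can be sketched in Python as follows: the contents of the <TITLE> tag are recorded for the index, and the links in the page are collected so that they can be followed later. This is a minimal illustration and not the code of any robot mentioned in this section. The names PageIndexer and index_page are hypothetical, and the fetching of pages over the network is deliberately omitted.

```python
from html.parser import HTMLParser


class PageIndexer(HTMLParser):
    """Extracts the title of a page and the targets of its hypertext links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text which appears inside <TITLE>...</TITLE> is kept.
        if self._in_title:
            self.title += data


def index_page(markup):
    """Return (title, links) for one HTML document."""
    indexer = PageIndexer()
    indexer.feed(markup)
    indexer.close()
    return indexer.title.strip(), indexer.links


if __name__ == "__main__":
    page = ('<HTML><HEAD><TITLE>Home Page</TITLE></HEAD>'
            '<BODY><A HREF="http://www.leeds.ac.uk/">Leeds</A></BODY></HTML>')
    print(index_page(page))
```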

    There are a number of problems with this approach to global indexing:

    A number of these issues have been addressed. Martijn Koster's Guidelines For Robots, which is available at the URL http://web.nexor.co.uk/mak/doc/robots/robots.html provides guidelines for developers of robots.

    A list of robots is kept at the URL http://web.nexor.co.uk/mak/doc/robots/active.html

    Aliweb

    Aliweb (Archie Like Indexing In The Web) provides another approach to the indexing of WWW resources. With Aliweb each site is responsible for indexing files. The server administrator is responsible for choosing the files to be indexed.

    Further information about Aliweb is available at the URL http://web.nexor.co.uk/aliweb/doc/aliweb.html The paper ALIWEB - Archie-Like Indexing In the Web, which was presented at the WWW 94 conference in CERN, is available at the URL http://web.nexor.co.uk/mak/doc/aliweb-paper/paper.html

    SWISH

    SWISH, which stands for Simple Web Indexing System for Humans, was announced on 16 November 1994. It is a program that allows you to index your Web site and search for files using keywords in a fast and easy manner. Documentation is available at the URL http://www.eit.com/software/swish/swish.html The software is available at the URL ftp://ftp.eit.com/pub/web.software/swish/

    WAIS

    WAIS (Wide Area Information Server) is another mechanism for indexing resources. WAIS is used by the Computing Service, University of Leeds to index its documents and newsletters. An example of how the WAIS server and WAIS indexing software is used is given below.

    The command:

    waisserver -p 210 -d /apps/info/WWW/WAIS

    is used to start the WAIS server software. The -p 210 argument specifies the number of the port on which the server runs while the -d argument gives the name of the directory which will contain WAIS databases. Note that since the WAIS server will normally be running continuously it will normally be initiated by the system administrator.

    Newsletters are indexed by giving the command

    waisindex -export -d /apps/info/WWW/ucs/newsletter/wais-sources/computing-service-newsletter -T HTML *.html

    The name of the WAIS database is computing-service-newsletter. This long name is used since a single directory is used for all WAIS databases - it will save confusion if other departments wish to index their own departmental newsletters.

    The WAIS database can be accessed by a dedicated WAIS client or by a WWW browser which contains support for the WAIS protocol. The WAIS database can be accessed by giving the URL wais://www.leeds.ac.uk/computing-service-newsletter

    WAIS Utilities

    A number of utilities are available which can post-process the output from WAIS.

    wais.pl is a CGI script which is distributed with the NCSA httpd server.

    Son of wais.pl is a CGI script which is based on the wais.pl script.

    SFGate is a CGI script which interfaces to WAIS servers. SFGate provides a forms interface which can be used to access a number of WAIS databases. It is available at the URL http://ls6-www.informatik.uni-dortmund.de/SFgate/SFgate.html A demonstration is available at the URL http://ls6-www.informatik.uni-dortmund.de/SFgate/multiple.html

    wwwwais is a small ANSI C program that acts as gateway between waisq or waissearch (programs that search WAIS indexes) and a forms-capable World-Wide Web browser. With the freely distributable freeWAIS package, this program, and your local Web site, you can:

    Documentation is at the URL http://www.eit.com/software/wwwwais/wwwwais.html

    You can FTP the source and related files from the URL ftp://ftp.eit.com/pub/web.software/wwwwais/

    You can see how it looks at the URL http://www.eit.com/cgi-bin/wwwwais

    A WAIS Application

    One interesting application of the use of WAIS is the multimedia archive prototype developed by Andy Walker, formerly of the CBL/Multimedia Unit, University of Leeds. The prototype was developed to investigate the feasibility of providing an archive of multimedia objects for use in CBL applications by members of the University of Leeds.

    A directory is created for each multimedia object. The directory contains the multimedia object itself (e.g. a graphical file, video clip or sound file) together with a keyword file which describes the object. The keyword files are indexed using WAIS. A WWW browser which supports forms is used to run a CGI script. The CGI script invokes the waisq command to search the WAIS database. The output from waisq is then used to create an HTML file which contains pointers to thumbnail images of matching multimedia objects.
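
    The final step, creating an HTML file of thumbnails, might look something like the Python sketch below. It is not the code used in the prototype. The function name results_page and the thumbnail file name thumbnail.gif are assumptions, and the sketch starts from a list of matching object directories rather than attempting to run waisq or parse its output.

```python
from html import escape


def results_page(query, object_dirs):
    """Return HTML containing a linked thumbnail for each matching object directory."""
    lines = [
        "<TITLE>Search results</TITLE>",
        "<H1>Objects matching %s</H1>" % escape(query),
    ]
    for directory in object_dirs:
        name = escape(directory, quote=True)
        # Assumed layout: each object directory holds a file called thumbnail.gif.
        lines.append(
            '<A HREF="%s/"><IMG SRC="%s/thumbnail.gif" ALT="%s"></A>' % (name, name, name)
        )
    if not object_dirs:
        lines.append("<P>No matching objects were found.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(results_page("heart", ["archive/heart01", "archive/heart02"]))
```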


    Figure 6-2 Multimedia Archive.

    Which WAIS?

    A number of WAIS servers are available. The freeWAIS software is currently used at the University of Leeds. This software is maintained by CNIDR, the Clearinghouse For Networked Information Discovery and Retrieval. The freeWAIS software, however, is based on the 1988 version of the Z39.50 protocol. An implementation of WAIS based on the 1992 version of Z39.50 is also believed to be available from CNIDR. freeWAIS is available at the URL ftp://ftp.cnidr.org/pub/NIDR.tools/freewais

    freeWAIS-sf is an implementation of WAIS developed at Dortmund University. It is available at the URL ftp://ls6-www.informatik.uni-dortmund.de/pub/wais/freeWAIS-0.2-sf-beta.tar.gz

    CNIDR Isite

    CNIDR Isite is an integrated software package including a text indexer, search engine and Z39.50 communication tools to access databases. Isite includes the CNIDR ZDist, Isearch and Search API distributions.

    A mailing list has been established to discuss Isite. To join, send an e-mail message to listserv@vinca.cnidr.org with the body of the message as subscribe ISITE-L your name. To post messages to the list, send to isite-l@vinca.cnidr.org.

    Further information is available at the URL http://vinca.cnidr.org/software/Isite/Isite.html

    Further Information

    A tutorial on Mosaic and WAIS is available at the URL http://wintermute.ncsa.uiuc.edu:8080/wais-tutorial/wais.html

    A WAIS overview is available at the URL http://www.w3.org/hypertext/Products/wais/sources/Overview.html

    A list of resources about the Z39.50 information discovery protocol is available at the URL http://ds.internic.net/z3950/z3950.html


    7 WWW Servers

    If you wish to make information available you will need to run a WWW server. The server software is known as httpd - the HyperText Transfer Protocol daemon. Just as there are many WWW browsers available there are also many servers, including ones for Unix, MS Windows, Windows NT and the Apple Macintosh.

    This section gives an example of how to install and run a server for the Microsoft Windows environment. The section then goes on to illustrate a number of server management issues which are based on the CERN server for the Unix platform.

    Example of Installing A Server On A PC

    An example illustrating how easy it is to install a WWW server is given below. The example assumes that you have access to a networked PC.

    Retrieve the NCSA server software by anonymous FTP. The software is held on the server ftp.ncsa.uiuc.edu in the directory /Web/httpd/Uni/ncsa_httpd/contrib/winhttpd as the file whtp13p1.zip. An example of how to do this using the FTP software, in this case connecting to the archive at src.doc.ic.ac.uk, is illustrated below.

    ftp src.doc.ic.ac.uk
    image
    cd /Web/ncsa/httpd/Windows/winhttpd
    get whtp13p1.zip

    Create a directory called C:\HTTPD on the C: drive of your PC and then move to the directory using the CD \HTTPD command. Then uncompress the file by giving the command:

    PKUNZIP -D WHTP13P1.ZIP

    The -D option will preserve the directory structure from the compressed file.

    Run Microsoft Windows and create a program icon using the New option on the File menu. The icon should point to the file C:\HTTPD\HTTPD.EXE

    Set the time zone in the AUTOEXEC.BAT file so that TZ=GMT.

    Run the server program. The window shown below should be displayed.


    Figure 7-1 Running The Windows HTTPD Server.

    Run a World-Wide Web browser and then enter a URL containing the IP address of your PC. For example if your PC has an IP address of 192.11.1.1 you should enter the address:

    http://192.11.1.1/

    The following diagram illustrates NCSA Mosaic for X accessing a server running on a PC.


    Figure 7-2 Accessing The MS Windows HTTPD Server.

    This example is meant to illustrate the installation of a WWW server. In practice the server software is likely to run on a more robust system than a PC running MS DOS, such as a Unix or Windows NT system.

    Server Configuration Files

    World-Wide Web server software will normally have a configuration file which is used to:

    As WWW develops, additional features will be provided in the server software and the configuration files are likely to grow in complexity. An example of a simple configuration file is shown below.

    map / file:/apps/WWW/homepage.html
    map /* file:/apps/WWW/*
    pass file:/apps/WWW/*
    fail *
    Figure 7-3 A Simple httpd.conf Configuration File

    Figure 7-3 shows a simple configuration file for the CERN httpd server. Line 2 specifies that files located under the directory /apps/WWW should be available to the WWW server software. Line 1 specifies that file /apps/WWW/homepage.html is the default file to be displayed when the WWW server is accessed.
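
    The way in which these rules are applied to a request can be sketched in Python as follows. This is an interpretation of the rules as described above and is not the CERN server's own code: it assumes that rules are tried in order, that a map rule rewrites the request and carries on, that a pass rule accepts the request and a fail rule refuses it, and that a template contains at most one * wildcard. The names match_template and translate are hypothetical. A real server must also deal with requests containing .. components, which this sketch ignores.

```python
RULES = [
    ("map", "/", "file:/apps/WWW/homepage.html"),
    ("map", "/*", "file:/apps/WWW/*"),
    ("pass", "file:/apps/WWW/*"),
    ("fail", "*"),
]


def match_template(template, path):
    """Return the text matched by '*' ('' if there is no wildcard), or None if no match."""
    if "*" not in template:
        return "" if template == path else None
    prefix, suffix = template.split("*", 1)
    long_enough = len(path) >= len(prefix) + len(suffix)
    if long_enough and path.startswith(prefix) and path.endswith(suffix):
        return path[len(prefix):len(path) - len(suffix)]
    return None


def translate(rules, path):
    """Return the translated location for a request, or None if it is refused."""
    for rule in rules:
        kind, template = rule[0], rule[1]
        wildcard = match_template(template, path)
        if wildcard is None:
            continue  # this rule does not apply to the current path
        if kind == "map":
            # Rewrite the path and carry on with the remaining rules.
            path = rule[2].replace("*", wildcard, 1)
        elif kind == "pass":
            return path
        elif kind == "fail":
            return None
    return None


if __name__ == "__main__":
    print(translate(RULES, "/"))
    print(translate(RULES, "/music.html"))
```

    With the rules in Figure 7-3, a request for / is translated to file:/apps/WWW/homepage.html and a request for /music.html is translated to file:/apps/WWW/music.html.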

    protection prot-proxy { # Part 1
    serverid www.leeds.ac.uk
    mask @(129.11.*.*)
    }
    protect http:* prot-proxy # Part 2
    protect gopher:* prot-proxy
    protect ftp:* prot-proxy
    protect wais:* prot-proxy

    pass http:* # Part 3
    pass gopher:*
    pass ftp:*
    pass wais:*

    Exec /cgi-bin/ucs/* /apps/WWW/cgi-bin/ucs/* # Part 4
    Exec /cgi-bin/bionet/* /apps/WWW/cgi-bin/bionet/*
    Exec /cgi-bin/bmb/* /apps/WWW/cgi-bin/bmb/*

    map / file:/apps/WWW/homepage.html # Part 5
    map /* file:/apps/WWW/*
    pass file:/apps/WWW/*
    fail *

    AccessLog /var/adm/httpd.log # Part 6
    LogFormat Common
    LogTime LocalTime

    Caching On # Part 7
    CacheRoot /usr/info/WWW_cache
    CacheSize 300
    CacheAccessLog /var/adm/httpd_cache.log

    # Delete files from cache after specified number of days
    CacheClean http:* 10 Days
    CacheClean gopher:* 10 Days
    CacheClean wais:* 10 Days
    CacheClean ftp:* 10 Days

    # Don't cache local files # Part 8
    NoCaching http://*.leeds.ac.uk/*

    # If a file hasn't been accessed within the last specified
    # number of days delete from cache
    CacheUnused * 5 days
    CacheUnused http://info.cern.ch/* 10 days
    CacheUnused http://www.ncsa.uiuc.edu/* 10 days

    # ensure dynamically changing documents are only kept for short
    # periods e.g. one modified 10 days ago will only last 2 days
    CacheLastModifiedFactor 0.2

    # If a file was retrieved more than 5 days ago do a
    # 'conditional get' request to the source server to check
    # that it hasn't been updated in the meantime.
    CacheRefreshInterval http://* 5 days
    CacheRefreshInterval gopher://* 5 days
    CacheRefreshInterval ftp://* 5 days

    # CacheDefaultExpiry ensures that Gopher and FTP files are
    # cached. The default is 0 which is what we want for http
    # documents with neither an expiry nor a last-modified stamp.
    CacheDefaultExpiry ftp://* 5 days
    CacheDefaultExpiry gopher://* 5 days

    # Remove unwanted cached files daily at 3 am (garbage collection).
    Gc On
    GcDailyGc 3:00
    Figure 7-4 A httpd.conf Configuration File

    Figure 7-4 shows another configuration file (this is for illustrative purposes - some options may have been superseded). The various features are summarised below:

    Parts 1 and 2 provide a mechanism for ensuring that the proxy gateway cannot be accessed from outside the local domain. Without these options it would be possible for a browser on an external system to use the proxy gateway to gain access to files which are restricted to local use.
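
    The effect of the mask in Part 1 can be illustrated with a short Python sketch which tests whether a client's IP address matches a pattern such as @(129.11.*.*). The sketch handles only the numeric form shown in Figure 7-4 and is not the server's own matching code; the function name address_allowed is hypothetical.

```python
def address_allowed(address, mask):
    """Return True if a dotted IP address matches a mask such as '@(129.11.*.*)'."""
    pattern = mask.strip()
    # Remove the surrounding "@(" and ")" if they are present.
    if pattern.startswith("@(") and pattern.endswith(")"):
        pattern = pattern[2:-1]
    wanted = pattern.split(".")
    actual = address.split(".")
    if len(wanted) != 4 or len(actual) != 4:
        return False
    # Each part must either be a wildcard or match exactly.
    return all(w == "*" or w == a for w, a in zip(wanted, actual))


if __name__ == "__main__":
    print(address_allowed("129.11.5.20", "@(129.11.*.*)"))
    print(address_allowed("128.11.5.20", "@(129.11.*.*)"))
```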

    Part 3 passes requests for the http, gopher, wais and ftp protocols.

    Part 4 specifies the location for CGI files.

    Part 5 specifies the area of the filestore which can be accessed by the server.

    Part 6 describes the location and format of the server log file.

    Part 7 specifies that server caching is to be available, and gives the location of the cache and the cache log files, together with the size (in Mbytes) of the cache.

    Part 8 specifies the purging frequency for files in the cache.
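
    The comment on the CacheLastModifiedFactor option in Figure 7-4 states that, with a factor of 0.2, a document modified 10 days ago will only be kept for 2 days. The arithmetic can be sketched in Python as follows. This is based on that comment alone and is not the server's own code; the function name cache_lifetime is hypothetical.

```python
from datetime import datetime, timedelta


def cache_lifetime(retrieved, last_modified, factor=0.2):
    """Time for which a cached copy is kept: factor x age of the document when retrieved."""
    age = retrieved - last_modified
    # A document with a modification date in the future is not kept at all.
    return max(timedelta(0), age * factor)


if __name__ == "__main__":
    retrieved = datetime(1994, 11, 21)
    last_modified = retrieved - timedelta(days=10)
    print(cache_lifetime(retrieved, last_modified))
```

    The reasoning behind such a rule is that a document which has not changed for a long time is unlikely to change soon, so it can safely be kept in the cache for longer than one which was modified recently.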

    An example of a typical httpd.log file is shown below.

    abc.cs.xyz.edu - - [21/Nov/1994:21:58:58 +0000] "GET /music.html HTTP/1.0" 200 4375
    gps0 - - [21/Nov/1994:21:59:48 +0000] "GET / HTTP/1.0" 200 2782
    abc_pc99.leeds.ac.uk - - [21/Nov/1994:21:59:47 +0000] "GET http://www.leeds.ac.uk/ HTTP/1.0" 200 2782
    abc.nt.com - - [21/Nov/1994:22:00:03 +0000] "GET /music/NetInfo/MusicFTP/ftp_sites.html HTTP/1.0" 200 13175
    Figure 7-5 Example of a httpd.log File.

    Note that the names of the machines accessing files from the server have been altered in the diagram. This has been done because it could be argued that such information should be confidential.
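
    Log files in this format are straightforward to process. The Python sketch below splits an entry into the name of the client machine, the time, the request, the status code and the number of bytes sent. The regular expression is written to fit the entries shown in Figure 7-5, and the function name parse_log_line is hypothetical.

```python
import re

# host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dictionary of the fields in one log entry, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means that no byte count was recorded.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


if __name__ == "__main__":
    sample = ('abc.nt.com - - [21/Nov/1994:22:00:03 +0000] '
              '"GET /music/NetInfo/MusicFTP/ftp_sites.html HTTP/1.0" 200 13175')
    print(parse_log_line(sample))
```

    Entries parsed in this way can be totalled to show, for example, the most frequently requested documents or the volume of data sent each day.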

    Caching

    Many clients provide client-side caching. This means that if you retrieve a file and then retrieve another file, when you return to the initial file it will be retrieved from the client's cache, thus saving a subsequent network transfer.

    A number of servers also support caching by the server. This is illustrated in Figure 7-6.


    Figure 7-6 Caching By The Server.

    Caching can improve the performance of a WWW service by ensuring that frequently requested files will tend to be stored in the local cache. There is, of course, a danger that if the file on the remote server is updated then an out-of-date file will be retrieved from the cache. In practice, however, httpd server software which supports caching can deal with this issue by, for example, looking at the date of the file on the remote server and, if the remote file is newer than the file in the cache, replacing the file in the cache with the new version of the file.

    [Proxy Information]

    http_proxy: www.leeds.ac.uk
    gopher_proxy: www.leeds.ac.uk
    wais_proxy: www.leeds.ac.uk
    Figure 7-7 Client Configuration File To Support Caching.

    In order for a client to make use of a cache on a server, the client configuration file (e.g. the MOSAIC.INI file) must be suitably configured. Figure 7-7 illustrates the relevant options for the MOSAIC.INI file.
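
    The same idea applies to any client software, not only to browsers. As an illustration, the Python sketch below configures a client written using Python's standard urllib library to send its http requests through a proxy. The proxy name is taken from Figure 7-7; the function name build_proxy_opener is hypothetical, and the port on which the proxy listens is not given in this document, so the default is assumed. No request is made by the sketch itself.

```python
import urllib.request

PROXY_HOST = "www.leeds.ac.uk"  # proxy name as shown in Figure 7-7


def build_proxy_opener(proxy_host=PROXY_HOST):
    """Return an opener which sends http requests via the given proxy."""
    # Assumption: the proxy listens on the default port for http.
    proxies = {"http": "http://%s/" % proxy_host}
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener()
    # opener.open("http://info.cern.ch/") would now be fetched via the proxy.
    print([type(h).__name__ for h in opener.handlers])
```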

    Accesses of the cache are recorded in the cache log file. A typical log file is illustrated in Figure 7-8.

    xyz_pc77.leeds.ac.uk - - [21/Nov/1994:00:43:35 +0000] "GET http://white.nosc.mil/gif_images/NM_Sunrise_s.gif HTTP/1.0" 200 18673
    xyz_pc77.leeds.ac.uk - - [21/Nov/1994:00:43:38 +0000] "GET http://white.nosc.mil/gif_images/glacier_s.gif HTTP/1.0" 200 6474
    xyz_pc77.leeds.ac.uk - - [21/Nov/1994:00:43:40 +0000] "GET http://white.nosc.mil/gif_images/rainier_s.gif HTTP/1.0" 200 18749
    Figure 7-8 The httpd_cache Log File.

    Note that the names of the machines accessing files from the cache have been altered in the diagram. This has been done because it could be argued that such information should be confidential.

    Caching Strategies

    As well as using a local server cache, it is also possible to use a national caching service. The Unix HENSA service at the University of Kent at Canterbury runs a national caching service. To use this service the local client should define www.hensa.ac.uk as the proxy. Another national caching service is available at sunsite.doc.ic.ac.uk Further information is available at the URL http://src.doc.ic.ac.uk/WWW-Cache.html

    An institution will need to decide whether to use a caching service and, if so, whether to have caching services running on a number of departmental systems, to have an institutional caching service, or to use the national caching service at HENSA. In the future it may be possible to chain caches. The possibility in the long term of having institutional, metropolitan, national and continental caches should be considered.

    Proxy Gateways

    In many academic institutions off-campus access to the Internet is restricted to authorised computers. Depending on the institution's local policy, authorisation may be restricted to computers located in offices in which there is an individual who is responsible for use of the machine. Such a policy may be enforced in order to provide some means of security against hacking remote services. However this policy would appear to prevent students from accessing remote information services from computers in open access cluster areas.

    In practice a technique known as proxy gateways can be used to provide access to off-campus services without compromising local security. With a proxy gateway, a trusted system (typically a Unix system, which is more resistant to attack than a desktop machine) has Internet access. Machines in open access clusters can point to the proxy gateway, which will then retrieve information from off-campus services on their behalf.
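
    Many Unix clients, including NCSA Mosaic for X, can be pointed at a proxy through environment variables. The sketch below shows one way of doing this. The host name wwwproxy.example.ac.uk and the port 8080 are placeholders, not documented values, and the function name use_proxy is hypothetical; the details for a particular gateway or caching service should be taken from its own documentation.

```shell
#!/bin/sh
# Point clients that honour the *_proxy environment variables at a
# proxy gateway or caching proxy.
#   $1  host name of the proxy (placeholder in the call below)
#   $2  port number of the proxy (placeholder in the call below)
use_proxy() {
    proxy_url="http://$1:$2/"
    http_proxy=$proxy_url
    ftp_proxy=$proxy_url
    gopher_proxy=$proxy_url
    export http_proxy ftp_proxy gopher_proxy
}

# Both values are assumptions made for illustration only.
use_proxy wwwproxy.example.ac.uk 8080
```

    Clients started from the same shell after this call will send their requests to the proxy rather than directly to the remote server.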

    With increasing usage of Internet services such as the World-Wide Web, the author believes that security mechanisms such as proxy gateways will become increasingly important.

    Further Information

    Further information on caching and proxies is available at the following URLs:

    Security

    The httpd server also handles a number of security issues. It is common practice to restrict access to a certain area of the filestore. For example if the server configuration file contains the lines:

    map /* file:/apps/WWW/*
    pass file:/apps/WWW/*
    fail *
    Figure 7-9 Server Configuration File.

    then clients will only be able to access files held under the directory /apps/WWW/.

    Note: This statement refers to clients running on remote machines. If the client is running on the same machine as the server, the client will normally be able to access files on the server to which it has read access.
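
    The effect of the three rules in Figure 7-9 can be modelled with a short shell function. This is an illustrative sketch only, not the server's own code: it assumes that rules are applied in order, that map rewrites the request and carries on, and that pass and fail end the processing. The function name resolve is hypothetical.

```shell
#!/bin/sh
# Model of the rules in Figure 7-9. A real server must also guard
# against ".." in the path; that check is omitted here.
resolve() {
    path=$1
    # map /* file:/apps/WWW/*   rewrite the request into the served tree
    case $path in
        /*) target="file:/apps/WWW/${path#/}" ;;
        *)  target=$path ;;
    esac
    # pass file:/apps/WWW/*     accept anything inside that tree
    # fail *                    refuse everything else
    case $target in
        file:/apps/WWW/*) echo "pass ${target#file:}" ;;
        *)                echo "fail" ;;
    esac
}

resolve /ucs/index.html
resolve http://white.nosc.mil/gif_images/glacier_s.gif
```

    In this model the first request is served from /apps/WWW/ucs/index.html, while the second, which does not begin with a slash, is never mapped and so is caught by the final fail rule.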

    Additional levels of security can also be specified:

    The method of implementing such security tends to be server dependent, and will not be described in this document.

    A WWW Security FAQ is available at the URL http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq

    SHTTP

    Secure NCSA httpd is a World-Wide Web (WWW) server supporting transaction privacy and authentication for Secure WWW clients over the Internet using the Secure HyperText Transfer Protocol (S-HTTP). Further information is available at the URL http://www.commerce.net/software/Shttpd/Docs/manual.html

    Netscape and Security

    The SSL protocol has been submitted to the IETF as an Internet Draft. Netscape is actively pursuing the standardization of SSL within the framework of the IETF standards process and is also working with industry consortium groups to ensure that open and interoperable security standards exist now and in the future. Further information is available at the URL http://home.mcom.com/newsref/std/SSL.html

    Summary of Server Software

    A brief summary of server software is given below. This summary is based on Thomas Boutell's WWW FAQ.

    Unix Servers

    CERN httpd

    Information about CERN's server is available at the URL http://www.w3.org/hypertext/WWW/Daemon/Status.html

    NCSA httpd

    NCSA's server is available at the URL ftp://ftp.ncsa.uiuc.edu/Web/ncsa_httpd

    Apache httpd

    The Apache server is available at the URL http://www.apache.org/

    EIT httpd

    EIT have created a Webmaster Starter's Kit which installs their server using a forms interface from a WWW browser. Further information is available at the URL http://wsk.eit.com/wsk/doc/

    GN Gopher/http

    The GN server can serve both WWW and Gopher clients. It may be useful for sites wishing to migrate from Gopher to WWW, although it does not have the server-script capabilities of the CERN and NCSA servers. Further information is available at the URL http://hopf.math.nwu.edu/

    Plexus perl server

    The Plexus server is written in Perl. Further information is available at the URL http://bsdi.com/server/doc/plexus.html

    WebWorks Enterprise server

    This is a commercial server marketed by Quadralay Inc. Further information is available at the URL http://www.quadralay.com/products/WebWorks/Server/index.html

    Netsite Communication Server and Netsite Commercial Server

    These servers have been developed by Netscape Communications Corporation. Further information is available at the URL http://home.mcom.com/MCOM/products_docs/server.html

    Macintosh Servers

    MacHTTP

    Information about the MacHTTP server for the Apple Macintosh is available at the URL http://www.biap.com/ A tutorial on installing a Mac HTTP server is available at the URL http://web66.coled.umn.edu/Cookbook/contents.html

    Novell Netware Servers

    httpdnlm

    The httpd NLM server for Novell Netware is available at the URL ftp://ftp.glaci.com/pub/netware/http/

    Microsoft Windows and Windows NT Servers

    https

    HTTPS is a Windows NT server developed at Edinburgh University which runs on Intel, MIPS and Alpha CPUs. It is available at the URL ftp://emwac.ed.ac.uk/pub/https/

    NCSA httpd For Windows

    The NCSA httpd for Windows server provides most of the features of the Unix version, including scripts (which generate pages on the fly). It is available at the URL ftp://ftp.ncsa.uiuc.edu/Web/ncsa_httpd/contrib/

    SerWeb

    SerWeb is a Microsoft Windows server. It is available at the URL ftp://emwac.ed.ac.uk/pub/serweb/

    Web4Ham

    Web4Ham is a Microsoft Windows server. It is available at the URL ftp://ftp.informatik.uni-hamburg.de/pub/net/winsock/

    Server Strategies

    An institution needs to decide on its server hardware strategy. For example, should it support:

    1. A central server
    2. A number of departmental servers

    If option 2 is chosen, how is indexing across servers to be achieved, and what caching strategy is to be adopted? What skill levels does the server administrator need? An institution needs to recognise that adopting a server strategy involves more than simply installing the server software.

    Which Server?

    The most widely used servers are probably those developed at CERN and NCSA for the Unix platform. Unix is probably the best platform for running an institutional WWW service, since it is a mature, pre-emptive multi-tasking operating system. In addition, Unix provides a wide range of tools which can be used to assist in system administration. Servers are available for the PC and Macintosh platforms but, due to the inherent deficiencies in the operating system environments which are currently used on these platforms, such servers are probably not recommended if you wish to run a large-scale, stable service.

    Servers have been developed for the Windows NT environment. This may provide a robust operating system environment which can be used for providing a WWW server on an Intel platform.

    Further Information

    Further information about HTTP is available at the URL http://www.w3.org/hypertext/WWW/Protocols/

    Information about HTTP/NG is available at the URL http://www.w3.org/hypertext/WWW/Protocols/HTTP-NG/http-ng-status.html

    The HTTP/1.0 specification has been submitted as an Internet-Draft and is available for comment at the following URLs: http://www.ics.uci.edu/pub/ietf/http/draft-fielding-http-spec-00.txt and ftp://www.ics.uci.edu/pub/ietf/http/draft-fielding-http-spec-00.txt

    The document Setting up a World-Wide Web Server, which is available at the URL http://scholar.lib.vt.edu/reports/Servers-web.html , gives advice on setting up a server.

    A collection of utilities intended especially for WWW system administrators is available at the URL ftp://src.brunel.ac.uk/WWW/managers/

    Lists of server software are available at the URLs http://www.w3.org/hypertext/WWW/Daemon/Overview.html , http://www.charm.net/~web/Vlib/Providers/Servers.html and http://www.yahoo.com/Computers/World_Wide_Web/HTTP_Servers/

    A hypermail archive of the HTTP-WG mailing list is available at the URL http://www.ics.uci.edu/pub/ietf/http/hypermail/

    A WWW server comparison chart is available at the URL http://sunsite.unc.edu/boutell/faq/chart.html

    A review of WWW servers is available at the URL http://wais.wais.com:80/techweb/iw/521/21olweb.htm

    A review of MacHTTP is available at the URL http://www.ziff.com/~macweek/mw_webedge/webedge.html


    8 Extending WWW

    External Viewers

    Access to WWW can be achieved by using a client such as NCSA Mosaic to display HTML documents and inline images in GIF format. However the World-Wide Web is an extensible system: clients can access information which is in formats other than HTML.

    When a client receives a file from a server it checks on the file type. If the file type indicates that it is an HTML document, the file will be displayed by the browser. Otherwise the browser's configuration file can specify an external viewer which can be used to display the file. A list of widely used external viewers is given in Table 8-1.

    File Format         Viewer
    JPEG                LVIEW (MS Windows), xv (X Windows)
    Postscript          Ghostview
    DVI                 xdvi (X Windows)
    MPEG                mpeg_play (X Windows and MS Windows)

    Table 8-1 Popular Viewers.
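
    The decision a browser makes when a file arrives can be modelled as a simple lookup on the file type. The sketch below is illustrative only. It uses the X Windows viewers from Table 8-1; the MIME type given for DVI files and the fallback for unknown types are assumptions, and the function name viewer_for is hypothetical.

```shell
#!/bin/sh
# Choose between displaying a file in the browser and passing it to
# an external viewer, based on its MIME type.
viewer_for() {
    case $1 in
        text/html)              echo "display in browser" ;;
        image/jpeg)             echo "xv" ;;
        application/postscript) echo "ghostview" ;;
        application/x-dvi)      echo "xdvi" ;;        # assumed type for DVI
        video/mpeg)             echo "mpeg_play" ;;
        *)                      echo "no viewer configured" ;;
    esac
}

viewer_for application/postscript
```

    In a real browser this table is not built in: it is read from the configuration file, so that new file types can be supported without changing the browser itself.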

    The association between the file type and the viewer is given in the browser's configuration file. A typical configuration file for Mosaic for Windows is given in Figure 8-1.

    [Viewers]
    TYPE0="audio/wav"
    TYPE1="application/postscript"
    TYPE2="image/gif"
    TYPE3="image/jpeg"
    TYPE4="video/mpeg"
    TYPE5="video/quicktime"
    TYPE6="video/msvideo"
    TYPE7="application/x-rtf"
    TYPE8="audio/x-midi"
    TYPE9="audio/basic"
    TYPE10="image/x-action"
    TYPE11="application/x-w3launch"
    application/postscript="L:\winapps\ghost\gsview %ls"
    application/x-w3launch="n:\windept\bmb\w3launch\w3launch %ls"
    image/gif="L:\winapps\mosaic2\lview %ls"
    image/x-action="n:\windept\bmb\action25\playact %ls"
    image/jpeg="L:\winapps\mosaic2\lview %ls"
    video/mpeg=""
    video/quicktime=""
    video/msvideo=""
    audio/wav=""
    audio/x-midi="mplayer %ls"
    application/x-rtf="write %ls"
    Figure 8-1 Part of a MOSAIC.INI File.

    Running Client Applications

    If a Postscript file is retrieved from a WWW server the browser program normally responds "I don't know what to do with a Postscript file - but I know a program that does. I'll pass the Postscript file on to the Ghostview program". If, for example, an Excel spreadsheet is retrieved from a WWW server the client could be configured to respond "I don't know what to do with an Excel spreadsheet file - but I know a program that does. I'll pass the spreadsheet file on to the Excel program". This technique extends the functionality of the World-Wide Web from acting as a distributed file viewer to acting as a distributed program manager.

    Security Implications

    Unfortunately there are a number of security concerns with such an approach. For example an application developed using the Toolbook authoring system could be delivered using WWW. The application could then be launched using a local copy of Toolbook. The application could have a button marked Start. Clicking this button could then result in files held on the local machine being deleted! Even associating a word processed document with Word For Windows holds dangers, as many Microsoft applications, including Word For Windows, support the use of macros, including autostart macros, which could also cause files to be deleted.

    As a general principle there are dangers in automatically invoking applications from WWW clients.

    Implementing Security - W3Launch

    There are security problems in using a WWW browser to download and run software from the Internet. It is generally not considered wise to configure a browser so that it recognises file types which contain programs. Jon Maber, Biochemistry and Molecular Biology, University of Leeds has developed a launching program for the Bionet TLTP project which provides a simple and secure method of launching only authorised software.

    Further details on the W3Launch program are available at the URL http://www.leeds.ac.uk/bionet/student/pre-stud.htm


    Figure 8-2 W3Launch.

    It should be noted that W3Launch is an application developed at the University of Leeds - it is not part of WWW itself.

    Server-side Extensions

    Example

    The previous section described how it is possible to run applications on the client machine. It is also possible to run software on the server. A simple application running on the server is shown in Figure 8-3.

    #!/bin/sh
    # CGI script to search the University phone directory.
    # The HTTP header must be followed by a blank line.
    echo "Content-type: text/html"
    echo
    if [ $# -eq 0 ]
    then
        # No search term supplied: send the search page.
        echo "<HEAD>"
        echo "<!-- Script written by Brian Kelly -->"
        echo "<TITLE>Search University Phone Directory</TITLE>"
        echo "<ISINDEX>"
        echo "</HEAD>"
        echo "<BODY>"
        echo "<H1>Phone Directory</H1>"
        echo "Enter surname of the person you are searching for.<P>"
        echo "Script written by <A HREF=\"http://www.leeds.ac.uk/ucs/people/BKelly/bk.html\">Brian Kelly</A>."
        echo "</BODY>"
    else
        # Search term supplied as arguments: send the matching lines.
        echo "<HEAD>"
        echo "<TITLE>Results Of Search</TITLE>"
        echo "</HEAD>"
        echo "<BODY>"
        echo "<H1>Results of Search for $*</H1>"
        echo "<PRE><TT>"
        # -e stops a search term beginning with "-" being read as an option.
        grep -i -e "$*" /apps/data/Telephone_Directory
        echo "</TT></PRE>"
        echo "</BODY>"
    fi
    Figure 8-3 Script To Generate An HTML Document.

    The program, which is a Bourne shell script running on the Unix server system, can be executed by selecting the URL http://www.leeds.ac.uk/cgi-bin/ucs/phone

    When the URL is selected, since no arguments are provided, the first part of the if statement is run. This will generate the following HTML document:

    <HEAD>
    <!-- Script written by Brian Kelly -->
    <TITLE>Search University Phone Directory</TITLE>
    <ISINDEX>
    </HEAD>
    <BODY>
    <H1>Phone Directory</H1>
    Enter surname of the person you are searching for.<P>
    Script written by <A HREF="http://www.leeds.ac.uk/ucs/people/BKelly/bk.html">Brian Kelly</A>.
    </BODY>
    Figure 8-4 Virtual HTML Document.

    The <ISINDEX> tag generates a search dialogue box. The HTML document is rendered as shown below.


    Figure 8-5 Running The Script.

    When text is entered in the Search box and the <Enter> key is pressed, the script in Figure 8-3 is executed again. This time, since the script is given an argument, the second part of the if statement is executed. This generates the HTML tags and then invokes the Unix grep command to search a file for lines containing the search string.
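
    The argument reaches the script through the URL. For an <ISINDEX> search the browser appends a question mark and the search words, joined by plus signs, to the URL of the current document, and the server passes the decoded words to the script as command-line arguments. The sketch below shows how such a URL is formed. It is an illustration of the convention rather than browser code, and the function name isindex_url is hypothetical.

```shell
#!/bin/sh
# Build the URL a browser requests for an <ISINDEX> search:
# the base URL, a "?", then the search words joined with "+".
isindex_url() {
    base=$1
    shift
    query=$(printf '%s' "$*" | tr ' ' '+')
    echo "$base?$query"
}

isindex_url http://www.leeds.ac.uk/cgi-bin/ucs/phone kelly
```

    For the search word kelly this gives the URL http://www.leeds.ac.uk/cgi-bin/ucs/phone?kelly, and the script is run with kelly as its single argument.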


    Figure 8-6 Output From The Script.

    CGI Programs

    The example described above is known as a CGI program. CGI stands for the Common Gateway Interface. It is a standard which has been adopted by a number of server developers (primarily developers of the CERN and NCSA server software) for running programs on the server machine.

    Further information on CGI is available at the locations given below:

    A definition of CGI is available at the URL http://hoohoo.ncsa.uiuc.edu/cgi/

    Examples of the use of CGI programs are available at the URL http://hoohoo.ncsa.uiuc.edu/cgi/examples.html or http://paulina.elkraft.unit.no/ncsa/cgi/overview.html

    A tutorial on CGI is available at the URL http://www.charm.net/~web/Tutorial/CGI/

    A tutorial, Learn How To Write CGI Forms, is available at the URL http://www.catt.ncsu.edu/users/bex/www/tutor/index.html

    The Web Developer's Virtual Library: CGI is available at the URL http://www.charm.net/~web/Vlib/Providers/CGI.html

    A CGI Programmer's Reference is available at the URL http://www.halcyon.com/hedlund/cgi-faq/

    An archive of useful CGI programs is available at the URL ftp://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/cgi/

    Pointers to CGI resources are available at the URL http://www.yahoo.com/Computers/World_Wide_Web/CGI_Common_Gateway_Interface/

    Forms

    Forms are often used to collect information from a user which is used as input to a CGI program. A description of forms is given below.

    Creating A Form

    A form consists of areas of the screen in which the user can input data. The data is sent to the HTTP server, which can run a script or program to process the data in some way. One common use of forms is to provide feedback on a WWW service. Input to the form can be emailed to the service administrator. Forms can also be used to supply search criteria to a search engine, or to specify parameters for distributed teaching and learning services.

    A form is defined by the <FORM ...> and </FORM> HTML tags. The <FORM> tag has the syntax:

    <FORM METHOD="method" ACTION="url">

    For example:

    <FORM METHOD="post" ACTION="http://leeds.ac.uk/ucs/cgi-bin/myscript">

    will send the input data to be processed by myscript.

    An example of a form is shown below:

    <TITLE>Fill-Out Form Example #7</TITLE>
    <H1>Fill-Out Form Example #7</H1>
    This is another fill-out form example, with toggle buttons. <P>
    <HR>
    <FORM METHOD="POST" ACTION="http://hoohoo.ncsa.uiuc.edu/htbin-post/post-query">
    <H2>Godzilla's Pizza -- Internet Delivery Service, Part II</H2>
    Type in your street address: <INPUT NAME="address"> <P>
    Type in your phone number: <INPUT NAME="phone"> <P>
    Which toppings would you like? <P>
    <OL>
    <LI> <INPUT TYPE="checkbox" NAME="topping" VALUE="pepperoni">
    Pepperoni.
    <LI> <INPUT TYPE="checkbox" NAME="topping" VALUE="sausage"> Sausage.
    <LI> <INPUT TYPE="checkbox" NAME="topping" VALUE="anchovies">
    Anchovies.
    </OL>
    How would you like to pay? Choose any one of the following: <P>
    <OL>
    <LI> <INPUT TYPE="radio" NAME="paymethod" VALUE="cash" CHECKED> Cash.
    <LI> <INPUT TYPE="radio" NAME="paymethod" VALUE="check"> Check.
    <LI> <I>Credit card:</I>
    <UL>
    <LI> <INPUT TYPE="radio" NAME="paymethod" VALUE="mastercard"> Mastercard.
    <LI> <INPUT TYPE="radio" NAME="paymethod" VALUE="visa"> Visa.
    <LI> <INPUT TYPE="radio" NAME="paymethod" VALUE="americanexpress">
    American Express.
    </UL>
    </OL>
    Would you like the driver to call before leaving the store? <P>
    <DL>
    <DD> <INPUT TYPE="radio" NAME="callfirst" VALUE="yes" CHECKED> <I>Yes.</I>
    <DD> <INPUT TYPE="radio" NAME="callfirst" VALUE="no"> <I>No.</I>
    </DL>
    To order your pizza, press this button: <INPUT TYPE="submit"
    VALUE="Order Pizza">. <P>
    </FORM>
    Figure 8-7 HTML Document Defining A Form.

    This example is available at the URL http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/example-7.html

    The way in which the form is displayed is illustrated below.


    Figure 8-8 Using A Form.

    Processing A Form

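    Once the form is submitted, the data which has been entered is encoded as a series of name=value pairs separated by ampersands. If the form uses METHOD="GET", the encoded data is appended to the end of the URL given in the ACTION attribute of the FORM tag. If the form uses METHOD="POST", as in Figure 8-7, the encoded data is sent to the server in the body of the request. In either case the information is then processed by the script.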
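
    Form data reaches the script as name=value pairs joined by ampersands, with spaces replaced by plus signs. The following is a minimal sketch of how a script might split this data into fields. It is not a complete decoder: %XX escapes are left untouched, the sample data is invented for illustration, and the function name parse_form is hypothetical.

```shell
#!/bin/sh
# Split URL-encoded form data into one "name: value" line per field.
# Only "+" is decoded (to a space); %XX escapes are not handled.
parse_form() {
    printf '%s\n' "$1" | tr '&' '\n' | while IFS='=' read -r name value
    do
        printf '%s: %s\n' "$name" "$(printf '%s' "$value" | tr '+' ' ')"
    done
}

# In a CGI script the data is found in $QUERY_STRING for METHOD="GET",
# or read from standard input ($CONTENT_LENGTH bytes) for METHOD="POST".
parse_form 'address=12+High+Street&topping=pepperoni&paymethod=cash'
```

    A field which appears several times, such as a set of checkboxes sharing one name, simply produces several lines with the same name.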

    Further Information About Forms

    Forms tutorials are available at the URLs http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html , http://hoohoo.ncsa.uiuc.edu/docs/cgi/forms.html , http://www.webcom.com/html/tutor/forms/start.html and http://kuhttp.cc.ukans.edu/info/forms/forms-intro.html

    A forms testing suite is available at the URL http://www.research.digital.com/nsl/formtest/home.html


    9 Utilities

    A number of useful utility programs have been developed which will assist systems managers and information providers.

    w3new is a program which will extract a list of URLs from the Mosaic client hotlist file or from an HTML document. It will then retrieve the modification date of each document listed and output an HTML file with the URLs sorted by their last modification date.

    Information about the program is available at the URL http://www.stuff.com/~bcutter/home/programs/w3new/w3new.html The utility was written by Brooks Cutter (mailto:bcutter@stuff.com).

    wusage is a WWW server usage meter which produces weekly activity reports in HTML. In addition it provides graphical displays of server usage.

    Further information is available at the URL http://siva.cshl.org/wusage.html The software is available from the URL ftp://isis.cshl.org/pub/wusage The utility was written by Thomas Boutell (mailto:boutell@netcom.com).

    getstats (formerly called getsites) is a versatile WWW server log analyser. It is available at the URL http://www.eit.com/software/getstats/getstats.html

    weblint is a Unix utility for checking the syntax of HTML documents. The checks include illegally nested, overlapped, unclosed and obsolete tags. Further details are available at the URL http://www.khoros.unm.edu/staff/neilb/weblint.html The software can be obtained from the URL ftp://ftp.khoros.unm.edu/pub/perl/www/ The utility was written by Neil Bowers, Khoral Research Inc. (mailto:neilb@khoros.unm.edu). The email list weblint@khoros.unm.edu provides announcements of new versions of the software.

    Verify_links is a robot which performs link verification. Further information is available at the URL http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html

    MOMspider (Multi-Owner Maintenance spider) is a tool which can be used to help information providers and system managers to maintain links to documents. MOMspider is available at the URL http://www.ics.uci.edu/WebSoft/MOMspider/

    The following utilities are available at the URL ftp://src.doc.ic.ac.uk/pub/packages/WWW/tools/

    checkweb looks for dead links in your Web

    html+tables.shar creates preformatted text tables from HTML+ Table definitions

    mosaic-wais-cli.pl does a WAIS search using Mosaic from the command line

    newslist/ compiles an HTML page of links to all newsgroups on your server

    simon/ URL database to replace NCSA Mosaic's Hotlist

    test-cgi/ sets up HTTP environment for a CGI script

    url-get.pl a perl script which brings in any document given its URL

    w3get.pl retrieves an HTML page named by a URL and all HREFs and IMGs in it

    A list of software tools produced by EIT is available at the URL http://www.eit.com/software/ The software includes:

    The Webmaster's Starter Kit, which simplifies setting up a Web site, and also includes several utilities to help you maintain and develop your Web site.

    WWWeasel, a Web publishing tool that includes a full-featured HTML editor and document management capabilities.

    Hypermail

    Hypermail is a program that converts a file of email messages to a WWW form. It is available at the URL http://gummo.stanford.edu/html/hypermail/hypermail.html


    Figure 9-1 Use of the Hypermail Utility.

    Figure 9-1 illustrates use of Hypermail. Messages posted to the web-support Mailbase list are converted to WWW format using Hypermail. Hypermail enables the messages to be linked (by threads) in date, subject or author order. Hypermail archives for Mailbase lists are available at the URL http://www.mailbase.ac.uk/hypermail/lists.html

    Webify

    Webify is a Unix utility which can convert output from desktop presentation packages, such as Powerpoint, into a form suitable for accessing on WWW. Webify converts a Postscript file into a set of linked HTML files containing inline images. An example of output from Webify is shown in Figure 9-2 (taken from the URL http://www.leeds.ac.uk/ucs/courses/course_material.html ).


    Figure 9-2 Output from Webify.

    Further information about Webify is available from the URL http://cag-www.lcs.mit.edu/~ward/webify/webifydoc/

    Webwatch

    WebWatch is a tool for keeping track of changes in selected WWW documents. Using an HTML document referencing URLs on the Web, WebWatch produces a filtered list, containing only those URLs that have been modified since a given time. WebWatch generates a local HTML document that contains links to only those documents which were updated after the given date. You can load this document into any Web browser and use it to navigate to the updated documents (as illustrated below).


    Figure 9-3 Use of the Webwatch Utility.

    Further information about Webwatch is available at the URL http://www.specter.com/users/janos/specter/


    10 Legal and Ethical Issues

    Is your WWW service legal? Who is legally responsible for the contents of a WWW service? Is pornography acceptable on a WWW service? If not, who defines what is pornographic and what is art? How do you reconcile control over the contents of a WWW server with intellectual freedom?

    The author does not know the answers to these questions. Fortunately WWW is attracting the interest of lawyers, philosophers and artists who are starting to address these issues. Many of the papers which have been published address issues which affect WWW providers in the USA. The American Constitution, and in particular the amendment on free speech, means that much of the work published in the USA in this area is not relevant to the UK.

    Liability

    It could be argued that the contents of a WWW service are the responsibility of the organisation which runs the service. So if an undergraduate has been granted permission to publish information and publishes libellous information the University may be legally responsible. An editorial in the Times Higher Education Supplement suggested that if the organisation has published guidelines covering acceptable and unacceptable use the organisation will have a strong defence if a case is brought to law.

    Computer Misuse Act

    It is likely that any material which incites, encourages or enables others to gain unauthorised access to a computer system would be found illegal under this act.

    Pornography

    Are pictures of naked women acceptable on a WWW service? It could be argued that guidelines similar to those which govern the contents of a University library should be developed for the WWW. Are pictures of naked women acceptable in books in the university library? The answer is probably "yes", especially if the university has a fine art department. Similar arguments could be made for textual pornography.

    However even the most liberal individual is likely to be offended by some of the pornography which is believed to be available on the Internet. In addition UK legislation on computer pornography is likely to be introduced shortly. This could mean that universities have a legal obligation to concern themselves with computer pornography.

    Copyright, Designs and Patents Acts

    In general the Copyright, Designs and Patents acts require that the permission of the owner of the intellectual property must be sought before any use of it is made whatsoever.

    A WWW manager may have the responsibility to ensure that copyright material is not made available unless the copyright holder has granted permission. This may affect research papers which have been submitted for publication. It may also affect the use of photographs, drawings and maps, for which the copyright may be owned, for example, by the photographer or the organisation which commissioned the photograph.

    Data Protection Act

    Information about individuals which is available on WWW may have to be registered with the Data Protection Officer. The information provider may have to abide by regulations to ensure the accuracy of the information.

    Equality Of Access To Information

    WWW can provide global access to a wide range of information services. However including large logos and graphical icons on pages can act as a barrier to access to the information, especially for readers in developing countries with limited network access. In some developing countries access may be provided over local telephone lines. A health worker in a hospital in Africa who wishes to retrieve information about public health services may have to pay the additional costs of retrieving unnecessary graphics. If the local telephone company is owned by a multinational telephone corporation then accessing the information will result in a transfer of money from the developing country to the multinational corporation.

    Advertising

    As shown in Figure 10-1 some WWW service providers have sponsors for their pages. Is this currently acceptable within the UK academic community? Should it be acceptable?


    Figure 10-1 The "What's New On Mosaic" Page.

    JANET Acceptable Use Policy

    UK universities which make use of JANET (the Joint Academic Network) must abide by the rules and regulations governing the use of JANET. The following point should be noted.

    JANET may be used for any legal activity in furtherance of the aims and policies of a connected organisation, subject to a number of rules. For example, the following uses are not permitted on JANET:

    What Is Your WWW Service For?

    Formulating an institutional acceptable use policy for WWW information providers may not be a simple task. There are likely to be lively discussions over censorship and control. The formulation of the policy will be helped if the institution has a clear idea of what it expects from its WWW service. Is it:

    Further Information

    An interactive document called Sex, Censorship and the Internet is available at the URL http://www.eff.org:80/CAF/cafuiuc.html This document asks questions such as whether universities should carry alt.sex Usenet groups and whether students should be punished for using vulgarities on the Net. The document provides pointers to case studies.

    Information about the Data Protection Act is available at the URL http://www.open.gov.uk/dpr/dprhome.htm

    Andrew Charlesworth from the Department of Law at the University of Hull has set up a Mailbase list for discussion of UK and European issues in Intellectual Property law, with special reference to the impact of information technology and the Internet. To subscribe to the law-ipr Mailbase list send the message subscribe law-ipr firstname lastname to mailbase@mailbase.ac.uk

    The CCTA Collaborative Open Groups (COGS) offers support for legal issues. Further information is available at the URL http://www.open.gov.uk/cogs/ To subscribe to a COG mailing list send the message subscribe mailid list to the address listserv@ccta.gov.uk For example J.Brown@leeds.ac.uk would send the message subscribe J.Brown@leeds.ac.uk legal to subscribe to the legal list.

    Andrew Charlesworth, Director of the Information Law and Technology Unit at the University of Hull gave a presentation about the legal issues of the World Wide Web at a workshop on WWW A Strategic Tool for UK Higher Education held at Loughborough University in February 1995. His paper is available at the URL http://info.mcc.ac.uk/CGU/SIMA/WWW/legal.html

    A list of URLs for the Codes of Practice and Guidelines for the establishment and continuing operation of UK H.E. academic WWW sites is available at the URL http://cspmserver.gold.ac.uk/guidance.html


    11 CWISes And WWW

    WWW is an ideal system for developing a campus (or community) wide information system (CWIS). The world's first multimedia CWIS was developed at the Honolulu Community College and officially announced at the end of May 1993. It is available at the URL http://www.hcc.hawaii.edu


    Figure 11-1 CWIS At HCC.

    The HCC CWIS was developed to support the college's goal of becoming the "Technological Training Centre of the Pacific". The most important aspects of developing and managing an effective CWIS are managerial and not technical. Formulating the objectives of a CWIS, resourcing it and developing a training programme are key issues which an institution needs to address.

    Finding Out More

    Papers by Judy Hallman about CWISes are available at the URL ftp://sunsite.unc.edu/pub/docs/about-the-net/cwis/cwis-l and ftp://sunsite.unc.edu/pub/docs/about-the-net/cwis/hallman.txt

    Polly-Alida Farrington's listing of CWISes is available at the URL http://www.rpi.edu/Internet/cwis.html

    Lists of (global) CWISes are available at the URLs http://www.rpi.edu/Internet/cwis.html and http://kawika.hcc.hawaii.edu/ws94/cwis.html

    The CWIS-L Listserv mailing list provides a forum for the discussion of topics related to campus-wide information systems. To subscribe send the message SUB CWIS-L your name to the address LISTSERV@MSU.EDU

    The Universities and Colleges Teaching, Learning and Information Group (UCTLIG) have produced a CWIS Manager's Handbook which addresses many CWIS management issues. The handbook is available at the URL http://www.ox.ac.uk/uctlig/cwis/

    A Framework for Administering NASA's Web Information Hypermedia is available at the URL http://naic.nasa.gov/www-framework.html


    12 Teaching And Learning On WWW

    Although WWW was initially used as a distributed multimedia information system, techniques such as CGI scripts mean that interaction can be built into WWW applications. Much of the interest in WWW within the academic community is based on its potential for developing distributed teaching and learning software rather than simply delivering information.

    Examples of Teaching And Learning On WWW

    An early example of a distributed multimedia teaching prototype was developed by Ben Whitaker, School of Chemistry, University of Leeds in 1993. As can be seen in Figure 12-1 this prototype is a simple hypertext application. It is of interest because it illustrates how distributed teaching applications can be developed.


    Figure 12-1 Early Example Of A Distributed Multimedia Teaching Application.

    A more sophisticated teaching application was developed by the School of Chemistry in conjunction with Imperial College. The example illustrated in Figure 12-2 makes use of a chemistry MIME type.


    Figure 12-2 Using a MIME Chemistry Type.

    In this example the WWW client is configured to associate the MIME type with the RasMol program. For example in NCSA Mosaic For X the line:

    chemical/x-pdb; rasmol %s

    is included in the .mailcap file. When a URL with the extension .pdb is selected the file will be downloaded and the RasMol program launched, as illustrated in Figure 12-2.
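
    The way a browser might act on such an entry can be sketched as follows. This is only an illustration, not the code used by Mosaic: the function names and the file name caffeine.pdb are assumptions, and only the simplest form of a mailcap entry (a MIME type, a semicolon and a command containing %s) is handled.

```python
import shlex


def parse_mailcap_line(line):
    """Split one simple mailcap entry into (mime_type, command_template)."""
    mime_type, _, command = line.partition(";")
    return mime_type.strip().lower(), command.strip()


def build_command(template, filename):
    """Substitute the name of the downloaded file for %s in the template."""
    return template.replace("%s", shlex.quote(filename))


# The entry quoted in the text; the file name is a made-up example.
mime_type, template = parse_mailcap_line("chemical/x-pdb; rasmol %s")
command = build_command(template, "caffeine.pdb")
print(mime_type, "->", command)
```

    A real browser would then pass the resulting command to the operating system so that the helper program opens the downloaded file.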

    Further information on this project is available at the URL http://chem.leeds.ac.uk/Project/MIME.html

    The Globewide Network Academy (GNA) is a consortium of educational and research organisations. Its mission is to provide a central organisation in which students, teachers, scholars and researchers can meet and interact. Further information about GNA is available at the URL http://uu-gna.mit.edu:8001/uu-gna/

    Mark Cox, Department of Industrial Technology, University of Bradford presented a paper at the Mosaic and the Web conference on Robotic Telescopes: An Interactive Exhibit on the Web. This paper is available at the URL http://www.eia.brad.ac.uk/mark/wwwf94/wwwf94.html

    Mark also has a collection of pointers to hardware control services over the Web which is available at the URL http://www.eia.brad.ac.uk/mark/fave-inter.html

    A Virtual Frog Dissection Kit has been developed at the LBL. It is available at the URL http://george.lbl.gov/ITG.hm.pg.docs/Whole.Frog/Whole.Frog.html


    Figure 12-3 Frog Dissection.

    CD ROM Facilities

    Providing teaching and learning services on WWW does not necessarily deny access to those who do not have a network connection. Teaching and learning services developed on WWW can be transferred to a CD ROM and used on a standalone system. Such systems are typically developed so that there is a closed set of links. The files (which could include HTML documents, image, sound and video files) and the WWW browser software can then be transferred onto a CD ROM. This approach provides an updateable service for users with network connectivity together with a fixed service for users with access to a PC or Macintosh with a CD ROM player.

    National Resources

    A number of TLTP (Teaching and Learning Technology Programme), CTI (Computers in Teaching Initiative) and ITTI (Information Technology Training Initiative) projects are using WWW to disseminate information about their projects or, in some cases, to deliver their courseware.

    CTISS is available at the URL http://www.ox.ac.uk/cti/

    CTI Centre For Biology is available at the URL http://www.liv.ac.uk/ctibiol.html

    CTI Centre For Chemistry is available at the URL http://www.liv.ac.uk/ctichem.html

    CTI Centre For Law is available at the URL http://crocus.csv.warwick.ac.uk/WWW/law/default.html

    CTI Centre For Psychology is available at the URL http://ctipsych.york.ac.uk/

    CTI Centre For Sociology is available at the URL http://lorne.stir.ac.uk/departments/cti_centre/

    CTI Centre For Textual Studies is available at the URL http://www.ox.ac.uk/depts/humanities/

    BioNet Project is available at the URL http://www.leeds.ac.uk/bionet.html

    CLIVE Project is available at the URL http://www.vet.ed.ac.uk/

    Insurrect Project is available at the URL http://av.avc.ucl.ac.uk/

    Institute Of Computer Based Learning, Heriot-Watt is available at the URL http://www.icbl.hw.ac.uk/

    INTERACT Project is available at the URL http://medusa.eng.cam.ac.uk/~interact/

    Interactive Learning Centre, University of Southampton is available at the URL http://ilc.ecs.soton.ac.uk/welcome.html

    ITTI is available at the URL http://www.hull.ac.uk/Hull/ITTI/homepage.html

    PsyCLE Project is available at the URL http://ctipsych.york.ac.uk/Psycle/PsyCLEinfo.html

    STILE Project is available at the URL http://indigo.stile.le.ac.uk/

    TLTP is available at the URL http://www.icbl.hw.ac.uk/tltp/

    TLTP Archaeology Consortium is available at the URL http://www.brad.ac.uk/acad/archsci/homepage.html

    TLTP Mathematical Project is available at the URL http://othello.ma.ic.ac.uk/

    Further Information

    Further information about a mailing list for teaching and learning is available at the URL http://tecfa.unige.ch/edu-ws94/ws.html

    Pointers to global uses of WWW for teaching are available at the URL http://wwwhost.cc.utexas.edu/world/instruction/index.html

    Harry Kriz's paper "Teaching and Publishing in the World Wide Web" is available at the URL http://learning.lib.vt.edu/webserv/webserv.html


    13 Collaboration On WWW

    WWW was originally envisaged by Tim Berners-Lee as a groupware tool. In practice it grew in popularity as a publishing tool. However software developers are now working on tools which will facilitate collaboration on WWW. A brief summary of some of the collaborative tools is given below.

    Asynchronous Systems

    WIT

    WIT, the WWW Interactive Talk system, was announced shortly after the WWW 94 conference in CERN. WIT can be accessed at the URL http://www.w3.org/wit


    Figure 13-1 WIT.

    Access To Usenet

    The Netscape browser can be used to post to Usenet newsgroups.


    Figure 13-2 Posting To Usenet News.

    Hypermail

    Hypermail is a utility which can be used to convert mail archives to hypertext format on WWW. Further details are available at the URL http://gummo.stanford.edu/html/hypermail/hypermail.html An example of a hypermail archive is illustrated below.


    Figure 13-3 A Hypermail Archive.

    Mailserv

    Mailserv provides a forms interface to a number of mailing list servers. The software is available at the URL http://iquest.com/~fitz/www/mailserv/ The software was written by Patrick M Fitzgerald (mailto:pmfitzge@iquest.com)


    Figure 13-4 The Mailserv Interface To Mailing List Servers.

    Synchronous Systems

    WebChat

    WebChat is a real-time, multimedia chatting application for the Web. WebChat allows visitors at your Web site to engage in live conversation. Users can incorporate images, video and audio clips, and "hotlinks" into their chat. WebChat is available at the URL http://www.irsociety.com/webchat.html


    Figure 13-5 Webchat.

    MONET

    One interesting application of a multimedia desktop conferencing system is MONET (Meeting on the Network) which is described in Applications of Mosaic in Health Care Delivery by Srivasa et al. This paper, which was presented at the Mosaic and The Web conference, is available at the URL http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/MedTrack/srivasa/artemis.html


    Figure 13-6 MONET.

    At the time of writing many of these services are experimental. However, given the rapid growth of WWW and the extent of development work which is going on, such services may be mainstream in the near future.

    Virtual Conferences

    One form of collaboration within the academic community is through conferences, workshops and seminars. Whenever the author gives a paper at a conference or is involved in running a workshop or a course he makes his papers, OHP foils, etc. available on WWW under his personal page (sometimes referred to as a vanity page).

    About 200 of the papers which were given at the second WWW conference, Mosaic and The Web, were available on WWW before the conference began. Perhaps one important question which the academic community should be addressing is whether it should be the standard practice for conference proceedings to be made available on WWW.

    Further Information

    A collection of WWW collaborative projects is available at the URL http://union.ncsa.uiuc.edu/HyperNews/get/www/collaboration.html

    Articles on Towards Standards for an Interactive Web are available at the URL http://www.geom.umn.edu/hypernews/get/interactive/index.html

    Examples of conference proceedings available on WWW are given in Appendix 5.


    14 Libraries And WWW

    University Libraries should have a strong interest in WWW developments. This handbook provides an overview of the World-Wide Web which should be of interest to libraries which are considering using WWW.

    Example Of A Gateway To A Library Catalogue

    In the UK many university library catalogues are held in proprietary systems with old-fashioned user interfaces. It may be possible, however, to use WWW to provide an interface to the library catalogue which is consistent with other information services on WWW. At the University of Leeds a backup copy of the library catalogue is kept on a central Unix system using the BRS free text retrieval system. A gateway program which provides access to the Library catalogue has been developed by Terry Screeton of the Computing Service. This gateway is available at the URL http://www.leeds.ac.uk/library/cats/backup.html


    Figure 14-1 Gateway To A Library Catalogue.

    In Figure 14-1 a form is completed. The term Internet is used as a search term. Once the form is submitted the data is sent to a CGI program. In this case the CGI program is a C program which invokes the BRS free text retrieval system. The output from the BRS program is then processed to generate the appropriate HTML markup. The output from the search is illustrated in Figure 14-2.
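
    The general shape of such a gateway can be sketched in a few lines. The Leeds gateway is a C program which calls BRS; the Python sketch below is not that program. The form field name term, the function names and the sample records are all assumptions made for illustration, and run_search simply stands in for the call to the retrieval system.

```python
import html
import os
from urllib.parse import parse_qs

# Invented sample records standing in for the real catalogue.
SAMPLE_RECORDS = [
    "Sample record: an Internet guide",
    "Sample record: Unix administration",
    "Sample record: Internet services for libraries",
]


def run_search(term):
    """Hypothetical stand-in for invoking the free text retrieval system."""
    if not term:
        return []
    return [r for r in SAMPLE_RECORDS if term.lower() in r.lower()]


def handle_request(query_string):
    """Turn the submitted form data into a complete CGI response."""
    fields = parse_qs(query_string)
    term = fields.get("term", [""])[0]
    hits = run_search(term)
    items = "\n".join("<LI>" + html.escape(hit) for hit in hits)
    body = (
        "<TITLE>Search results</TITLE>\n"
        "<H1>Results for " + html.escape(term) + "</H1>\n"
        "<UL>\n" + items + "\n</UL>\n"
    )
    # A CGI program writes a header, a blank line and then the document.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server passes the data from a GET form in QUERY_STRING.
    print(handle_request(os.environ.get("QUERY_STRING", "")))
```

    The search term is escaped before it is written back into the page so that characters such as < typed into the form cannot be interpreted as markup.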


    Figure 14-2 Output From The Library Catalogue.

    Resources

    Datalib provides an interface to a number of online information services hosted at Edinburgh University. It can be accessed at the URL http://datalib.ed.ac.uk/

    SALSER is an online information service about serials held in Scottish academic and research libraries. It can be accessed at the URL http://salser.ed.ac.uk/

    The Clearinghouse for subject-oriented Internet resource guides is available at the URL http://http2.sils.umich.edu/~lou/chhome.html

    The EINet Galaxy collection of online resources is available at http://galaxy.einet.net/galaxy.html

    The CERN Virtual Library is available at the URL http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html

    The Boulder Community Network service is available at the URL http://bcn.boulder.co.us/ Its policy statement is available at the URL http://bcn.boulder.co.us/bcn/policy.html The policy statement includes a bill of rights, a freedom to read statement and a freedom to view statement.

    A list of Innovative Internet Applications in Libraries is available at the URL http://frank.mtsu.edu/~kmiddlet/libweb/innovate.html

    The following Library resources may also prove useful:

    Finding Out More

    Web4Lib is a mailing list aimed at library-based WWW managers and developers. To subscribe to the list email listserv@library.berkeley.edu with the message SUBSCRIBE Web4Lib yourname.

    Eric Morgan's article on Libraries and the Web in Public Access Computer Systems Review, 5(6) 1994:5-26 is available at the URL http://www.lib.ncsu.edu/staff/morgan/www-and-libraries.html


    15 Future Developments

    This handbook describes how to run a WWW service using the technology which is available today. However the technology is developing so rapidly that it is important that WWW managers and information providers are aware of developments which may happen sooner rather than later.

    Uniform Resource Identifiers

    Uniform Resource Locators (URLs) describe the location of a resource on the Internet and the protocol which is used to access the resource. An object on WWW may be available in many locations: for example popular browsers, such as NCSA Mosaic, are available from anonymous FTP servers in many locations around the world. The mirroring of files helps to minimise network traffic over busy links, such as the trans-Atlantic link. Mirroring also reduces the load on the central server. Uniform Resource Names (URNs) will provide a mechanism for uniquely identifying a resource. In the future it is likely that a browser will request a URN rather than a URL. A URN to URL resolver will locate the nearest object (nearest in network terms).
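
    The idea of a URN to URL resolver can be illustrated with a small sketch. No resolver of this kind is described in this handbook, so everything below is an assumption: the URN string, the host names and the cost figures are invented, and network distance is reduced to a single number per copy.

```python
# Hypothetical resolver table: each URN maps to (URL, network cost) pairs.
# The URN string, host names and cost figures are invented for illustration.
MIRRORS = {
    "urn:example:mosaic-browser": [
        ("ftp://ftp.origin.example/pub/mosaic/", 180),
        ("ftp://mirror.uk.example/pub/mosaic/", 12),
        ("ftp://mirror.de.example/pub/mosaic/", 35),
    ],
}


def resolve(urn, table=MIRRORS):
    """Return the URL of the copy with the lowest network cost."""
    copies = table.get(urn)
    if not copies:
        raise KeyError(f"no known location for {urn}")
    url, _cost = min(copies, key=lambda copy: copy[1])
    return url


print(resolve("urn:example:mosaic-browser"))
```

    A browser using such a service would ask for the name of the resource and leave the choice of mirror to the resolver.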

    Uniform Resource Characteristics (URCs) will provide meta-information about a document. This could include information about the author, keywords, expiry dates (for caching servers), copyright and cost. URCs could also provide information about the quality of the document. For example a seal of approval (SOAP) could be given by a university publications group which confirms, by the use of a digital signature, that the document is a PhD thesis.

    Uniform Resource Agents (URAs) will provide tools to search for information on the Internet. For further information see the URL http://nic.nordu.net:80/ftp/internet-drafts/draft-ietf-uri-ura-00.txt

    Uniform Resource Identifiers (URIs) include URLs, URNs and URCs. The URI specification is available as RFC 1630. The mailing list uri@bunyip.com is used to discuss URIs. Send email to uri-request@bunyip.com to subscribe to this list. Archives of the list are available at the URL http://www.acl.lanl.gov/URI/archive/uri-archive.index.html

    New Facilities

    CCI

    NCSA Mosaic For X (version 2.5) provides support for CCI (Common Client Interface). This will provide a standard mechanism by which WWW browsers can communicate with external programs. A number of demonstrations of this facility are available, including a slideshow program, which instructs Mosaic to display URLs which are specified in a file. A program called xwebteach provides a mechanism by which a teacher can control the display of Mosaic on students' machines. Further information about the CCI specification is available at the URL http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/cci-spec.html

    W3A

    W3A (World-Wide Web Applets) is a proposal for a standard API for dynamically linking applets (which can be defined as a piece of software that can be attached to a host program such as a WWW browser). Further information is available at the URL http://www.let.rug.nl/~bert/W3A/W3A

    Appendix 1 Mailing Lists

    This section contains information on mailing lists and Usenet groups on topics related to the World-Wide Web.

    Please note that before sending a message to any of these lists you should listen to the discussions first and, where possible, read the information about the list. You should not send simple questions about, say, installing Mosaic on your home PC to a list for developers of the WWW protocols.

    Usenet News

    The following Usenet newsgroups are available:

    Archives of These Groups

    Archives of the www-announce, net-happenings mailing lists and comp.infosystems.www.* Usenet groups are available at the URL http://cair-archive.kaist.ac.kr/Archive/Announce/

    CERN Mailing Lists

    To join a list at CERN send electronic mail to list-request@w3.org with the line subscribe list your name.

    For example if John Smith wanted to subscribe to the www-talk list he would send the message subscribe www-talk John Smith to the address www-talk-request@w3.org
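
    The same request can be composed by a program. The sketch below only builds the message and does not send it; the function name subscription_request is an assumption, and the address and wording follow the example above.

```python
from email.message import EmailMessage


def subscription_request(list_name, full_name, domain="w3.org"):
    """Build (but do not send) a subscription request for a list."""
    msg = EmailMessage()
    # Requests go to list-request, not to the list itself.
    msg["To"] = f"{list_name}-request@{domain}"
    msg.set_content(f"subscribe {list_name} {full_name}")
    return msg


msg = subscription_request("www-talk", "John Smith")
print(msg["To"])
print(msg.get_content().strip())
```

    Sending the request to the -request address rather than to the list address avoids broadcasting administrative messages to every subscriber.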

    An overview of CERN mailing lists is available at the URL http://www.w3.org/hypertext/WWW/Mail/

    www-html

    www-html is for technical discussions of the HyperText Markup Language HTML. Design discussions only, please, not newcomer questions.

    This list is archived at the URL http://www.w3.org/hypertext/WWW/Archive/www-html/ and at the URL http://www.eit.com/www.lists/

    www-lib

    www-lib is for technical discussions about architecture and new features, exchange of diffs, bug reports etc. for the Library of Common Code.

    www-style

    Discussion of HTML style sheets to support standardization and implementations. This list is archived at the URL http://asearch.mccmedia.com/www-style/

    www-talk

    www-talk is for technical discussion for those developing WWW software or with that deep an interest. (Please keep this to WWW technical design only. Not general questions from non-developers, which should go to the newsgroup, nor for HTML topics which should go to www-html.)

    This list is archived at the URL http://www.w3.org/hypertext/WWW/Archive/www-html/ and at the URL http://www.eit.com/www.lists/

    A form for subscribing and unsubscribing to the mailing list is available at the URL http://rohan.sdsu.edu/cgi-bin/terjen/listmaster/form?list=www%2dtalk&server=www%2dtalk%2drequest%40mail%2ew3%2eorg&subscribe=subscribe&unsubscribe=unsubscribe&post=www%2dtalk%40mail%2ew3%2eorg&languages=English&public=on&description=WWW+Developers+Talk&help=Help

    Obsolete Lists

    The following lists are now obsolete, but the archives may still be available.

    www-announce

    www-announce was for anyone interested in WWW, its progress, new data sources, new software releases. This list is archived at the URL http://www.w3.org/hypertext/WWW/Archive/www-announce/

    www-rdb

    www-rdb is for discussion of gatewaying relational databases into WWW. This list is archived at the URL http://www.w3.org/hypertext/WWW/Archive/www-rdb

    www-proxy

    www-proxy is for technical discussion about WWW proxies, caching, and future directions. This list is archived at the URL http://www.w3.org/hypertext/WWW/Archive/www-proxy

    Other Mailing Lists

    Note that a number of mailing list archives are available at the URL http://asearch.mccmedia.com/menus/2091.html

    atmwww-l

    atmwww-l is an open and unmoderated discussion of the impact of Asynchronous Transfer Mode (ATM) technology and networking on the World-Wide-Web. To subscribe to the list send the message subscribe atmwww-l your name to the address listserv@cmuvm.csv.cmich.edu To send messages to the atmwww-l discussion list, email: atmwww-l@cmuvm.csv.cmich.edu

    cello-l

    cello-l is a discussion list for users of the Cello WWW browser. To subscribe to the list send the message sub cello-l your name to the address listserv@cornell.edu Further information is available at the URL ftp://ftp.law.cornell.edu/pub/LII/Cello/default.htm Archives of the list are available at the URL gopher://gopher.law.cornell.edu:70/11/listservs/cello

    html-wg

    html-wg is a mailing list for an IETF working group which is discussing developments of HTML. To subscribe email html-wg-request@oclc.org with the message SUBSCRIBE html-wg yourname An archive of the list is available at the URL http://www.ics.uci.edu/pub/ietf/html/

    http-wg

    The HTTP working group (http-wg) will work on the specification of the Hypertext Transfer Protocol (HTTP). HTTP is a data access protocol currently run over TCP and is the basis of the World-Wide Web. The initial work will be to document existing practice and short-term extensions. Subsequent work will be to extend and revise the protocol. Directions which have already been mentioned include:

    To subscribe email http-wg-request@cuckoo.hpl.hp.com with the message SUBSCRIBE http-wg yourname An archive of the list is available at the URL http://www.ics.uci.edu/pub/ietf/http/hypermail/

    libwww-perl

    libwww-perl is a library of Perl4 packages which provides a simple and consistent programming interface to the World-Wide Web. This library is being developed as a collaborative effort to assist the further development of useful WWW clients and tools.

    A mailing list has been established for technical discussion about libwww-perl, including problem reports, interim fixes, suggestions for features, and contributions. The mailing list address is libwww-perl@ics.uci.edu and administrivia (including subscribe requests) should be sent to libwww-perl-request@ics.uci.edu

    A Hypermail archive of the mailing list is also available at the URL http://www.ics.uci.edu/WebSoft/libwww-perl/archive/

    mosaic-l

    mosaic-l is a Listserv list for the NCSA Mosaic WWW browser. To subscribe send the message subscribe mosaic-l firstname lastname to the address listserv@uicvm.uic.edu

    NOTE This list is now believed to be defunct since it was being used for basic Mosaic questions, rather than providing a forum for Mosaic developers.

    MacHTTP-talk

    MacHTTP-talk is a mailing list for MacHTTP users. It provides an open forum for any questions, answers, suggestions, announcements, etc. about the MacHTTP server software. To subscribe to the list send a mail message to the address MajorDomo@academ.com containing the message subscribe MacHTTP-talk firstname lastname

    Further information is available at the URL http://www.uth.tmc.edu/mac_info/machttp/mailing_list.html

    moo-www

    moo-www is a mailing list to discuss links between MUDS, in particular systems based on Pavel Curtis's MOO server, and the World-Wide Web. Subjects for discussion include:

    The list is at moo-www@maths.tcd.ie Subscription requests should go to moo-www-request@maths.tcd.ie

    Netscape

    Netscape is a Listserv list for the Netscape WWW browser. This list is for the purpose of discussing features and bugs contained in this new browser, as well as the new tags Netscape implements. To subscribe send the message subscribe netscape firstname lastname to the address listserv@irlearn.ucd.ie

    Quality

    Quality is a mailing list for the discussion of quality issues. To subscribe to the list send the message subscribe quality to the address listmanager@naic.nasa.gov.

    An archive is available at the URL http://naic.nasa.gov/naic/archives

    PDF-L

    PDF-L is intended to be an electronic discussion area for devotees of Adobe Acrobat software, a place to ask questions, share ideas and exchange information. To subscribe to the mailing list, send email to Majordomo@binc.net with the following command in the body of your email message SUBSCRIBE PDF-L Your_name (e.g. SUBSCRIBE PDF-L Glenn Gernert ) To post messages to all subscribers on the list, send email to PDF-L@emrg.com

    unite

    unite is a Mailbase list which can be used for discussions about a User Network Interface To Everything. Based in the UK with an international membership. To subscribe email mailbase@mailbase.ac.uk with the message join unite yourname

    The UNITE archives are available at the URL http://mailbase.ac.uk/pub/lists/unite

    Web4Lib

    Web4Lib is a list for Library-based WWW managers and developers. To subscribe email listserv@library.berkeley.edu with the message SUBSCRIBE Web4Lib yourname

    web-support

    web-support is a Mailbase list which can be used for discussions about WWW issues. Based in the UK. To subscribe email mailbase@mailbase.ac.uk with the message join web-support yourname

    The archives are available at the URL http://mailbase.ac.uk/pub/lists/web-support

    WebServer-NT

    The WebServer-NT mailing list is intended as a forum where users of Windows NT can discuss World-Wide Web server issues. Likely topics might include (but are not limited to):

    To subscribe, send message to webserver-nt-request@mailserve.process.com and in the message body type SUBSCRIBE webserver-nt

    To get help on the mailserver commands put HELP in your message body. To receive a list of the available mailing lists put LISTS in the body. To receive a list of subscribers in a list put SEND/LIST webserver-nt in the body.

    www-buyinfo

    Discussions of issues of commercial transactions of information via the Web take place on the www-buyinfo mailing list. To subscribe send the message subscribe www-buyinfo to the address www-buyinfo-request@allegra.att.com

    The archives are held at the URL http://www.research.att.com/www-buyinfo/about.html

    www-courseware

    www-courseware is a list dedicated to courseware on WWW. To subscribe send mail to www-courseware-request@eit.com containing the message subscribe

    An archive of the list is held at the URL http://www.eit.com/mailinglists/www-courseware/archive/

    www-literature

    This is a list dedicated to literature on the WWW. To subscribe send mail to www-literature-request@eit.com containing the message subscribe

    An archive of the list is held at the URL http://www.eit.com/mailinglists/www-literature/archive/

    www-managers

    The aim of this list is to provide a high signal-to-noise, quick turn-around forum for managers of WWW servers and sites to get answers to specific questions about the setup and maintenance of http servers and clients. The mailing list is managed by a utility called majordomo. To subscribe send the message subscribe www-managers to the address majordomo@lists.stanford.edu An archive of the list is available at the URL http://www-archive.stanford.edu/lists/mlists.html

    www-security

    www-security is a list to discuss different methods of providing a secure WWW service. The list will focus on how to secure HTTP and/or HTTP-like protocols to provide privacy, user authentication, service certifications and document checking (digital signatures).

    To subscribe send mail to www-security-request@nsmx.rutgers.edu containing the message subscribe www-security

    An archive of the list is held at the URL http://www.verity.com/www-security.html

    Information about the www-security list is also available at the URL http://www-ns.rutgers.edu/www-security/index.html

    www-speed

    The www-speed list is dedicated to the proposition that the web is just too darned slow, and that some of its key components have inherent performance problems that cannot be dealt with without changes to protocols. Topics appropriate to the list are:

    The list address is www-speed@tipper.oit.unc.edu The request address is www-speed-request@tipper.oit.unc.edu

    www-vrml

    VRML (the Virtual Reality Markup Language) is an evolving specification for a platform-independent definition of 3-dimensional spaces within the World-Wide Web. It is designed to combine the best features of virtual reality, networked visualization, and the global hypermedia environment of the World-Wide Web.

    To subscribe to the Virtual Reality Markup Language (VRML) list send mail to majordomo@wired.com containing the message subscribe www-vrml

    Further information is available at the URL http://www.wired.com/vrml/

    www@unicode.org

    www@unicode.org is intended for in-depth technical discussions of the possibility of modifying the WWW protocols to support Unicode. It follows the same lines as some of the Unicode discussions on www-talk, but is a more focused group which does not cover other WWW issues. If you are interested in joining this list, send email to www-request@unicode.org with a subject line of subscribe, and a message body of subscribe www@unicode.org your name

    Appendix 2 WWW Resources

    A wide range of resource materials about the World-Wide Web are available on the World-Wide Web. A number are listed below.

    WWW Online Resources

    The World-Wide Web Developer's Library is available at the URL http://www.stars.com/

    Spider's Web is available at the URL http://gagme.wwa.com/~boba/spider.html

    Yahoo is available at the URL http://www.yahoo.com/Computers/

    Computers: World-Wide Web is available at the URL http://www.yahoo.com/Computers/World_Wide_Web/

    One World is available at the URL http://oneworld.wa.com/htmldev/devpage/dev-page1.html

    Web Weaver's Page is available at the URL http://www.nas.nasa.gov/NAS/WebWeavers/

    WebStars: Astrophysics in Cyberspace is available at the URL http://guinan.gsfc.nasa.gov/

    Pointers to WWW resources (Toronto University) is available at the URL http://www.utirc.utoronto.ca/

    PC Week's pointers to WWW resources is available at the URL http://www.upcweek.ziff.com/~pcweek/pointers.html

    Oslonet is available at the URL http://www.oslonet.no/html/demo/WWWinfo/html

    CGI Programmer's Reference is available at the URL http://www.halcyon.com/hedlund/cgi-faq/

    The WWW Locator Guide is available at the URL http://groucho.gsfc.nasa.gov/Code_520/locator/locator.html

    A list of World Wide Web FAQs and Guides is available at the URL http://cuiwww.unige.ch/OSG/FAQ/www.html

    A Guide to HTML Authoring & Web Resources is available at the URL http://www.library.nwu.edu/resources/www/

    World-Wide Web "How-To" Resources and Guides is available at the URL http://lcweb.loc.gov/global/www.html

    Daniel LaLiberte's list of WWW resources is available at the URL http://union.ncsa.uiuc.edu:80/HyperNews/get/www.html

    Mecklermedia's Guide to Information and Resources on the Internet is available at the URL http://www.mecklerweb.com:80/webguide/resource.htm

    A Primer for Creating Web Resources is available at the URL http://www-slis.lib.indiana.edu/Internet/programmer-page.html

    WWW Icons and Clip Art

    A list of online resources of icons and clip art which can be used to produce HTML documents containing graphics is given below. Note, however, that before using graphics in HTML documents you should be aware of the additional loads which will be placed on network and servers.

    ftp://ftp.cica.indiana.edu/pub/win3/icons

    http://white.nosc.mil/images.html

    http://guinan.gsfc.nasa.gov/Alan/Richmond.html

    http://www.cli.di.unipi.it/iconbrowser/icons.html

    http://www.jsc.nasa.gov/~mccoy/Icons.index.html

    http://www.cs.yale.edu/homes/sjl/clipart.html

    WWW Conferences

    Conference proceedings from the first WWW conference, WWW '94, held at CERN on 25-27 May 1994 are available at the URL http://www.elsevier.nl/

    Further information about the second WWW conference Mosaic and The Web, held at Chicago on 17-20 October 1994 is available at the URL http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/ A searchable index of the papers is available at the URL http://www.verity.com/spidersearch.html

    The third WWW conference was held at Darmstadt, Germany on 10-14 April 1995. Further details are available at the URL http://www.igd.fhg.de/www/www95/www95.html

    The fourth WWW conference will be held in Boston, USA on 11-14 December 1995. Further details are available at the URL http://www.w3.org/hypertext/Conferences/WWW4.html

    Other Resources

    WWW Information

    Information about the World-Wide Web Initiative is available at the URL http://www.w3.org/

    Best of the Web

    The Best of the Web awards promote WWW to new and potential users and help information providers by demonstrating what can be done on WWW. The award winners and entrants are available at the URL http://wings.buffalo.edu/contest/

    WWW FAQ

    The WWW Frequently Asked Questions (FAQ) is available at the URL http://sunsite.unc.edu/boutell/faq/www_faq.html

    Entering The World-Wide Web: A Guide to Cyberspace

    Kevin Hughes' Entering The World-Wide Web: A Guide to Cyberspace is available at the URL http://www.eit.com/web/www.guide

    Information Superhighway in the UK

    Information about the Information Superhighway in the UK is available at the URL http://tin.ssc.plym.ac.uk/up.html

    Appendix 3 National UK Services

    Services

    The Bulletin Board For Libraries (BUBL) holds a wide range of information of interest to anyone involved with libraries in education. Further information is available at the URL http://www.bubl.bath.ac.uk/BUBL/home.html

    The Mailbase mailing list service runs a WWW server which is available at the URL http://www.mailbase.ac.uk/

    The Micros Hensa service runs a WWW server which is available at the URL http://micros.hensa.ac.uk/

    The Unix Hensa service runs a WWW server which is available at the URL http://unix.hensa.ac.uk/

    CTISS runs a WWW server which is available at the URL http://www.ox.ac.uk/cti/

    The Office for Library and Information Networking (UKOLN) runs a WWW server which is available at the URL http://ukoln.bath.ac.uk/UKOLN/home.html

    NISS is setting up a WWW server which is available at the URL http://www.niss.ac.uk/

    A TLTP-specific Web server is available at the URL http://www.icbl.hw.ac.uk/tltp

    The Social Sciences Information Gateway is available at the URL http://sosig.esrc.bris.ac.uk/

    CCTA, the UK Government computer agency, runs a WWW server which is available at the URL http://www.open.gov.uk/

    Directories

    A list of United Kingdom based WWW servers is available at the URL http://src.doc.ic.ac.uk/all-uk.html

    A UK tourist guide is available at the URL http://www.cs.ucl.ac.uk/misc/uk/intro.html

    A UK sensitive map is available at the URL http://scitsc.wlv.ac.uk/ukinfo/uk.map.html This service is maintained by the School of Computing and Information Technology, University of Wolverhampton (email jphb@scitsc.wlv.ac.uk)

    WAIS Resources

    The following WAIS services are provided by NISS.

    NISS Bulletin Board

    A wide range of information of interest to varying sectors of the academic community. This service is available at the URL wais://gopher.niss.ac.uk/NISSBB

    World Factbook

    Basic details (population, climate, main industries and so on) for the countries of the world. Use a search term such as the name of a country to locate particular records. This service is available at the URL wais://wais.niss.ac.uk/World_Factbook

    Roget's Thesaurus

    The 1911 edition (enhanced with an additional 1,000+ words not included in the original version) of the ever-useful thesaurus of the English language. Use any word as your search term. This service is available at the URL wais://wais.niss.ac.uk/Roget

    JANET News

    JANET News contains material about the JANET computer network, such as registered domain names and addresses, and information about gateways to other networks. This service is available at the URL wais://news.janet.ac.uk/JANET.news

    CHEST Directory

    The CHEST Directory of software is available at the URL wais://wais.niss.ac.uk/CHEST_Directory
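
    The WAIS entries above suggest supplying a search term such as a country name. As a minimal sketch, assuming the generic WAIS URL form wais://host/database?search and a WAIS-capable client, a search URL could be built as shown below. The function name wais_search_url is hypothetical, and whether a given server accepts a query in this form is an assumption.

```python
from urllib.parse import quote, urlsplit


def wais_search_url(database_url: str, term: str) -> str:
    """Append a search term to a WAIS database URL (hypothetical helper)."""
    parts = urlsplit(database_url)
    if parts.scheme != "wais":
        raise ValueError("expected a wais:// URL")
    # Percent-encode the term so that spaces and punctuation survive in the URL.
    return f"{database_url}?{quote(term)}"


if __name__ == "__main__":
    # Look up a country in the World Factbook database listed above.
    print(wais_search_url("wais://wais.niss.ac.uk/World_Factbook", "New Zealand"))
```

    The search term is percent-encoded, so a two-word country name becomes a single valid URL component.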

    Appendix 4 Conferences On WWW

    Bruce Altner (mailto:ari@clark.net), the Director of Technical Services of ARInternet Corporation, has a vision for gatherings in an electronic meeting hall which combine the best features of the WWW (browsing, multimedia and hypertext capabilities, searching and information retrieval, file downloading and e-mail communication, to name just a few) with the format of the traditional poster paper session.

    Electronic Conferences and Workshops

    Here are some real-life examples of Electronic Conferences and Workshops:

    ChemConf'93 is available at the URL gopher://info.umd.edu:901/11/inforM/Educational_Resources/Faculty_Resources_and_Support/ChemConference

    NASA High Alpha Conference IV (high angle of attack) is available at the URL http://www.dfrf.nasa.gov/Workshop/HighAlphaIV/highalpha.html

    The HIDEC Electronic Conference (the F-15 Highly Integrated Digital Electronic Control program) is available at the URL http://mosaic.dfrf.nasa.gov/Workshop/HIDEC/Conf.DIRS/.htmllinks/ConfWeb.html

    DL94: Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries is available at the URL http://atg1.wustl.edu/DL94

    The On-Line Proceedings of ACL-94 (Association for Computational Linguistics) are available at the URL http://xxx.lanl.gov/cmp-lg/ACL-94-proceedings.html

    The ACL-94 post-conference workshops are available at the URL http://xxx.lanl.gov/cmp-lg/ACL-94-post.html

    1st Electronic Conference in Computational Chemistry (ECCC) is available at the URL http://hackberry.chem.niu.edu:70/0/ECCCinformation.html

    Reviews of Electronic Conferences

    A discussion of the pros and cons of this type of online gathering, written by the ChemConf'93 organizer Dr. Tom O'Haver, is available at the URL gopher://info.umd.edu:901/00/inforM/Educational_Resources/Faculty_Resources_and_Support/ChemConference/BackgroundReading/OnlineConferencin.txt

    For a wonderful example of self-reference, à la Douglas Hofstadter's Gödel, Escher, Bach, see the URL http://www.automatrix.com/conferences

    An example of an "after-the-fact" online conference is available at the URL http://stardust.jpl.nasa.gov/igarss/

    TaTTOO '95

    TaTTOO '95 (Teaching and Training in The Technology of Objects) On-Line used state-of-the-art technology combining interactive multi-user virtual environments with the World-Wide Web to bring an International Conference and Trade Exhibition to the desktop. In the virtual conference objects such as delegates, rooms, personal business cards and leaflets could all be browsed on the Web. A virtual exhibition took place in TaTTOO/MOO, a virtual environment. The MOO was extended to make the objects in it Web-aware, so it was possible to browse the system using a Web client. TaTTOO '95 is available at the URL http://www.cms.dmu.ac.uk/Research/OTG/Online/live-announce.html

    Appendix 5 References

    Books

    "Spinning the Web: How to Provide Information on the Internet" by Andrew Ford, to be published by Van Nostrand Reinhold, New York (ISBN 1-850-32141-8) and International Thomson Publishing, London (ISBN 0-442-01962-9). The book describes how to run a web site, covering both the creation of material for dissemination via the Web and the setting up and running of a web server. It describes HTML in detail and includes a tear-out HTML reference card and a resource guide. This book is recommended by the author of this handbook.

    "Mosaic Quick Tour For Windows" by Gareth Branwyn, published by Ventana Press, costs £7.95 (ISBN 1-56604-194-5). Further information is available at the URL http://www.vmedia.com/vvc

    "The Internet via Mosaic and World-Wide Web" by Steve Browne, published by ZD Press, costs £22.99 (ISBN 1-56276-259-1).

    "The World-Wide Web, Mosaic and More" by Jason J Manger, published by McGraw Hill, costs £24.95 (ISBN 0-07-709132-9).

    "Teach Yourself HTML Web Publishing in a Week" by Laura Lemay, to be published by Sams Publishing (ISBN 0-672-30667-0). This book covers HTML, Web servers, gateways, forms and imagemaps, and also focuses strongly on style, structure and navigation.

    "HTML For Fun and Profit" by Mary Morris, to be published by Prentice-Hall. It includes forms, clickable images, server includes, indexing, linking and basic formatting. It will have a CD-ROM with examples and tools on it. See the URL http://www.sun.com/smi/ssoftpress/

    "The Mosaic Handbook for the X Window System" by Richard Koman and Paula Ferguson, published by O'Reilly (ISBN 1-56592-095-3), "The Mosaic Handbook for Microsoft Windows System" by Richard Koman, published by O'Reilly (ISBN 1-56592-094-5) and "The Mosaic Handbook for the Macintosh" by Richard Koman, published by O'Reilly (ISBN 1-56592-096-1). These books cost £22 each. Each includes a copy of the Mosaic software, supplied on CD-ROM with the X Window book and on floppy disk with the others.

    Magazines

    Many magazines are being published which cover various aspects of the Internet. The following list gives some of the main ones, including ones published in the UK.

    Web Week, the newspaper of Web technology and business strategy, is available at the URL http://www.mecklerweb.com/mags/ww/wwhome.htm

    .net is published by Future Publishing Ltd. Further details are available at the URL http://www.futurenet.co.uk/home.html or by sending email to netmag@futurenet.co.uk

    infoHighway ISSN 1355-2465. For further details send email to p.deacon@eurodollar.co.uk or david@pipex.net

    Wired. Further details are available at the URL http://www.hotwired.com/ For subscription details send email to subscriptions@wired.com


    About This Handbook

    This Handbook was produced using Word For Windows version 2. The graphics were captured using Paint Shop Pro, which was also used to reduce the colour depth and to alter the colour of the images so that they were more suitable for inclusion in the printed version of the Handbook.

    The Handbook was converted to HTML format using the RTFtohtml and RTFtoweb conversion programs.

    About The Author

    Brian Kelly is the Head of User Support, Computing Service, University of Leeds. He first came across the World-Wide Web (WWW) at a workshop on Internet tools organised by the Information Exchange Special Interest Group, University of Leeds on 9th December 1992. In January 1993 the Computing Service installed the CERN httpd server on its central Unix system - this was probably the first WWW service provided by a central service in the UK academic community.

    Following an unannounced visit in March 1993 from Robert Cailliau, one of the WWW co-developers from CERN, the Computing Service became convinced of the importance of WWW. The Computing Service contribution to the University Open Day, held in May 1993, was centred on the World-Wide Web: for example, the Open Day programme was available on WWW.

    Brian has given presentations about WWW at the universities of Aberdeen, Bangor, Bradford, Kent, Oxford and Sussex, and at Manchester Metropolitan University. He gave a poster presentation at the first WWW conference (WWW '94) in Geneva and gave a paper on Becoming An Information Provider on the World-Wide Web at the INET 94 / JENC 5 conference in Prague in June 1994. He ran a WWW Tutorial at the Network Service Conference in London in November 1994.

    Acknowledgments

    I would like to thank the following for their assistance and comments on this handbook:

    Bruce Altner, Nigel Bruce, John D Lewis, Chris Lilley, Jim Hobbs, Ken Hensarling, Roger Horton, Jon Knight, Inke Kolb, Martijn Koster, Paul Leclerc, Neal McBurnett, Sean Martin, Eric Morgan, George Munroe, Alan Richmond, Paul Sutton, Ton Verschuren, Anne Worden, Bruce Washburn.

    The author, of course, accepts responsibility for any errors in this handbook.

    Feedback

    The author welcomes constructive comments and feedback on this handbook, which should be sent to the email address B.Kelly@leeds.ac.uk Please note, however, that the author is unable to provide individual advice or assistance.

    Copyright

    Copyright (C) 1994 by Brian Kelly.

    All rights reserved. This work may be copied in its entirety, without modification and with this statement attached. Redistribution in part or with modifications is not permitted without advance agreement from the copyright holder.

    Copyright of WWW pages shown in this Handbook belongs to the individual or organisation which created the pages. Any copyright holder who wishes for an image to be removed from this Handbook should contact the author of the Handbook.