Where World Wide Web Went Wrong

Andrew Pam, Xanadu Australia, P.O. Box 477 Blackburn VIC 3130
Email: avatar@aus.xanadu.com
URL: http://www.glasswings.com.au/attendants.html
Abstract:
We are all aware by now of the phenomenal success of the WWW. In this paper I would instead like to examine some of the limitations inherent in the current WWW design and implementations and some possible solutions.
Keywords:
Docuverse, Hyper-G, Hypermedia, Information Systems, Transclusion, Web, WWW, Xanadu

Introduction

Everyone's opinions are a product of their experiences, so I would like to start by explaining a little bit about my background in the field of hypermedia. The words "hypertext" and "hypermedia" were coined by my friend Ted Nelson in a paper to the ACM 20th national conference in 1965, before I was even born! Although I had come across occasional articles Ted had written for Creative Computing magazine, my first exposure to his legendary Xanadu project did not occur until 1987 when I purchased the Microsoft Press second edition of his classic book Computer Lib / Dream Machines [Nel87], which outlined his idea of a "docuverse" or universal library of multimedia documents.

As an avid science fiction reader, my imagination had already been captured by this idea of a universally accessible computer storage and retrieval system as presented in the 1975 novel Imperial Earth by Arthur C. Clarke [Cla75]. But here was someone actually involved in trying to create such a system. I immediately sent off a US$100 donation to Project Xanadu to reserve a Xanadu account name, and also purchased the 1988 edition of Ted's self-published book Literary Machines [Nel88] and the Technical Overview video describing the Xanadu project in detail.

At this time I was already heavily involved in online communications, both professionally (designing and implementing a communications protocol for caravan park bookings across Australia, and later integrating it with the CICS mainframe-based booking system at the RACV) and personally, running a computer bulletin board system which eventually grew to a network of bulletin boards spanning New South Wales, Victoria and South Australia.

In 1989 I met Katherine Phelps, my partner, who incidentally is also giving a paper at this conference. She is a writer and publisher and I suppose it was inevitable that we should combine our interests and go into online publishing. After publishing traditional paper books and failing to convince banks of the prospects of a magazine on floppy disks, we decided to try publishing on the Internet instead.

Although we were on the Xanadu beta team, the software was in the process of a complete rewrite commenced in 1988 and was not at a usable stage. Our intention was always to upgrade to Xanalogical (Xanadu-capable) software, but we decided to start with Gopher and World Wide Web (which, with the advent of Mosaic, had just started to become popular). Tim Berners-Lee had been aware of the Xanadu ideas when designing the WWW, and he incorporated Ted's basic 1965 concept of hyperlinks, though not the later refinements. Autodesk were funding the Xanadu research project during this period, but in 1993 I heard the news that they had dropped all of their research projects not directly connected with their core business of Computer Aided Design. I immediately contacted Ted and made arrangements to visit him in San Francisco.

As a consequence of those meetings Katherine and I officially became Xanadu Australia, the first licensees of the Xanadu technology. We organised Ted's speaking tour of Melbourne and Sydney in early 1994 and then began organising the necessary support and facilities to set up our own research and commercial online publishing ventures.

In addition to my original computer programming and consultancy business under the name of Serious Cybernetics, I am presently also a partner in Glass Wings and in Xanadu Australia and a system administrator for CinEmedia, a project of the State Film Centre of Victoria which houses the new SFCV / RMIT Annexe where Katherine is presently undertaking her PhD in Animation and Interactive Multimedia.

I have spent the last few years examining as many Internet resource discovery, information delivery and computer-mediated communication applications as possible on as many platforms as possible, with particular interest in Hyper-G, a hypermedia system from the Graz University of Technology which is directly inspired by Ted's vision as expressed in Literary Machines.

So to the subject matter of this paper. I have had the opportunity to examine and compare much of the software presently used on the Internet, I have a good background in hypermedia designs and concepts, and I felt that with the prevalence of hype about the World Wide Web it would be valuable to take a critical stance and look at what I believe to be its major deficiencies, together with various initiatives and possible solutions to overcome them.

Scalability

This is one of the most fundamental design issues for any network information retrieval (NIR) system, since thanks to the phenomenal growth of the Internet there are already millions of people using such software and millions more coming online over the next few years. Ted's goal for Xanadu from the outset was to support hundreds of millions of users all over the world (and possibly in orbit) by the year 2020, a goal which he called the "2020 vision".

This took up much of the design team's time in the 1960s and 1970s, when it was by no means clear how this could be accommodated. Today we have, if not the answers, some pretty good working models. Unfortunately, WWW isn't really one of them; it suffers from a crucial flaw, in that any given piece of information (document) is served from a single location.

This is the cause of many problems. A single point of failure results in many documents being periodically (and sometimes permanently) inaccessible due to network or system problems. This is probably one of the most frustrating features of the WWW, since it makes the system as a whole unreliable. One of the key requirements of any NIR tool is reliable access to documents on request, since users expect to be able to obtain information whenever they require it; this also alleviates the need to store information locally in case of retrieval difficulties!

Bandwidth issues

Apart from the reliability issues, another major problem is the requirement for sufficient bandwidth to deliver information to everyone who requests it. In the centralised WWW design, this is one of the most difficult parts of operating a popular site, particularly in countries where the network infrastructure to support large numbers of simultaneous requests from all over the world is prohibitively expensive or simply not available.

Even in cases where fairly substantial bandwidth has been provisioned for a WWW server, however, there can still be occasions on which there is a sudden peak in demand. These are called "flash crowds" after a 1973 Larry Niven science fiction story describing how the advent of teleportation would cause sudden influxes of thousands of people to locations where anything of interest seemed to be occurring, often leading to unexpected riots. This issue was raised by Jeff Duntemann in his "END." column in the June/July 1995 issue of PC Techniques [Dun95], although he offered no answer to the problem, and it was also discussed by Frank Kappe in a recent paper about Hyper-G [Kap95].

Caching and mirroring

Some WWW servers have attempted to address these problems by providing a proxy caching facility, where frequently requested documents only have to be retrieved once by any given site. Unfortunately, this still requires users to actively choose to use the proxy caching service, which many do not; it suffers from cache consistency problems, especially with frequently updated documents; and it does not solve the flash crowding problem, because the onset of the phenomenon will still cause thousands of proxy cache servers to request the same documents from the original site of publication. Additionally, this obscures the usage statistics which are of special importance to commercial sites.
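
As a rough illustration of the consistency problem, the following sketch (my own, not drawn from any actual proxy implementation) caches fetched documents for a fixed time-to-live; until that period expires, readers of the cached copy will not see any updates made at the original site.

    import time
    import urllib.request

    CACHE_TTL = 300  # seconds a cached copy is considered fresh (an arbitrary choice)
    _cache = {}      # url -> (fetched_at, body)

    def cached_fetch(url):
        """Return the document at `url`, re-using a cached copy if it is
        younger than CACHE_TTL seconds.  A stale-but-unexpired copy will be
        served even if the original has since been updated, which is exactly
        the consistency problem described above."""
        now = time.time()
        if url in _cache:
            fetched_at, body = _cache[url]
            if now - fetched_at < CACHE_TTL:
                return body  # served from cache; the origin server never sees this request
        with urllib.request.urlopen(url) as response:
            body = response.read()
        _cache[url] = (now, body)
        return body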

A better approach would be to automatically make copies or "mirrors" of popular documents on other servers. Unfortunately there is currently no mechanism for indicating that such mirrors exist or where they might be located short of manually placing hyperlinks on a page to list them! This is primarily caused by a confusion between resource identifiers and locators. The Xanadu designs have always started from the premise that every document in the system needs to have its own unique identifier, regardless of the location where it is stored. The Hyper-G system uses globally unique object identifiers for the same reasons. However, documents in the WWW are identified by a Uniform Resource Locator (URL) which only describes the document's location - since it does not uniquely identify the document, it provides no mechanism to determine where other copies of the same document might be located.
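
To make the distinction concrete, here is a minimal sketch (my own illustration, not any proposed standard) of what resolution from a location-independent identifier to a list of candidate locations might look like; the identifiers and URLs are entirely invented.

    import urllib.request

    # Hypothetical mapping from globally unique document identifiers to the
    # locations (URLs) currently known to hold copies of each document.
    LOCATIONS = {
        "urn:example:doc:1234": [
            "http://server-a.example.com/docs/1234.html",
            "http://mirror-b.example.org/archive/1234.html",
        ],
    }

    def resolve_and_fetch(identifier):
        """Try each known location for the identified document in turn,
        returning the first copy that can actually be retrieved."""
        for url in LOCATIONS.get(identifier, []):
            try:
                with urllib.request.urlopen(url, timeout=10) as response:
                    return response.read()
            except OSError:
                continue  # this mirror is down or unreachable; try the next one
        raise LookupError("no reachable copy of " + identifier)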

An Internet Engineering Task Force (IETF) Working Group was formed to address proposals for Uniform Resource Identifiers (URIs) and Uniform Resource Names (URNs) and how they might be resolved to a list of URLs for the identified document. However, the group was unable to reach consensus and was closed, with new working groups to be formed to address specific issues. I believe that in any case attempting to retrofit global identifiers and resolution systems to the existing locator-based WWW system, while a worthy cause, will require efforts of considerable magnitude. It will probably be far simpler to move towards what the Hyper-G team call "second-generation" NIR tools, such as Hyper-G itself, which are fundamentally based on unique identifiers and remain fully backwards compatible with existing WWW clients.

I am not sure if I can whole-heartedly adopt the term "second-generation", since Xanadu was the forerunner of the present "first-generation" WWW and already possessed the attributes ascribed to second-generation systems. However, since the Xanadu system has never been widely deployed for public use it can perhaps be considered on its own merits as a separate strand of NIR research.

My own suggestion for a quick work-around to provide transparent mirroring capabilities to the WWW is to add a simple enhancement to the Domain Name System (DNS) name resolution procedure used by WWW clients. I propose that clients should check for multiple address records ("A records") and record which addresses respond fastest. This simple change would allow server administrators to designate official mirror sites by simply adding additional A records to the DNS entry for their server. This technique is already used by Mail Transfer Agents (MTAs) such as sendmail to deliver Internet email, and it could surely be adopted for the WWW.
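
A minimal sketch of the proposed client behaviour, using only the standard Python socket library: look up every address record for the server's name, time a TCP connection to each, and prefer the fastest. The hostname in the usage comment is of course only a placeholder.

    import socket
    import time

    def fastest_address(hostname, port=80, timeout=5.0):
        """Resolve every address record for `hostname` and return the address
        that completes a TCP connection fastest.  Each additional A record
        added by the server administrator becomes a candidate mirror."""
        addresses = {info[4][0] for info in
                     socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)}
        timings = {}
        for address in addresses:
            start = time.monotonic()
            try:
                with socket.create_connection((address, port), timeout=timeout):
                    timings[address] = time.monotonic() - start
            except OSError:
                continue  # this mirror did not respond; ignore it
        if not timings:
            raise ConnectionError("no address for %s responded" % hostname)
        return min(timings, key=timings.get)

    # Example with a placeholder hostname:
    # best = fastest_address("www.example.com")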

Distributed file systems

The best solution to these problems is to move to a distributed file system (DFS). Apart from the original Xanadu work on a log-based DFS, this has recently become an area of great research interest. In a 1994 paper "Xanalogy: The State of the Art" [Pam94] I list Prospero, AFS, Mungi, Sprite, Plan 9, DASH, GAFFES and DFS925.

While it is possible to use a DFS with the WWW by using a "file:" URL, this requires that clients have access to the same DFS, which unfortunately is rarely the case except within individual organisations. The problem arises because the DFS is accessed directly by the WWW client software; it can be solved by instead implementing the DFS within the server, which is the approach taken by Hyper-G.

Symmetry

Another significant problem with the current WWW implementation is the design of the hyperlinks. They are embedded within the documents themselves and are unidirectional and univisible. That is, they can only be followed in one direction and can only be seen from the originating end. This makes link maintenance a nightmare, compounded by the lack of unique document identifiers: because a URL is tied to a location, destination documents frequently change their URL, and every link pointing to them then breaks.

Bivisible and bifollowable links have been part of the Xanadu design for a long time, but HyperTed (created by Dr. Adrian Vanzyl of Monash University Medical Informatics) and Hyper-G are the first products I have seen to implement them elsewhere. Naturally they cannot be stored within the document itself (difficult in any case for other media types such as sound, graphics and video) because any given document could easily become the target for any number of links which might entirely outweigh and obscure the document's actual contents! They must therefore fall into the realm of externally stored metadata, which is exactly how the Xanadu and Hyper-G systems treat them. Hyper-G actually creates embedded links from this metadata on the fly when an HTML document is retrieved.
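
As a rough sketch of what such externally stored link metadata might look like (my own simplification, not the actual Xanadu or Hyper-G data model), each link below is a record kept outside the documents and indexed at both endpoints, so it can be seen and followed from either end.

    from collections import defaultdict

    class LinkStore:
        """External store of hyperlink metadata.  Links are not embedded in
        documents; they are records indexed by both endpoints, so they are
        bivisible (queryable from either end) and bifollowable."""

        def __init__(self):
            self._from = defaultdict(list)  # source document id -> links
            self._to = defaultdict(list)    # destination document id -> links

        def add_link(self, source, destination, label=""):
            link = {"source": source, "destination": destination, "label": label}
            self._from[source].append(link)
            self._to[destination].append(link)

        def links_from(self, document):
            """Outgoing links, used to decorate a document when it is served."""
            return list(self._from[document])

        def links_to(self, document):
            """Incoming links, which are invisible in the embedded-link WWW model."""
            return list(self._to[document])

    # Example with invented document identifiers:
    store = LinkStore()
    store.add_link("doc:essay", "doc:reference", label="see also")
    print(store.links_to("doc:reference"))  # the destination knows who cites it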

Versions and Alternates

Many documents evolve over time, either through revision or occasionally branching into alternative versions of the same document. An extremely useful facility barely supported by current software (and here I include stand-alone desktop applications as well as NIR tools) would be to provide a mechanism for maintaining multiple versions of the same document, preferably without duplicating the storage required for unaltered material. Version control systems do exist, but largely for use by computer programmers as source code management tools rather than as part of a general-purpose filesystem. This is one part of the Xanadu vision yet to come to fruition.
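
As a rough sketch of how multiple versions might share storage for unaltered material (a simplification along the lines of content-addressed storage, not the Xanadu design itself), each version below is recorded as a list of references to content chunks, so identical chunks are stored only once.

    import hashlib

    _chunks = {}  # content hash -> chunk text, shared by every version

    def store_version(paragraphs):
        """Store one version of a document as a list of chunk references.
        Paragraphs that are identical across versions are stored only once."""
        refs = []
        for text in paragraphs:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            _chunks.setdefault(digest, text)
            refs.append(digest)
        return refs

    def load_version(refs):
        """Reassemble a stored version from its chunk references."""
        return [_chunks[digest] for digest in refs]

    # Two versions differing in one paragraph share the rest of their storage.
    v1 = store_version(["Introduction.", "Unchanged body text.", "Old ending."])
    v2 = store_version(["Introduction.", "Unchanged body text.", "Revised ending."])
    print(len(_chunks))  # 4 distinct chunks stored, not 6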

Historical context

A related problem is that not only are earlier versions of documents usually superseded by revised versions, thus making the original version inaccessible, but often documents are removed from circulation entirely, perhaps because a WWW server has ceased operating or simply because space is no longer available for those documents (a problem common to periodicals). This makes it impossible to access them for future reference and works against the hyperlinking facility of NIR tools. This issue of permanent archival of electronic documents is very important and is being addressed by many library and archival organisations. It is also important that information should be published using data formats which are open standards and easily amenable to format conversion in future as standards change. This is one of the motivations of SGML and one of the benefits of SGML-based document formats such as HTML (native to WWW) and HTF (native to Hyper-G).

Document inter-comparison

Ted believes that the ability to compare documents for their similarities and differences is one of the most important tools that computers can offer us. Unfortunately, many of his early designs such as Qframes (where adjacent window borders indicated correspondences) and lines drawn between screen windows are yet to be widely implemented. This is probably largely due to the dominance of the Xerox PARC windowing model and the prevalence of "user interface police" requiring that window boundaries be sacrosanct and inviolable. However, there is apparently an OS/2 program called PMDIFF that implements the latter comparison facility. I am not currently aware of any NIR tools that provide this sort of function.
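
Even without the windowing support Ted envisaged, the underlying comparison machinery is readily available; the following minimal sketch uses Python's standard difflib module to report the corresponding and differing passages of two invented document versions.

    import difflib

    # Two invented document versions to compare, one passage per list entry.
    old = ["Hypertext links documents together.", "Links are one-way on the Web."]
    new = ["Hypertext links documents together.", "Links should be two-way."]

    matcher = difflib.SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            print("CORRESPONDS:", old[i1:i2])
        else:
            print("DIFFERS:    ", old[i1:i2], "->", new[j1:j2])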

Metadata

Another issue of concern to librarians is the storage and interpretation of document metadata -- information about the document itself such as its authorship, copyright status, date of publication and so forth. The IETF URI-WG proposed the creation of Uniform Resource Characteristics (URCs) to accommodate this information. Metadata can also address the social and political problem of censorship currently prominent in public discussion of the Internet. Systems such as SurfWatch are ineffective because they discriminate for or against documents on the basis of keywords in the URL or title, which may not accurately reflect the content of the document (for example, "The Sex Life of Plants" or "Physical Education for Girls"). Furthermore, the selection process is performed by the company rather than by each individual viewer according to their own views and preferences.

The Interpedia project, which aims to create a new encyclopaedia freely available over the Internet, has proposed a Seal of Approval (SOAP) concept which would permit any person or organisation to annotate documents to indicate their approval or disapproval of the material. Each document could bear any number of SOAPs, allowing a broad range of opinions about each document to be expressed (for example, by different religious bodies or national censorship boards). SOAPs could be implemented using public annotations.
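
A minimal sketch of how SOAPs might be represented and used, assuming a simple public annotation store (the record format and annotator names here are invented for illustration): each annotation carries the document, the annotator and a verdict, and each viewer decides for themselves whose seals to trust.

    # Public annotation store: a list of Seal of Approval (SOAP) records.
    # The field names and example annotators are invented for illustration.
    soaps = [
        {"document": "urn:example:doc:42", "annotator": "Some Library Assoc.",
         "verdict": "approved", "comment": "Suitable for general readers."},
        {"document": "urn:example:doc:42", "annotator": "Some Review Board",
         "verdict": "disapproved", "comment": "Not recommended for children."},
    ]

    def seals_for(document):
        """All seals attached to a document: any number of bodies may comment."""
        return [s for s in soaps if s["document"] == document]

    def acceptable(document, trusted_annotators):
        """Client-side filtering: the viewer chooses whose disapproval to
        honour, rather than a single company deciding for everyone."""
        return not any(s["verdict"] == "disapproved" and
                       s["annotator"] in trusted_annotators
                       for s in seals_for(document))

    print(acceptable("urn:example:doc:42", trusted_annotators={"Some Review Board"}))
    print(acceptable("urn:example:doc:42", trusted_annotators={"Some Library Assoc."}))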

Live interaction

The services presently available on the Internet can be classified into two major categories: NIR tools and real-time communication tools. While the WWW was designed as a NIR tool, there have been several initiatives to support real-time communication facilities. WebChat provides a real-time multi-user communication facility using the WWW. Sensemedia, provisionally licensed as Xanadu America, have created a MOO (a multi-user textual virtual environment) which also acts as a WWW server. They call this system the "WOO". Waxweb is another similar system which additionally incorporates Virtual Reality Markup Language (VRML) into the MOO server. Ubique's Sesame and Hyper-G also provide mechanisms for users to communicate with each other while browsing the web.

Spatial dimensions

Speaking of Virtual Reality, one idea especially popular since William Gibson [Gib84] gave us the term "cyberspace" is to change the way we interact with information on our computer screens from a flat two-dimensional "desktop metaphor" to a three-dimensional world. There has been some work on representing WWW documents and hyperlinks in 3D, but this task is made considerably easier by the Hyper-G architecture of external link metadata as demonstrated by the Hyper-G "Information Landscape". [AKM95]

Transclusions

"Transclusion" is a term introduced by Ted Nelson to define virtual inclusion, the process of including something by reference rather than by copying. This is fundamental to the Xanadu designs; originally transclusions were implemented using hyperlinks, but it was later discovered that in fact hyperlinks could be implemented using transclusions! Transclusions permit storage efficiency for multiple reasonably similar documents, such as those generated by versions and alternates as discussed above.

WWW currently permits images to be transcluded using the <IMG> tag, but strangely does not support the transclusion of any other media type. Some support for text transclusion has been added in the form of a "server side include" facility in some WWW servers, but this is a work-around of limited use.
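
To make the general idea concrete, here is a small sketch (my own simplification of the concept, not the Xanadu implementation) in which a composite document is stored purely as references to spans of source documents, so the quoted material is included by reference rather than copied; the document identifiers and text are invented.

    # Source documents, stored once, keyed by invented identifiers.
    sources = {
        "doc:clarke": "The only way to find the limits of the possible is to go beyond them.",
        "doc:essay":  "Hypermedia lets readers follow their own paths through a body of work.",
    }

    # A composite document is just a list of (source id, start, end) spans:
    # nothing is copied, so the composite stays consistent with its sources.
    composite = [
        ("doc:essay", 0, 24),                            # "Hypermedia lets readers "
        ("doc:clarke", 0, len(sources["doc:clarke"])),   # the whole quotation
    ]

    def render(spans):
        """Assemble the composite by fetching each referenced span on demand."""
        return "".join(sources[doc][start:end] for doc, start, end in spans)

    print(render(composite))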

Transclusions also highlight some of the intriguing new legal issues raised by hypermedia technology. If someone takes a copy of an image and places it on their WWW server without permission, this is clearly a breach of copyright. However, if they merely transclude the image, it is still being retrieved directly from the original site but is now being displayed in a completely new context, which probably does not breach copyright law but may raise "droit moral" (moral rights) issues.

This is another reason why links (here including transclusions) need to be bivisible and bifollowable, not only for maintenance reasons as discussed above but also to permit creators to monitor the context in which their material appears and the uses to which it is being put.

Transcopyright

This leads me directly to Transcopyright, the Xanadu solution for business on the Net. Ted Nelson has proposed a new copyright doctrine called "Transcopyright" [Nel95], much as Bob Wallace created "Shareware". Fundamentally, the proposal is that copyright holders choosing to publish on a hypermedia system supporting bivisible and bifollowable transclusions must, under the transcopyright doctrine, explicitly grant permission for anyone to transclude and thus reuse their material in any way and in any context, so long as it is purchased or obtained, as directed by the rightsholder, by each recipient. Naturally, using the material in any other medium falls outside the terms of the doctrine and is subject to separate agreement.

If a mechanism is in place to permit the system to charge for documents, this would allow copyright holders to be assured of their requested royalties on every use of their information, whether direct or by transclusion. Partial use of documents could be paid for pro rata. Considering the popularity of clip-art, musical "sampling" and collage art, this could rapidly become the ideal market for information of all kinds, especially entertainment content.
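
A back-of-the-envelope sketch of pro-rata payment under such a scheme, with entirely invented prices: the royalty for a transcluded excerpt is simply the rightsholder's price scaled by the fraction of the document actually delivered.

    def pro_rata_royalty(document_price_cents, document_length, excerpt_length):
        """Royalty owed for delivering `excerpt_length` characters out of a
        document of `document_length` characters priced at `document_price_cents`.
        Fractions of a cent are kept, since micropayment sums can be tiny."""
        return document_price_cents * (excerpt_length / document_length)

    # An invented example: transcluding 500 characters of a 25,000-character
    # article priced at 40 cents owes the rightsholder 0.8 of a cent.
    print(pro_rata_royalty(40, 25000, 500))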

Money

The major remaining issue to be resolved before this can become a reality is the difficulty of currency conversion when transactions are carried out in a global market, especially transactions for very small sums (possibly as little as fractions of a cent!). There are already a number of systems for exchanging money on the Net, principally Digicash and systems based on traditional credit cards. However, Digicash emulates real cash so closely that it suffers from all the same drawbacks (having to have the right change in your electronic wallet!), and the credit card systems are no use to people who don't have a credit card. None of these systems are well suited to very small transactions. Katherine Phelps discusses our thoughts on possible solutions in her paper "You Think This Is a Revolution -- You Ain't Seen Nothing Yet." [Phe95]

Conclusion

Despite the various problems and limitations of WWW outlined in this paper, it has clearly been of tremendous benefit to the way information is stored and transmitted in our society. I look forward to participating in the further evolution of these tools as they continue to change the way we entertain ourselves and do business.

Further information

Xanadu
http://www.aus.xanadu.com/xanadu/ or http://www.xanadu.net/xanadu/
Hyper-G
http://hyperg.iicm.tu-graz.ac.at/ or http://hmu1.cs.auckland.ac.nz/
The IETF URI-WG
http://www.ics.uci.edu/pub/ietf/uri/
The Interpedia project
gopher://twinbrook.cis.uab.edu/1interped.70
WebChat
http://www.irsociety.com/webchat.html
Sensemedia
http://www.sensemedia.net/papers/
Waxweb
http://bug.village.virginia.edu/
VRML
http://vrml.wired.com/
Sesame
http://www.ubique.com/
Digicash
http://www.digicash.com/

References

[AKM95]
Keith Andrews, Frank Kappe, and Hermann Maurer, "The Hyper-G Network Information System", Graz: J.UCS vol 1 No. 4 28 Apr 1995 (also available at ftp://iicm.tu-graz.ac.at/pub/Hyper-G/papers/dms94.ps)
[Cla75]
Arthur C. Clarke, "Imperial Earth", London: Gollancz 1975
[Dun95]
Jeff Duntemann, "Corri, the Comet, and the Child-Proof Cap", page 112 PC Techniques June/July 1995
[Gib84]
William Gibson, "Neuromancer", New York: Ace 1984
[Kap95]
Frank Kappe, "A Scalable Architecture for Maintaining Referential Integrity in Distributed Information Systems", Graz: J.UCS vol 1 No. 2 28 Feb 1995 (also available at ftp://iicm.tu-graz.ac.at/pub/Hyper-G/papers/p-flood.ps)
[Nel87]
Theodor Holm Nelson, "Computer Lib / Dream Machines", Redmond: Microsoft Press 1987
[Nel88]
Theodor Holm Nelson, "Literary Machines 88.1", self published 1988
[Nel95]
Theodor Holm Nelson, "Transcopyright: Pre-Permission for Virtual Republishing", forthcoming in Communications of the ACM
[Pam94]
Andrew Pam, "Xanalogy: The State of the Art", privately circulated
[Phe95]
Katherine Phelps, "You Think This Is a Revolution -- You Ain't Seen Nothing Yet", forthcoming in proceedings of the 1995 Asia-Pacific WWW Conference