The World-wide Web

Transcription

454Computer Networks and ISDN Systems 25 (1992) 454-459North-HollandNetworked information servicesThe world-wide webT.J. Berners-Lee, R. Cailliau and J.-F. GroffCERN, 1211 Geneva 23, SwitzerlandAbstractBerners-Lee, T.J., R. Cailliau and J.-F. Groff, The world-wide web, Computer Networks and ISDN Systems 25 (1992)454-459.This paper describes the World-Wide Web (W3) global information system initiative, its protocols and data formats, andhow it is used in practice. It discusses the plethora of different but similar information systems which exist, and how the webunifies them, creating a single information space.We describe the difficulties of information sharing between colleagues, and the basic W3 model of hypertext and searchableindexes. We list the protocols used by W3 and describe a new simple search and retrieve protocol (HTFP), and the SGMLstyle document encoding used. We summarize the current status of the X11, NeXTStep, dumb terminal and other clients,and of the available server and gateway software.Keywords: global information; hypertext; world-wide web; networked information retrieval; application; browser; server.IntroductionThis p a p e r covers m a t e r i a l p r e s e n t e d o relicited by questions at the J E N C 9 2 c o n f e r e n c e .T h e d r e a m o f global h y p e r t e x t a n d its c o m i n g tofruition with W3 has b e e n d e s c r i b e d in [1] whichalso discusses the r e l a t i o n s h i p with o t h e r projectsin t h e field. T h e p r a c t i c a l i t i e s o f p u b l i s h i n g d a t aon the w e b a r e o u t l i n e d in [2], so t h e s e a s p e c t swill only b e s u m m a r i z e d here.The aimM u c h i n f o r m a t i o n is available t o d a y on t h enetwork, b u t most is not. W h e n an individuale n t e r s a new o r g a n i z a t i o n , or a new field, it isn o r m a l l y n e c e s s a r y to talk to p e o p l e , look onb o o k s h e l v e s a n d n o s e a r o u n d for clues a b o u t howthe p l a c e works, w h a t is new, a n d w h a t he o r shen e e d s to know.W h e n d a t a is available on the net, the a v e r a g ep e r s o n is not privy to it, but m u s t consult a" g u r u " w h o u n d e r s t a n d s the ins a n d outs o fa n o n y m o u s F T P , telnet, stty, a n d the c o m m a n dsystems of t h e v a r i o u s i n f o r m a t i o n servers.T h e aims o f the W3 initiative a r e twofold:firstly to m a k e a single, easy u s e r - i n t e r f a c e to alltypes of i n f o r m a t i o n so that all m a y access it, a n dsecondly to m a k e it so easy to a d d new i n f o r m a tion t h a t t h e q u a n t i t y a n d quality of o n l i n e inform a t i o n will b o t h increase. A l r e a d y , most i n f o r m a tion of value exists in s o m e m a c h i n e - r e a d a b l eform: if we can solve the p r o b l e m s o f h e t e r o g e n e ity of p l a t f o r m , d a t a f o r m a t a n d access p r o t o c o lt h e resulting universe of k n o w l e d g e will c o n s i d e r ably e n h a n c e o u r w o r k i n g t o g e t h e r .The W3 modelCorrespondence to: Mr. T.J. Berners-Lee, CERN, 1211Geneva 23, Switzerland. Tel. ( 41) 22 76 73 755, Fax ( 41)22 76 67 7155, E-mail: timbl@info.cern.ch.Elsevier Science Publishers B.V.This is d e s c r i b e d at m o r e l e n g t h in [1] b u t isbasically as follows. A t any time, t h e u s e r is

T.J. Berners-Lee et al. / The world-wide webreading a document. Two navigation operationsare provided: one operation is to follow a linkfrom a particular piece of text to a related document or part of a document. The other operationis to query a server with a text string. The resultof either operation is the display of a new document. The query operation is only provided forcertain documents: those which represent a searchfacility provided by a remote server. The querytypically returns a synthesized hypertext list ofitems, each linked to some document whichmatches the query.The power of hypertext links, whether generated automatically or authored by a human being,to represent knowledge in a way easy to follow bythe reader cannot be replaced by powerful querylanguages. Conversely, the power of special-purpose query engines to solve user-posed queriescannot be replaced by a system using only hypertext links. The two operations are both found tobe necessary.One view encompasses all the systems.Although a simple model, a significant point inits favour has been its ability to represent almost455all existing information systems. This gives usboth a great start in putting existing informationonline, and also confidence that the model is onewhich will last for future information systems.(Some information systems include complexsearch functions which have many fields to fill in.However, when the h u m a n - c o m p u t e r interactionis analysed, it can be broken down into a sequence of choices and user input. This leads ingeneral to a simplified user interface and a directmapping onto the web.)Systems which have been mapped onto theweb to date include WAIS, Gopher, V M S ( T M ) /Help, FTP archives, The " H y p e r - G " system fromthe Technical University of Graz, Gnu Texlnfo,unix manual pages, unix "finger", and severalproprietary documentation sytems. A W3 clientprovides a seamless view of all this data using asimple user interface. For the reader, therefore,W3 solves the problems of the over-abundance ofdifferent information systems.An information provider, however, may wonder which management system to use on hisserver. There is no single recommended W3Tim Berners-Lee, before coming to CERN, worked on, among other things, document production and textprocessing. He developed his first hypertext system, "Enquire", in 1980 for his own use (although unawareof the existence of the term HyperText). With a background in text processing, real-time software andcommunications, Tim decided that high energy physics needed a networked hypertext system and CERNwas an ideal site for the development of wide-area hypertext ideas. Tim started the WorldWideWebproject at CERN in 1989. He wrote the application on the NeXT along with most of the communicationssoftware.Robert Cailliau, who was formerly in programming language design and compiler construction, has beeninterested in document production since 1975, when he designed and implemented a widely useddocument markup and formatting system. He ran CERN's Office Computing Systems group from 1987 to1989. He is a long-time user of Hypercard, which he used to such diverse ends as writing trip reports,games, bookkeeping software, and budget preparation forms. When he is not doing WWW's publicrelations, Robert is contributing browser software for the Macintosh platform, and analysing the needs ofphysics experiments for online data access.Jean-Francois Groff provided useful input in the "design issues". During his stay at CERN as "cooperant",J-F joined the project in September 1991. He wrote the gateway to the VMS Help system, worked on thenew modular browser architecture, and helped support and present WWW at all levels. He is now portingthe communications code to DECnet in order to set up servers for physics experiments.

456T.J. Berners-Lee et al. / The world-wide webever exists visible. However, it is hoped that moresimple tools for automatically publishing files andmail archives as indexed hypertext will be available in the near future.It is important to emphasize that although theuser model is hypertext, the data published doesnot have to be prepared in hypertext form. Mostdata on the web is plain text, accessed by automatically generated hypertext links (following forexample the directory tree in which the files arestored) or an automatic indexing system. Theprospective information provider need not befrightened by the term "hypertext". It is true,however, that he may in the end wish at least hisserver. Fortunately, it is normally true that if hehas a relatively organized approach to keeping hisdata, he can generally adopt new tools withoutthe user being very aware that things are different. For example, he can run a new indexingsystem, or generate a new browsable hierarchy,on top of his existing data. He should pay closeattention to the easy collection or contributionand update of the data, as this is the step whichwill ensure its quality. No amount of work on theaccess software can make up for a lack of accuracy or timeliness of the raw data. The W3 project does not restrict the choice of databases forinformation management; it simply makes what-Browsersdumb PCMacXNeXTddressingscheme Commonprotocol Formatnegotiation iInternetNewsServers/GatewaysFig. 1. The worldzwideweb client-serverarchitecture.

T.J. Berners-Lee et al. / The world-wide weboverview document tO be hand-written hypertextin order to maximize the impact and communication with the reader.ProtocolsW3 uses a client-server architecure (Fig. 1), toallow complex presentation facilities to be provided by the client, and powerful search and datamanipulation algorithms to be provided at thesite of the data by a server. The protocol neededto connect server and client is a simple statelesssearch and retrieve protocol. In practice the W3clients all include the ability to use various otherprotocols, including FTP [3], G o p h e r [4], local fileaccess, and NNTP [5] for internet news. Thisgives each W3 client access to several alreadyexisting worlds of information. The documentaddressing scheme allows names to be given toany document, file, directory, newsgroup or article in these systems. This means that a hypertextdocument may be written or generated whichincludes links to these objects. Other worlds ofinformation such as that of WAIS servers [6] aremade available by gateways which perform themapping of that world into the web.Ideally, a protocol used by W3 has the following features:Document retrieval by name.- Index search by name of index plus readersupplied text.Stateless operation. Rapid traversal of linksbetween documents on different servers makesthe concept of a session between client andserver inappropriate.Minimum number of round-trips. As technology advances, processing time may continue to shrink, leaving long-distance round trip delaysthe dominant factor in response time.Pipelining allowing the first part of a documentto be displayed (or relayed through a gateway)before the whole document has been transferred. This is easy when using a byte streamoriented protocol.To achieve this, a simple new protocol, H T T P(Hypertext Transfer Protocol) was defined in theconventional Internet style. This runs over T C P /IP, using one T C P / I P connection per search orretrieve operation.457The initial form of the protocol involves theclient sending a simple ASCII request for a document: the command " G E T " and the document'sname. The response to this is either a hypertextfile marked up in SGML, using a specific document type known as " H T M L " , or a plain textdocument with an H T M L header. In the newversion of the protocol (under development) anSGML-formatted request object includes detailsabout the client capabilities. The client capabilities include a weighting, in the form of penaltypoints for information loss and time taken forconversions at the client end. This allows theserver to make a balanced decision to send aparticular format when several are available, minimising the information degradation and extradelay associated with format conversion.The returned document contains an H T M Lheader, and a body which may be in any notationor combination of encoding schemes which theclient has declared itself able to handle. Cachingof converted documents at client or server side isobviously a useful technique which could in principle be applied as an optimization.DataencodingSGML was chosen as the base format for thereturned document because it is an accepted encoding scheme and has traditional use in documentation. It is flexible enough to allow differentobject types to be encoded, and non-SGML encodings to be encapsulated. SGML was not chosen out of any particular aesthetic appeal orinherent cleanliness.In general the philosophy of the W3 initiativeis not to force any more standards or commonpractices upon the world than absolutely necessary. The document naming and addressingscheme is considered essential, as its flexibilityallows us to include new protocols and namespaces as they appear. The protocols are nottherefore seen as so essential, allowing continuing research and development. Transition strategies (using gateways for example) will allow theintroduction of new protocols, and in a "market"of several coexisting protocols, the technicallysuperior ones will hopefully achieve the mostwidespread use.

458T.J. Berners-Lee et al. / The world-wide webThe data format negotiation of HT-FP is designed to allow the same market forces to exist indata encoding schemes and document types.When two coworkers in the same field or organization are using the same tools, one imagines thatthe negotiation will allow them to exchange dataat a high level, with a high level of functionality.For example, genetics workers may be able toexchange coded forms of D N A sequences whichmay be viewed and operated on with specialtools. On the other hand, a worker in a verydifferent environment will still have access to thisdata, even if as a last resort it has to be renderedinto a plain ASCII text on his terminal. It will,after a while, become obvious which formats arebecoming popular, and a snowball effect shouldcause their rapid adoption by both servers andclients.The W3 clients will not themselves be able tohandle all formats, but will be configurable tolaunch other applications.All W3 clients are as a minimum required tohandle plain text and H T M L . Because H T M L isa high-level mark-up, it allows the same logicalstructured text to be represented optimally whatever the capabilities of the client platform. Forexample, to highlight headings, a dumb terminalbrowser may use capital letters when an Xwindows browser uses a different font.W3 softwareSoftware provided by the various contributorsto the W3 project includes browsers, servers andgateways.A prototype hypertext editor was made usingthe NeXTStep(TM) environment. This has beenfrozen for almost a year at the time of writing, soit lacks many features of other browsers, but isstill the only hypertext editor allowing links andannotations to be directly inserted by the readeror author.The simple line mode browser "www" originally written by Nicola Pellow has now become ageneral information access tool. As well as operating in interactive mode, it can be used from thecommand line to retrieve any object on the webby name, or query any index. It can return formatted text or H T M L source. When used as afilter, it becomes a text formatter. When used asa server, it becomes a gateway.For the X l l window system, two browsersexist, with different look and feel. The "Erwise"browser was developed by a team at the HelsinkiTechnical University, and the "Viola" browser(demonstrated at this conference) by Pei Wei ofBerkeley, California.A browser for the Macintosh environment isunder developmentat C E R N ,a n d anM S D O S / W i n d o w s browser may also be produced. As the status of this software is constantlybeing updated, it is wise to check the web for thelatest situation.On the server side, a simple file server exists.There are also examples of servers which havebeen set up using simple shell or perl scripts (see[2]). Contributions of new software are alwayswelcome, as are new servers.Making data availableIf you look at existing information in or aboutyour own institute, or a particular field, maybe itoccurs to you that it would be useful to others tomake it generally available. There are a numberof ways to do this, and it can be done in a fewhours or a few days depending on the complexityof the data you have and the facilities you want tooffer.If you have an existing database, you couldmake a simple script to make this available on theweb. The simplest way is to run an existing shellinterface to the database from your server script.A more sophisticated way is to take that userinterface program, and our skeleton server code,and merge them into a single C program toprovide the data. This may be more powerful, ormatch better your style of working. A solution weused under V M / X A was for a basic daemonprogram to call a command file ( R E X X exec).If you have no database, but some files ofinteresting information, then running the basicH T I ' P file daemon (httpd) will allow them to bemade visible. In this case you can make by hand(or generate if you really have no time) a hypertext list of files for users to browse through.Theformat of H T M L is described in the web. Youcan also pick up the source of any document you

T.J. Berners-Lee et al. / The world-wide webfind on the web with the -source option to thewww (line m o d e browser) c o m m a n d .459questions and receive feedback, ideas, requestsand suggestions by email at: www-bug@info.cern.ch.Research directionsT h e W3 project is not a research project, but apractical plan to implement a global informationsystem. However, the existence of the web opensup m a n y interesting research possibilities. A m o n gthese are new h u m a n interface techniques form a n a g i n g a large space and the user's view of it,and automatic tools for traversing the web andsearching indexes in pursuit of the answers tospecific questions.More informationAll the technical information about the W3initiative is on the web. Y o u can read it by telnetto info.cern.ch which gives you the simplest formof browser. Better, pick up a browser by anonymous F T P to the same host (directory / p u b /w w w / s r c ) and run it on your own machine. T h ewww t e a m at C E R N will be happy to answerReferences[1] T.J. Berners-Lee et al., World-wide web: the informationuniverse, in: Electronic Networking: Research, Applicationsand Policy, Vol. 2, No. 1 (Meckler, New York, 1992)52-58.[2] T.J. Berners-Lee et al., World-wide web: an informationinfrastructure for high-energy physics, in: D. Perret-Gallix,ed., Proc. International Workshop on Software Engineeringand Artificial Intelligence for High Energy Physics, LaLonde, France, January 1992.[3] J. Postel and J. Reynolds, File Transfer Protocol (FTP),lnternet RFC-959, 1985.[4] M. McCahil et al., The Internet Gopher: An InformationSheet, in: Electronic Networking: Research, Applicationsand Policy, Vol. 2, No. 1 (Meckler, New York, 1992)67-71.[5] B. Kantor and P. Lapsley, A Proposed Standard for theStream-based Transmission of News, lnternet RFC-977,1986.[6] B. Kahle et al., Wide Area Information Servers: An Executive Information System for Unstructured Files, in: Electronic Networking: Research, Applications and Policy, Vol.2, No. 1 (Meckler, New York, 1992) 59-68.

ment type known as "HTML", or a plain text document with an HTML header. In the new version of the protocol (under development) an SGML-formatted request object includes details about the client capabilities. The client capab