DESIDOC Journal of Library & Information Technology, Vol. 38, No. 5, Sept 2018, pp. 361-368, DOI: 10.14429/djlit.38.5.12425
2018, DESIDOC

Comparative Analysis of Open Source Digital Library Softwares: A Case Study

Lakshmi Verma* and Nishant Kumar
DRDO-Defence Scientific Information & Documentation Center, Delhi – 110 054, India
E-mail: laxmi@desidoc.drdo.in

ABSTRACT
The exponential growth in data generation and its subsequent transformation into knowledge has created huge repositories of knowledge in libraries. This has revolutionised the methods and techniques used to retrieve relevant and useful information for users. The growth of information and communication technology (ICT) has facilitated this. This paper presents a study of three open-source digital library management software packages that collect and disseminate information for library users. The analysis involves the study and comparison of the related software documents and respective technical manuals. Based on the results of the comparison, the implementation of digital library management software at DESIDOC is also discussed in detail.

Keywords: Open source; Digital library; Digital library software; DSpace; GSDL; Greenstone; EPrints

1. Introduction
A library is a place where a collection of information resources, in print and other forms, is stored in an organised and accessible manner for reading or study. As defined by the International Organisation for Standardisation, a library is “irrespective of the title, any organised collection of printed books and periodicals or of any other graphic or audio-visual materials, and the services of a staff to provide and facilitate the use of such materials as are required to meet the informational, research, educational or recreational needs of its users” [1]. A digital library is a type of information retrieval system in which the information is stored in digital format and can be accessed within a network of computer users [2].
It uses online repositories which store textual information systematically and can be accessed by users 24x7. Various such digital repositories are available, which may be open source or proprietary. Open source describes a method of software development that uses the power of review and the transparency of distributed peer-to-peer progression. Here the source code of the software is available in the open domain and can be customised by the respective users. This approach helps in providing high-quality software through high reliability, low cost, flexibility and an end to traditional vendor lock-in. Since such software comes under an open source free licence, developers and users are allowed to change, improve and redistribute the software many times.

DESIDOC is the central information centre of DRDO, which holds various types of information repositories to disseminate S&T information digitally to DRDO users. To create and manage these information repositories, a suitable digital library software (DLS, sometimes also called digital library management software) was required. It was therefore necessary to identify the best-suited DLS for use at DESIDOC. This resulted in the comparison of three of the most popular DLS available today, i.e., DSpace, GSDL and EPrints. Based on the comparison, DSpace was found to be the most suitable DLS in the DESIDOC scenario and hence was used to develop the institutional repositories. The repositories which were earlier running on different library management software were also migrated to the DSpace platform to create a common, unified institutional repository for DESIDOC.

Received: 7 December 2017, Revised: 20 August 2018, Accepted: 28 August 2018, Online published: 5 September 2018

2. Methodology
To compare the three digital library software packages (DLS), various review papers on related subjects were analysed. The technical details and complete specifications were also considered through the technical manuals available on the official websites of the three DLS.
To compare the practical aspects, the frontend and backend of DSpace, GSDL and EPrints were also analysed to find their suitability for the specific requirements of an information repository. The frontend and backend of DSpace are JSP and PostgreSQL, whereas for GSDL they are Perl/Java and GSDL’s own database. For EPrints, the frontend and backend are Perl and MySQL/PostgreSQL/Oracle respectively. DSpace therefore becomes a default choice for organisations familiar with JSP and PostgreSQL. Furthermore, DSpace has many more advantages which make it a suitable choice for implementing an institutional repository (IR) in an organisation.

3. Digital Library Management Systems
Open source digital library software are software applications that help in creating and presenting information repositories. The repositories built with the help of these digital library management systems can be searched and browsed based on metadata, as these features are built into such applications. Apart from this, they can be easily maintained, enhanced and re-created. Presently many open source software (OSS) applications are available for library and information management, for example DSpace, GSDL, Fedora and EPrints. Organisations can therefore choose the one most suitable for their requirements and implement it to create digital repositories. This study focuses mainly on three of the most popular open source digital library software packages: DSpace, GSDL and EPrints.

4. Literature Review
Digital library management software (DLMS) provide a user-friendly and customisable architecture for creating online digital libraries with ease. With the help of these applications, institutions and organisations can publish their research work, technical papers and manuscripts, which will not only be available globally but also be preserved as digital items. The software discussed above (DSpace, GSDL and EPrints) possess different services and architectures. However, it is not easy to propose one specific DLMS as the most suitable for all cases. The study can help an organisation to select a proper DLMS for showcasing its digital repositories based on its own criteria. These criteria can include the type/format of the content to be uploaded, how the material is to be distributed, the backend and frontend of the software, and the time frame available to set up the digital collection [3].

Das compared the three software packages (DSpace, GSDL and EPrints) and observed that current open source digital library software still lacks certain functionalities that the literature suggests are significant. However, of the three, DSpace and Greenstone were found to be the most suitable, as they have well-built support for providing the desired functionalities to end-users.
EPrints is not far behind and has the potential to improve, as it is going to add a usage monitoring and reporting element in its upcoming version. The shortcomings of EPrints pointed out by the paper were the lack of strong support in certain areas, especially in its search module. However, the paper also agrees that each software package has its own strengths and weaknesses that cater to the needs of various organisations with different sets of requirements [16].

Seshaiah and Veeraanjaneyulu [5] presented some remarkable features of GSDL and found that GSDL suits both Windows and Unix (Linux, SunOS), and that any of these systems can be used as a web server. It also has an inbuilt administration function that makes it possible to authorise new users to build collections and to protect documents so that they can only be accessed by registered users. Collections created with GSDL possess effective full-text searching as well as metadata-based browsing facilities. Large collections (up to several gigabytes) can be built. Despite the large data volume, full-text searching is fast because of techniques such as compression of the indexes to reduce data size. Plug-ins can be used to accommodate new document types. A collection can accept multiple types of data such as pictures, music, audio and video. It also supports documents in a variety of languages. Collections can be updated in real time.

Another study, by Sahu and Kadaria, discussed the selection criteria for open source digital library software. It states that the evaluation of open source software is different from that of proprietary programs. The major variation in evaluation comes from the fact that the information available for open source programs is generally different from that for proprietary programs. This information can include the availability of source code, a program design open for analysis by others, interaction between users and developers through an open platform regarding performance issues, and many others. The authors are of
The authors are ofview that selection criteria can be Open source licenses,Functional modules, Stable releases, Developers and usercommunity, User interface and Documentation6.The paper titled “Institutional repository softwarecomparison: DSpace, EPrints, Digital Commons, Islandoraand Hydra” supports DSpace as it has proven to be a strongand reliable repository platform since it was launched in 2002.With its latest releases, DSpace still maintains its positionamong the plethora of new DLMS available by providing morerobust support for research data and more extensible back-ends.Whereas about EPrints, it points out that the main attractionof EPrints seem to be its user-friendly interface and ease-ofimplementation. However, the migration from another systeminto Eprints is not that easy The paper also mentions that Eprintcan be an ideal repository solution for implementation in aninstitute where resources (financial or technical expertise) arelimited7.Rao8 explored some of the reasons for using open sourcelibrary management software. The major points that hementioned are like free of cost availability since it can be freelydownloaded from internet and ease of customisation to meetthe organisation’s specific needs. There are no copyright issueswith this software and they use open standards which allowseasy interoperability with other software. This software areregularly updated and there are online manuals available fortechnical support and help. Online help through developers’community is also available.Madalli9 advocated that DSpace is a fairly powerfulsoftware. Its main strength is that it allows submission of digitaldocuments by it members but presently, it does not followMETS (Metadata Encoding and Transmission Standard). If itfollows that, it can become much more powerful. 
The paperexpects that the upcoming versions of DSpace will includeMETS also13.Patil & Kanamadi10 compared GSDL and EPrints as twowidely used open source repository-software which mainlyaimed at providing open access to article pre-prints and postprints, including digital theses. These support a variety of filetypes like video, audio, images and zip files i.e all these typesof files can be uploaded in these repositories. The authorsconcluded that EPrints is a useful Digital Library system whichalso has a large user community. But on the flip side whenever,there is a need for technical support and training in using thesoftware, DSpace was found more convenient.5.DSpaceDSpace is an open source digital library software which

verma & kumar: comparative analysis of open source digital library softwares: a case studyallows us to capture and store digital data like text, video,audio etc into created repositories. It also provides facilityto index, preserve and disseminate the digital material. Thusdigital libraries use DSpace to manage the digital materials andpublications in professionally maintained repositories.If we see the world-wide scenario, there are more than1000 digital repositories which are developed using theDSpace application for storing, distributing and preservingtheir digital data. DSpace is more common as a platform tobuild an institutional repository which is a digital collectionof research documentation, intellectual publications, librarycollections etc. In Indian scenario Dspace is being used in manyreputed organisations and projects like National Digital LibraryProgramme of GoI, IIT Kharagpur Central Library, DIAT, DU(Deemed University) Pune, KUVEMPU University other IITs,IIMs and many other research and academic organisations.DSpace performs three major tasks to build a repository: It captures and ingests the digital content along withmetadata It lists the content systematically and helps in searchingbased on keywords and metadata It supports preservation of the digital data for a longperiod of timeTherefore, DSpace can easily be customised to manageand preserve the digital content and provide accessibility ofthis data to the users. Since it is an open source software, anactive community of developers, researchers and users acrossthe world are collaborating to provide their expertise to enhancethis application.DSpace is capable of storing a wide range of digital data,which includes documents like articles, technical reports,conference papers, books, theses, multimedia publications,Administrative records, images, audio-video files, web pagesetc. 
etc. It also provides multiple features such as visualisation and simulation of the stored data.

5.1 Latest Features of DSpace
As DSpace is a continuously growing platform, it releases upgraded versions from time to time. 6.x is the latest update to the DSpace platform [11]. It consists of an upgraded configuration system, upgraded file storage plugins, and better quality control / health-check reporting features (through the REST API and also through email). Furthermore, DSpace 6 has a Java API refactor that adds support for both UUIDs and Hibernate in the database layer. This makes it better prepared for future challenges.

As reported by the official DSpace website, the new features and improvements in the 6.x version include:
• “Java API refactor, featuring Hibernate and UUIDs
• Enhanced (reloadable) configuration system, featuring a new local.cfg configuration file
• Enhanced file storage plugins, featuring support for Amazon S3
• Configurable site healthchecks via email
• XMLUI framework for metadata import from external sources, featuring support for PubMed imports
• XMLUI export of search results to CSV (for batch editing)
• XMLUI extensible administrative control panel
• REST API Quality Control Reports, along with sample HTML clients and CSV export (for batch editing)
• REST API support for additional authentication methods (e.g. LDAP, etc)
• All searches default to Boolean AND.
• Enhanced indexing for searches (Excel is now searchable, as well as right-to-left text in PDFs)
• OAI-PMH adds compliance for OpenAIRE 3.0 guidelines for literature repositories” [12]

5.2 Limitations of DSpace
During implementation, some limitations were observed, such as a flat file and metadata structure, a poor user interface, lack of scalability and extensibility, a limited API, limited metadata features, limited reporting capabilities and lack of support for linked data.

6. Greenstone Digital Library
Greenstone Digital Library (GSDL) is an open source, multilingual software released under the terms of the GNU General Public License and widely used for creating repositories and making them accessible online [13]. The development and distribution of GSDL is the outcome of joint efforts by the New Zealand Digital Library Project at the University of Waikato, UNESCO and the Human Info NGO (http://humaninfo.org/). The aim of the Greenstone software is to enable users to build their own digital libraries. It provides a way to organise information and publish it on the web or on other digital storage media such as DVDs and USB flash drives. In the latter case, it runs in a non-networked environment. The digital libraries built with GSDL are fully searchable, metadata-driven digital resources [14].

In fact, this software encourages the effective deployment of digital libraries to share information and put it in the public domain. It is therefore not itself a digital library; rather, it provides a platform for building one.

In 2004, the developers of GSDL received the IFIP Namur Award for “contributions to the awareness of social implications of information technology, and the need for a holistic approach in the use of information technology that takes account of social implications” [14].

6.1 Greenstone Digital Library Versions
There are two main versions of GSDL, namely GSDL2 and GSDL3. GSDL2 is the earlier version and still in wide use, whereas GSDL3 is the latest version, under active development. GSDL3 is backward compatible and contains almost all the features of GSDL2. A programmer already working on GSDL2 can either work with the latest release of GSDL2 or consider upgrading to GSDL3.
The Greenstone Librarian Interface (GLI) provides a feature to import a ‘Greenstone2 collection’, which helps existing GSDL2 users migrate to the new software. Greenstone3 has been developed in Java and uses various recent web technologies, such as XML Transforms (XSLT) and the Java Authentication and Authorization Service (JAAS). Greenstone2, in contrast, was written in C and was based on many techniques developed by the developers themselves, as many of the latest web technologies were not available at the time. This made users totally dependent on the documentation provided by the development team. All these limitations have been overcome in the latest GSDL version.

6.2 Limitations of Greenstone Digital Library
Some limitations of GSDL have also been observed: interactive content updating and management are not possible, there is no provision for identifying duplicates, metadata handling is somewhat difficult, and the software hangs while processing some documents during collection building. Also, the Linux version appears more robust than the Windows version.

7. EPrints
EPrints is one of the popular digital library software packages and has been in use for almost two decades. It was created at the University of Southampton, and the current version in use is EPrints 3.3.16 Beta 1.

Being open source software, it is convenient for use by any organisation, including those with limited resources. Initially EPrints required a software platform of Linux, Apache, MySQL and Perl; now it can also run on the Windows platform, which has made it even easier for users.

Just like the other two digital library software packages, EPrints is a good choice for creating and running an institutional repository. Users can upload documents, along with the necessary metadata for the records, by filling information into a web form.

This software links to the SHERPA/RoMEO database, which helps authors to verify their rights regarding their submissions to the repository. In this way, any unauthorised submission is taken care of.

7.1 Features of EPrints
EPrints is easy to use for both end-users and administrators; this is its biggest strength. Users can submit documents to EPrints in a straightforward manner, proceeding through the submission process one step at a time.
The metadata can be provided along with the electronic copy of the document. The metadata is quite simple, such as document type, document title, author’s name and date of submission, and can be submitted using a simple form. This does not require any knowledge of HTML or XML. For the administrator, the metadata fields are customisable. The administrator can therefore allow only those fields which are relevant for a particular repository, and the end-user needs to fill in only those fields. Users have the added advantage of being able to manage their submissions, as editing, updating and removal of documents are possible even after submission. However, the administrator has the right to restrict these functionalities.

Another facility EPrints provides is that the administrator can specify a period only after which a document is transferred automatically to the archive section.

EPrints also provides very effective search and browsing features. Search can be performed based on multiple options, whereas the browsing feature is customisable and robust. This helps in finding documents effectively in the archives (“Repositories Support Project”). The metadata fields entered help in browsing the collection. For example, documents can be browsed year-wise, department-wise, volume-wise, etc. Browsing can be done based on any of the metadata fields within a collection, and multiple browsing criteria can be used. The browsing categories can be customised by the administrator. Since EPrints is OAI-compliant, Google indexes the documents uploaded to an EPrints archive. This helps in enhancing the visibility of EPrints documents in cyberspace.

As per the feedback provided by users and other technical reviews, it is widely accepted that the installation and configuration of EPrints is simple and fast.
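
The OAI compliance mentioned above refers to the OAI-PMH harvesting protocol, in which a harvester retrieves metadata by sending HTTP requests that carry a verb argument. A minimal sketch in Python of how such request URLs could be formed is given here. The base URL is a hypothetical placeholder and is not taken from any real repository, and the helper name build_oai_request is introduced only for illustration; Identify and ListRecords are standard OAI-PMH verbs, and oai_dc is the Dublin Core metadata prefix that the protocol requires every repository to support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; a real repository
# publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/cgi/oai2"


def build_oai_request(base_url, verb, arguments=None):
    """Return an OAI-PMH request URL for the given verb and optional arguments."""
    query = {"verb": verb}
    if arguments:
        query.update(arguments)
    return f"{base_url}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

# Ask for all records as unqualified Dublin Core.
list_url = build_oai_request(BASE_URL, "ListRecords", {"metadataPrefix": "oai_dc"})

print(identify_url)
print(list_url)
```

A harvester would fetch these URLs and parse the XML responses; that step is omitted here because it depends on the repository being harvested.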
‘EPrints Services’ is a company formed by the developers of EPrints which helps organisations to install, configure and use EPrints-based repositories.

Owing to its multiple advantages, EPrints is today used in approximately 300 reputed organisations, the largest repository being the one developed at the University of Twente in the Netherlands. This repository contains over 60,000 records, which in itself demonstrates the capability of EPrints in handling large collections.

7.2 Limitations of EPrints
Although there are multiple advantages of using EPrints to create digital repositories in libraries, certain limitations can be counted, such as the lack of a bulk upload feature. Uploading files and creating records is easy, but if someone has to upload an existing archive, there are no options available to upload multiple records at one time. Multiple files can be uploaded in one go, but only when they belong to the same record.

To elaborate further, migrating records from an existing digital library software to EPrints is not a problem, but if the existing collections are not contained within a database, the records cannot be uploaded to EPrints in bulk. This means each record has to be created individually. Also, in EPrints one cannot create a common record for multiple documents; rather, an individual record has to be created for each document, one by one.

Another limitation of EPrints is the limited features of its search functionality. Boolean search is not available, and sometimes the search gives no output at all, which is not acceptable today. At the least, suggestions for alternate searches should be provided. A user-created tagging feature is also missing in EPrints.

8. Comparison of DSpace, GSDL and EPrints
Based on the above discussion, a feature comparison of DSpace, GSDL and EPrints is given in Table 1.

Table 1. Comparative account of GSDL, DSpace and EPrints

Features of open source software | GSDL | DSpace | EPrints
Product type | Software | Software | Software
Year of creation | 1997 | 2002 | 2000
License s… | Training | Service via 3rd party service provider | Training, consultancy, site visits
Plug-in extends | Yes | Yes | …
Resource identifier | No/OAI identifier | CNRI Handles | No
OAI-PMH | Yes | Yes | Yes
Z39.50 support | Yes | No | No
Supported file formats | doc, pdf, html, ppt, postscript, jpeg, gif, video, mp3, etc. | doc, pdf, html, ppt, jpeg, gif, audio, video, etc. | pdf, html, jpeg, tiff, MP3 and AVI
Supported item types (storage and rendition) | Can store and manage all types of content | Can store and manage all types of content | Can store and manage all types of content
Thumbnail preview | Images, audio, video | Images | Images, audio, video
Multilingual support [5] | Greenstone provides ready-to-use multilingual interfaces that are already translated into many languages | Unicode character encoding, so different languages can be supported | Unicode is used
Machine-to-machine interoperability | Z39.50, OAI-PMH | OAI-PMH, OAI-ORE, SWORD, SWAP | OAI-PMH, OAI-ORE, SWORD, SWAP, RDF
Syndication | --- | RSS, ATOM | RSS, ATOM
User authentication | User groups | LDAP authentication, Shibboleth authentication | LDAP authentication
Searching capabilities [4] | Field specific, Boolean logic | Field specific, Boolean logic, sorting options | Field specific, sorting options
Browsing options | Browsing can be done using any field | By author, title, subject and collection | Browsing can be done using any field
Metadata formats [3] | Dublin Core, Qualified DC, METS, RFC 1807, NZGLS (New Zealand Government Locator Service), AGLS (Australian Government Locator Service) | Dublin Core, Qualified DC, METS | Dublin Core, METS
Associated software | Apache web server, Java 1.4.0 or above, ImageMagick, Ghostscript and web browser | Java JDK5 or later, Apache Ant 1.6.2 or later, Apache Maven 2.0.8 or later, Java 1.4 or later, PostgreSQL 7.3 or later, Apache Tomcat 4.x/5.x and web browser | Linux or Unix, Apache, Perl
Software platforms [3] | Windows 95/98/Me/NT/2000/XP/10, Unix/Linux and Mac OS X | Windows (NT/2000/XP/10) and all POSIX (Linux/BSD/UNIX-like OSs) | Linux, Unix, Windows, OS X
Statistical reporting | Yes (count of full records) | Yes (count of full records) 73 | Yes (count of full records)
Databases | Its own | Oracle, PostgreSQL | MySQL, Oracle, PostgreSQL, Cloud
Programming language | C, Perl, Java | Java and JSP | Perl
Web server | Apache/IIS | Apache and Tomcat | Apache
URL for re…

9. Practical Implementation of DSpace at DESIDOC
The Defence Scientific Information and Documentation Centre (DESIDOC) of DRDO provides information to various DRDO laboratories through its information and knowledge based e-services. More than 30 services are being provided to all the DRDO units country-wide through the DRDO Intranet. Fig. 1 depicts the webpage on the DESIDOC library portal which contains the list of all these 33 services.

Figure 1. Single window services.

9.1 Requirement of DLS at DESIDOC
The web services used to provide information to the users were developed on different platforms over the course of time, which caused varied user experiences and difficulty in maintaining and hosting these 33 services. Therefore, for a uniform user experience, enhanced search features and effective maintenance of these services, it was decided to build these information repositories using a uniform DLS platform.

9.2 DSpace at DESIDOC
Based on the analysis carried out to identify the most suitable DLS platform for building DESIDOC repositories, it was decided to opt for DSpace for creating these digital repositories. Among the reasons was that it uses JSP and PostgreSQL as frontend and backend to build the applications. Both are available in the open domain and easy to work with. Also, PostgreSQL is capable of storing and handling large amounts of data, which was a requirement of DESIDOC.
Furthermore, DSpace provides various features for users such as full-text search, metadata-based search and federated search.

Furthermore, any field can be added or changed to customise the default Dublin Core metadata of DSpace. The DSpace application is capable of accepting and managing a large number of file formats such as Word, PDF, JPG, TIFF and JPEG, and even unrecognised formats can be registered in DSpace for future identification. DSpace has also been designed with a flexible storage and retrieval architecture which can support a variety of data formats and research disciplines.

At the same time, many more features can be incorporated into these DSpace digital repositories, such as usage pattern analysis and business-intelligence tools, to make the services much more effective and user-friendly. All these features and qualities made DSpace the default choice for the IR at DESIDOC.

The features currently provided by DSpace can be summarised as follows:
• Uniform user interface (UI) for all the services
• Specific search for individual services, available on the home page of each service
• Searches over metadata and full text for the service
• Federated search facility: one common text box which searches full text and metadata in all the 18 services migrated into DSpace

Figure 2 shows these features in one of the repositories created using the DSpace platform. Figure 3 and Figure 4 depict the search features of repositories built on the DSpace platform.

Figure 2. Uniform UI & search facility.
Figure 3. Individual service search results.
Figure 4. Federated search result.

In this way DESIDOC has successfully implemented the DSpace platform for 18 of its services, such as DRDO E-Journals, DRDO Knowledge Repository, DRDO Science Spectrum, DRDO Technology Spectrum, Newspaper Clipping Service, Institutional Repositories of DRDO, Archiving of Newspaper Clipping, Union Catalogue of Periodicals, Archiving of E-journals, etc.

What is worth mentioning here is that all these repositories are actually different communities of the same DSpace DLS, whereas to users they appear as different services provided by the digital library of DESIDOC. This integration of various information web services into a common DSpace platform has helped the IT administrators at DESIDOC considerably, because instead of maintaining and managing multiple backends and frontends of various repositories, they now have to deal with only one DLS, i.e., DSpace.

9.3 Proposed Enhancement in the DLS at DESIDOC
As the current repositories built using DSpace have performed quite well in meeting users’ requirements, DESIDOC is in the process of migrating the remaining repositories to the DSpace platform as well. This will help DESIDOC, as an information centre, to have all its repositories in a common DLS, i.e., DSpace. Once this is done, it will be possible to search all the repositories through a single search, i.e., federated search. Ease of maintenance and a uniform user experience will be further benefits.

It is also proposed to analyse the usage pattern of these repositories through the features provided in the DSpace DLS. This will help the librarians at DESIDOC to und
