Ontology Learning for the Semantic Web


Alexander Maedche and Steffen Staab, University of Karlsruhe

The authors present an ontology-learning framework that extends typical ontology-engineering environments by using semiautomatic ontology-construction tools. The framework encompasses ontology import, extraction, pruning, refinement, and evaluation.

The Semantic Web relies heavily on formal ontologies to structure data for comprehensive and transportable machine understanding. Thus, the proliferation of ontologies factors largely in the Semantic Web's success. Ontology learning greatly helps ontology engineers construct ontologies. The vision of ontology learning that we propose includes a number of complementary disciplines that feed on different types of unstructured, semistructured, and fully structured data to support semiautomatic, cooperative ontology engineering. Our ontology-learning framework proceeds through ontology import, extraction, pruning, refinement, and evaluation, giving the ontology engineer coordinated tools for ontology modeling.
Besides the general framework and architecture, this article discusses techniques in the ontology-learning cycle that we implemented in our ontology-learning environment, such as ontology learning from free text, dictionaries, and legacy ontologies. We also refer to other techniques for future implementation, such as reverse engineering of ontologies from database schemata or learning from XML documents.

The conceptual structures that define an underlying ontology provide the key to machine-processable data on the Semantic Web. Ontologies serve as metadata schemas, providing a controlled vocabulary of concepts, each with explicitly defined and machine-processable semantics. By defining shared and common domain theories, ontologies help people and machines to communicate concisely—supporting semantics exchange, not just syntax. Hence, the Semantic Web's success and proliferation depends on quickly and cheaply constructing domain-specific ontologies.

Although ontology-engineering tools have matured over the last decade,1 manual ontology acquisition remains a tedious, cumbersome task that can easily result in a knowledge-acquisition bottleneck. When developing our ontology-engineering workbench, OntoEdit, we particularly faced this question as we were asked questions that dealt with time ("Can you develop an ontology quickly?"), difficulty ("Is it difficult to build an ontology?"), and confidence ("How do you know that you've got the ontology right?").

These problems resemble those that knowledge engineers have dealt with over the last two decades as they worked on knowledge-acquisition methodologies or workbenches for defining knowledge bases. The integration of knowledge acquisition with machine-learning techniques proved extremely beneficial for knowledge acquisition.2 The drawback to such approaches,3 however, was their rather strong focus on structured knowledge or databases, from which they induced their rules.

Conversely, in the Web environment we encounter when building Web ontologies, structured knowledge bases or databases are the exception rather than the norm. Hence, intelligent support tools for an ontology engineer take on a different meaning than the integration architectures for more conventional knowledge acquisition.4

In ontology learning, we aim to integrate numerous disciplines to facilitate ontology construction, particularly machine learning. Because fully automatic machine knowledge acquisition remains in the distant future, we consider ontology learning as semiautomatic with human intervention, adopting the paradigm of balanced cooperative modeling for constructing ontologies for the Semantic Web.5

Ontologies for the Semantic Web

With this objective in mind, we built an architecture that combines knowledge acquisition with machine learning, drawing on resources that we find on the syntactic Web—free text, semistructured text, schema definitions (such as document type definitions [DTDs]), and so on. Thereby, our framework's modules serve different steps in the engineering cycle (see Figure 1):

- Merging existing structures or defining mapping rules between these structures allows importing and reusing existing ontologies. (For instance, Cyc's ontological structures have been used to construct a domain-specific ontology.6)
- Ontology extraction models major parts of the target ontology, with learning support fed from Web documents.
- The target ontology's rough outline, which results from import, reuse, and extraction, is pruned to better fit the ontology to its primary purpose.
- Ontology refinement profits from the pruned ontology but completes the ontology at a fine granularity (in contrast to extraction).
- The target application serves as a measure for validating the resulting ontology.7

Finally, the ontology engineer can begin this cycle again—for example, to include new domains in the constructed ontology or to maintain and update its scope.

[Figure 1. The ontology-learning process.]

Architecture

Given the task of constructing and maintaining an ontology for a Semantic Web application such as an ontology-based knowledge portal,8 we produced support for the ontology engineer embedded in a comprehensive architecture (see Figure 2). The ontology engineer only interacts via the graphical interfaces, which comprise two of the four components: the OntoEdit Ontology Engineering Workbench and the management component. Resource processing and the algorithm library are the architecture's remaining components.

[Figure 2. Ontology-learning architecture for the Semantic Web.]

The OntoEdit Ontology Engineering Workbench offers sophisticated graphical means for manual modeling and refining of the final ontology. The interface gives the user different views, targeting the epistemological level rather than a particular representation language. However, the user can export the ontological structures to standard Semantic Web representation languages such as OIL (ontology interchange language) and DAML-ONT (DAML ontology language), as well as our own F-Logic-based extensions of RDF(S)—we use RDF(S) to refer to the combined technologies of the resource description framework and RDF Schema.
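To make the export step concrete, here is a minimal sketch of such an RDF(S) output, written with the modern Python library rdflib as a stand-in for the article's original tooling; the tourism namespace, concept names, and labels are invented for illustration, and the OIL, DAML-ONT, and F-Logic variants are not shown.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/tourism#")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# A concept, one taxonomic link, and two lexical entries (labels).
g.add((EX.Hotel, RDF.type, RDFS.Class))
g.add((EX.Hotel, RDFS.subClassOf, EX.Accommodation))
g.add((EX.Hotel, RDFS.label, Literal("hotel", lang="en")))
g.add((EX.Hotel, RDFS.label, Literal("Hotel", lang="de")))

print(g.serialize(format="xml"))  # RDF/XML rendering of the concept
```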

Additionally, users can generate and access executable representations for constraint checking and application debugging through SilRi (simple logic-based RDF interpreter, www.ontoprise.de), our F-Logic inference engine, which connects directly to OntoEdit.

We knew that sophisticated ontology-engineering tools—for example, the Protégé modeling environment for knowledge-based systems1—would offer capabilities roughly comparable to OntoEdit. However, in trying to construct a knowledge portal, we found that a large conceptual gap existed between the ontology-engineering tool and the input (often legacy data), such as Web documents, Web document schemata, databases on the Web, and Web ontologies, which ultimately determine the target ontology. Into this void we have positioned new components of our ontology-learning architecture (see Figure 2). The new components support the ontology engineer in importing existing ontology primitives, extracting new ones, pruning given ones, or refining with additional ontology primitives. In our case, the ontology primitives comprise

- a set of strings that describe lexical entries L for concepts and relations;
- a set of concepts C (roughly akin to synsets in WordNet9);
- a taxonomy of concepts with multiple inheritance (heterarchy) HC;
- a set of nontaxonomic relations R described by their domain and range restrictions;
- a heterarchy of relations HR;
- relations F and G that relate concepts and relations with their lexical entries; and
- a set of axioms A that describe additional constraints on the ontology and make implicit facts explicit.8

This structure corresponds closely to RDF(S), except for the explicit consideration of lexical entries. Separating concept reference from concept denotation permits very domain-specific ontologies without incurring an instantaneous conflict when merging ontologies—a standard Semantic Web request. For instance, the lexical entry school might refer to a building in ontology A, an organization in ontology B, or both in ontology C. Also, in ontology A, we can refer to the concept referred to in English by school and school building by the German Schule and Schulgebäude.

Ontology learning relies on an ontology structured along these lines and on input data as described earlier to propose new knowledge about reasonably interesting concepts, relations, and lexical entries or about links between these entities—proposing some for addition, deletion, or merging. The graphical result set presents the ontology-learning process's results to the ontology engineer (we'll discuss this further in the "Association rules" section). The ontology engineer can then browse the results and decide to follow, delete, or modify the proposals, as the task requires.
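For illustration, the ontology primitives listed above can be held in a small in-memory structure. The following Python sketch shows one possible shape; the field names and the toy school example are ours, not a prescribed data model.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    lex: set = field(default_factory=set)           # L: lexical entries
    concepts: set = field(default_factory=set)      # C: concepts
    taxonomy: set = field(default_factory=set)      # H_C: (sub, super) pairs, multiple inheritance allowed
    relations: dict = field(default_factory=dict)   # R: name -> (domain, range)
    rel_taxonomy: set = field(default_factory=set)  # H_R: (sub, super) pairs over relations
    refs_c: set = field(default_factory=set)        # F: (lexical entry, concept)
    refs_r: set = field(default_factory=set)        # G: (lexical entry, relation)
    axioms: list = field(default_factory=list)      # A: additional constraints

o = Ontology()
o.concepts |= {"building", "organization"}
o.lex |= {"school", "Schule"}
# One lexical entry may reference several concepts without conflict:
o.refs_c |= {("school", "building"), ("school", "organization"), ("Schule", "building")}
```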
Components

By integrating the previously discussed considerations into a coherent generic architecture for extracting and maintaining ontologies from Web data, we have identified several core components (including the graphical user interface discussed earlier).

Management component
The ontology engineer uses the management component to select input data—that is, relevant resources such as HTML and XML documents, DTDs, databases, or existing ontologies that the discovery process can further exploit. Then, using the management component, the engineer chooses from a set of resource-processing methods available in the resource-processing component and from a set of algorithms available in the algorithm library.

The management component also supports the engineer in discovering task-relevant legacy data—for example, an ontology-based crawler gathers HTML documents that are relevant to a given core ontology.

Resource processing
Depending on the available input data, the engineer can choose various strategies for resource processing:

- Index and reduce HTML documents to free text.
- Transform semistructured documents, such as dictionaries, into a predefined relational structure.
- Handle semistructured and structured schema data (such as DTDs, structured database schemata, and existing ontologies) by following different strategies for import, as described later in this article.
- Process free natural text. Our system accesses the natural-language-processing system SMES (Saarbrücken Message Extraction System), a shallow-text processor for German.10 SMES comprises a tokenizer based on regular expressions, a lexical analysis component including various word lexicons, a morphological analysis module, a named-entity recognizer, a part-of-speech tagger, and a chunk parser.

After first preprocessing data according to one of these or similar strategies, the resource-processing module transforms the data into an algorithm-specific relational representation.

Algorithm library
We can describe an ontology by a number of sets of concepts, relations, lexical entries, and links between these entities. We can acquire an existing ontology definition (including L, C, HC, R, HR, A, F, and G) using various algorithms that work on this definition and the preprocessed input data. Although specific algorithms can vary greatly from one type of input to the next, a considerable overlap exists for underlying learning approaches such as association rules, formal concept analysis, or clustering. Hence, we can reuse algorithms from the library for acquiring different parts of the ontology definition.

In our implementation, we generally use a multistrategy learning and result combination approach. Thus, each algorithm plugged into the library generates normalized results that adhere to the ontology structures we've discussed and that we can apply toward a coherent ontology definition.
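A possible shape for such a plug-in library, reduced to a naive keep-the-strongest-evidence combination rule, is sketched below; all names are illustrative, and the real combination strategy is more involved.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    kind: str          # "lexical_entry", "concept", "taxonomic_link", "relation", ...
    payload: tuple     # e.g. ("event", "area") for a proposed relation
    confidence: float  # normalized to [0, 1]
    source: str        # algorithm that produced the proposal

class AlgorithmLibrary:
    def __init__(self):
        self._algorithms = []

    def register(self, algorithm):
        """Plug in a callable mapping preprocessed data to Proposal objects."""
        self._algorithms.append(algorithm)

    def run(self, data):
        """Multistrategy combination: keep the best evidence per proposed item."""
        best = {}
        for algorithm in self._algorithms:
            for p in algorithm(data):
                key = (p.kind, p.payload)
                if key not in best or p.confidence > best[key].confidence:
                    best[key] = p
        return sorted(best.values(), key=lambda p: -p.confidence)
```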

[Figure 3. Screenshot of our ontology-learning workbench, Text-To-Onto.]

Import and reuse
Given our experiences in medicine, telecommunications, tourism, and insurance, we expect that domain conceptualizations are available for almost any commercially significant domain. Thus, we need mechanisms and strategies to import and reuse domain conceptualizations from existing (schema) structures. We can recover the conceptualizations, for example, from legacy database schemata, DTDs, or from existing ontologies that conceptualize some relevant part of the target ontology.

In the first part of import and reuse, we identify the schema structures and discuss their general content with domain experts. We must import each of these knowledge sources separately. We can also import manually—which can include a manual definition of transformation rules. Alternatively, reverse-engineering tools—such as those that exist for recovering extended entity-relationship diagrams from a given database's SQL description (see the sidebar)—might facilitate the recovery of conceptual structures.

In the second part of the import and reuse step, we must merge or align imported conceptual structures to form a single common ground from which to springboard into the subsequent ontology-learning phases of extracting, pruning, and refining. Although the general research issue of merging and aligning is still an open problem, recent proposals have shown how to improve the manual merging and aligning process. Existing methods mostly rely on matching heuristics for proposing the merger of concepts and similar knowledge-base operations; a toy version appears in the sketch following this section. Our research also integrates mechanisms that use an application-data-oriented, bottom-up approach.11 For instance, formal concept analysis lets us discover patterns between application data and the use of concepts, on one hand, and their heterarchies' relations and semantics, on the other, in a formally concise way (see B. Ganter and R. Wille's work on formal concept analysis in the sidebar).

Overall, the ontology-learning import and reuse step seems to be the hardest to generalize. The task vaguely resembles the general problems encountered in data warehousing—adding, however, challenging problems of its own.
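The sketch below illustrates the flavor of such a matching heuristic: merge candidates between two imported concept vocabularies are proposed purely by string similarity. The concept names are invented, and real alignment would also weigh structure and application data, as discussed above.

```python
from difflib import SequenceMatcher

def merge_candidates(concepts_a, concepts_b, threshold=0.85):
    """Propose concept pairs from two imported ontologies that may denote the same thing."""
    proposals = []
    for a in concepts_a:
        for b in concepts_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                proposals.append((a, b, round(score, 2)))
    return sorted(proposals, key=lambda p: -p[2])

print(merge_candidates({"Hotel", "ZipCode"}, {"hotel", "zip_code", "beach"}))
# [('Hotel', 'hotel', 1.0), ('ZipCode', 'zip_code', 0.93)]
```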

A Common Perspective

Until recently, ontology learning—for comprehensive ontology construction—did not exist. However, much work in numerous disciplines—computational linguistics, information retrieval, machine learning, databases, and software engineering—has researched and practiced techniques that we can use in ontology learning. Hence, we can find techniques and methods relevant for ontology learning referred to as

- "acquisition of selectional restrictions,"1,2
- "word sense disambiguation and learning of word senses,"3
- "computation of concept lattices from formal contexts,"4 and
- "reverse engineering in software engineering."5

Ontology learning puts many research activities—which focus on different input types but share a common domain conceptualization—into one perspective. The activities in Table A span a variety of communities, with references from 20 completely different events and journals.

References

1. P. Resnik, Selection and Information: A Class-Based Approach to Lexical Relationships, PhD thesis, Dept. of Computer Science, Univ. of Pennsylvania, Philadelphia, 1993.
2. R. Basili, M.T. Pazienza, and P. Velardi, "Acquisition of Selectional Patterns in a Sublanguage," Machine Translation, vol. 8, no. 1, 1993, pp. 175–201.
3. P. Wiemer-Hastings, A. Graesser, and K. Wiemer-Hastings, "Inferring the Meaning of Verbs from Context," Proc. 20th Ann. Conf. Cognitive Science Society (CogSci-98), Lawrence Erlbaum, New York, 1998.
4. B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer-Verlag, Berlin, 1999.
5. H.A. Mueller et al., "Reverse Engineering: A Roadmap," Proc. Int'l Conf. Software Eng. (ICSE-00), ACM Press, New York, 2000, pp. 47–60.
6. P. Buitelaar, CORELEX: Systematic Polysemy and Underspecification, PhD thesis, Dept. of Computer Science, Brandeis Univ., Waltham, Mass., 1998.
7. H. Assadi, "Construction of a Regional Ontology from Text and Its Use within a Documentary System," Proc. Int'l Conf. Formal Ontology and Information Systems (FOIS-98), IOS Press, Amsterdam, 1998.
8. D. Faure and C. Nedellec, "A Corpus-Based Conceptual Clustering Method for Verb Frames and Ontology Acquisition," Proc. LREC-98 Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, European Language Resources—Distribution Agency, Paris, 1998.
9. F. Esposito et al., "Learning from Parsed Sentences with INTHELEX," Proc. Learning Language in Logic Workshop (LLL-2000), Assoc. for Computational Linguistics, New Brunswick, N.J., 2000, pp. 194–198.
10. A. Maedche and S. Staab, "Discovering Conceptual Relations from Text," Proc. European Conf. Artificial Intelligence (ECAI-00), IOS Press, Amsterdam, 2000, pp. 321–325.
11. J.-U. Kietz, A. Maedche, and R. Volz, "Semi-Automatic Ontology Acquisition from a Corporate Intranet," Proc. Learning Language in Logic Workshop (LLL-2000), ACL, New Brunswick, N.J., 2000, pp. 31–43.
12. E. Morin, "Automatic Acquisition of Semantic Relations between Terms from Technical Corpora," Proc. Fifth Int'l Congress on Terminology and Knowledge Engineering (TKE-99), TermNet-Verlag, Vienna, 1999.
13. U. Hahn and K. Schnattinger, "Towards Text Knowledge Engineering," Proc. Am. Assoc. for Artificial Intelligence (AAAI-98), AAAI/MIT Press, Menlo Park, Calif., 1998.
14. M.A. Hearst, "Automatic Acquisition of Hyponyms from Large Text Corpora," Proc. Conf. Computational Linguistics (COLING-92), 1992.
15. Y. Wilks, B. Slator, and L. Guthrie, Electric Words: Dictionaries, Computers, and Meanings, MIT Press, Cambridge, Mass., 1996.
16. J.
Jannink and G. Wiederhold, "Thesaurus Entry Extraction from an On-Line Dictionary," Proc. Second Int'l Conf. Information Fusion (Fusion-99), Omnipress, Wisconsin, 1999.
17. J.-U. Kietz and K. Morik, "A Polynomial Approach to the Constructive Induction of Structural Knowledge," Machine Learning, vol. 14, no. 2, 1994, pp. 193–211.
18. S. Schlobach, "Assertional Mining in Description Logics," Proc. 2000 Int'l Workshop on Description Logics (DL-2000), CEUR Workshop Proc., vol. 33, 2000.
19. A. Doan, P. Domingos, and A. Levy, "Learning Source Descriptions for Data Integration," Proc. Int'l Workshop on the Web and Databases (WebDB-2000), Springer-Verlag, Berlin, 2000, pp. 60–71.
20. P. Johannesson, "A Method for Transforming Relational Schemas into Conceptual Schemas," Proc. Int'l Conf. Data Engineering (ICDE-94), IEEE Press, Piscataway, N.J., 1994, pp. 190–201.
21. Z. Tari et al., "The Reengineering of Relational Databases Based on Key and Data Correlations," Proc. Seventh Conf. Database Semantics (DS-7), Chapman & Hall, 1998, pp. 40–52.

Table A. A survey of ontology-learning approaches.

| Domain | Methods | Features used | Prime purpose | Papers |
|---|---|---|---|---|
| Free text | Clustering | Syntax | Extract | Paul Buitelaar,6 H. Assadi,7 and David Faure and Claire Nedellec8 |
| | Inductive logic programming | Syntax, logic representation | Extract | Floriana Esposito et al.9 |
| | Association rules | Syntax, tokens | Extract | Alexander Maedche and Steffen Staab10 |
| | Frequency-based | Syntax | Prune | Joerg-Uwe Kietz et al.11 |
| | Pattern matching | — | Extract | Emmanuel Morin12 |
| | Classification | Syntax, semantics | Refine | Udo Hahn and Klemens Schnattinger13 |
| Dictionary | Information extraction | Syntax | Extract | Marti Hearst,14 Yorick Wilks,15 and Joerg-Uwe Kietz et al.11 |
| | Page rank | Tokens | — | Jan Jannink and Gio Wiederhold16 |
| Knowledge base | Concept induction, A-Box mining | Relations | Extract | Joerg-Uwe Kietz and Katharina Morik17 and S. Schlobach18 |
| Semistructured schemata | Naive Bayes | Relations | Reverse engineering | AnHai Doan et al.19 |
| Relational schemata | Data correlation | Relations | Reverse engineering | Paul Johannesson20 and Zahir Tari et al.21 |

Extraction
Extraction models major parts—the complete ontology or large chunks representing a new ontology subdomain—with learning support exploiting various types of Web sources. Ontology-learning techniques partially rely on given ontology parts. Thus, we here encounter an iterative model where previous revisions through the ontology-learning cycle can propel subsequent ones, and more sophisticated algorithms can work on structures that previous, more straightforward algorithms have proposed.

To describe this phase, let's look at some of the techniques and algorithms that we embedded in our framework and implemented in our ontology-learning environment Text-To-Onto (see Figure 3). We cover a substantial part of the overall ontology-learning task in the extraction phase. Text-To-Onto proposes many different ontology-learning algorithms for primitives, which we described previously (that is, L, C, R, and so on), to the ontology engineer building on several types of input.

Lexical entry and concept extraction
One of the baseline methods applied in our framework for acquiring lexical entries with corresponding concepts is lexical entry and concept extraction. Text-To-Onto processes Web documents on the morphological level, including multiword terms such as "database reverse engineering," by n-grams, a simple statistics-based technique. Based on this text preprocessing, we apply term-extraction techniques, which are based on (weighted) statistical frequencies, to propose new lexical entries for L.

Often, the ontology engineer follows the proposal by the lexical entry and concept extraction mechanism and includes a new lexical entry in the ontology. Because the new lexical entry comes without an associated concept, the ontology engineer must then decide (possibly with help from further processing) whether to introduce a new concept or link the new lexical entry to an existing concept.

Hierarchical concept clustering
Given a lexicon and a set of concepts, one major next step is taxonomic concept classification. One generally applicable method with regard to this is hierarchical clustering, which exploits items' similarities to propose a hierarchy of item categories. We compute the similarity measure on the properties of items.

When extracting a hierarchy from natural-language text, term adjacency or syntactical relationships between terms yield considerable descriptive power to induce the semantic hierarchy of concepts related to these terms. David Faure and Claire Nedellec give a sophisticated example for hierarchical clustering (see the sidebar). They present a cooperative machine-learning system, Asium (acquisition of semantic knowledge using machine-learning method), which acquires taxonomic relations and subcategorization frames of verbs based on syntactic input. The Asium system hierarchically clusters nouns based on the verbs to which they are syntactically related and vice versa. Thus, they cooperatively extend the lexicon, the concept set, and the concept heterarchy (L, C, HC).
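The following simplified sketch captures the distributional idea behind such clustering; it is not Asium itself. Nouns are described by the verbs they co-occur with, and similar nouns are greedily merged. The parsed verb-noun pairs are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations

def verb_profile(noun, verb_noun_pairs):
    """Describe a noun by the verbs it is syntactically related to."""
    return Counter(v for v, n in verb_noun_pairs if n == noun)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster_bottom_up(nouns, pairs, threshold=0.5):
    """Greedy agglomerative clustering of nouns by shared verb contexts."""
    clusters = [(frozenset([n]), verb_profile(n, pairs)) for n in nouns]
    while True:
        scored = [(cosine(a[1], b[1]), a, b) for a, b in combinations(clusters, 2)]
        scored = [s for s in scored if s[0] >= threshold]
        if not scored:
            return [set(c[0]) for c in clusters]
        _, a, b = max(scored, key=lambda s: s[0])
        clusters.remove(a)
        clusters.remove(b)
        clusters.append((a[0] | b[0], a[1] + b[1]))

pairs = [("attend", "festival"), ("attend", "concert"),
         ("visit", "island"), ("visit", "beach"), ("visit", "island")]
print(cluster_bottom_up({"festival", "concert", "island", "beach"}, pairs))
# two clusters emerge: {festival, concert} and {island, beach}
```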
Dictionary parsing
Machine-readable dictionaries (MRDs) are frequently available for many domains. Although their internal structure is mostly free text, comparatively few patterns are used to give text definitions. Hence, MRDs exhibit a large degree of regularity that can be exploited to extract a domain conceptualization.

We have used Text-To-Onto to generate a concept taxonomy from an insurance company's MRD (see the sidebar). Likewise, we've applied morphological processing to term extraction from free text—this time, however, complemented by several pattern-matching heuristics. Take, for example, the following dictionary entry:

Automatic Debit Transfer: Electronic service arising from a debit authorization of the Yellow Account holder for a recipient to debit bills that fall due direct from the account.

We applied several heuristics to the morphologically analyzed definitions. For instance, one simple heuristic relates the definition term, here automatic debit transfer, with the first noun phrase in the definition, here electronic service. Their corresponding concepts are linked in the heterarchy: HC(automatic debit transfer, electronic service). Applying this heuristic iteratively, we can propose large parts of the target ontology—more precisely, L, C, and HC—to the ontology engineer. In fact, because verbs tend to be modeled as relations, we can also use this method to extend R (and the linkage between R and L).

Association rules
One typically uses association-rule-learning algorithms for prototypical applications of data mining—for example, finding associations that occur between items such as supermarket products in a set of transactions (for example, customers' purchases). The generalized association-rule-learning algorithm extends its baseline by aiming at descriptions at the appropriate taxonomy level—for example, "snacks are purchased together with drinks" rather than "chips are purchased with beer" and "peanuts are purchased with soda."

In Text-To-Onto (see the sidebar), we use a modified generalized association-rule-learning algorithm to discover relations between concepts. A given class hierarchy HC serves as background knowledge. Pairs of syntactically related concepts—for example, the pair (festival, island) describing the head-modifier relationship contained in the sentence "The festival on Usedom attracts tourists from all over the world."—are given as input to the algorithm. The algorithm generates association rules, comparing the relevance of different rules while climbing up or down the taxonomy. The algorithm proposes what appear to be the most relevant binary rules to the ontology engineer for modeling relations into the ontology, thus extending R.

As the algorithm tends to generate a high number of rules, we offer various interaction modes. For example, the ontology engineer can restrict the number of suggested relations by defining so-called restriction concepts that must participate in the extracted relations. The flexible enabling and disabling of taxonomic knowledge for extracting relations is another way of focusing.

Figure 4 shows various views of the results. We can induce a generalized relation from the example data given earlier—relation rel(event, area), which the ontology engineer could name locatedIn, namely, events located in an area (which extends L and G). The user can add extracted relations to the ontology by dragging and dropping them. To explore and determine the right aggregation level for adding a relation to the ontology, the user can browse the relation views for extracted properties (see the left side of Figure 4).
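A much-reduced sketch of the generalization step gives the flavor: each observed concept pair is counted at every level of the taxonomy, so support accumulates at the level worth proposing. The toy taxonomy and pairs are invented, and the real algorithm also computes confidence and prunes redundant rules.

```python
from collections import Counter

parent = {"festival": "event", "concert": "event",
          "island": "area", "beach": "area"}  # toy H_C

def ancestors(concept):
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def generalized_support(concept_pairs):
    """Count each syntactically related pair at all taxonomy levels."""
    support = Counter()
    for head, modifier in concept_pairs:
        for a in ancestors(head):
            for b in ancestors(modifier):
                support[(a, b)] += 1
    return support

pairs = [("festival", "island"), ("concert", "island"), ("festival", "beach")]
for rule, count in generalized_support(pairs).most_common(3):
    print(rule, count)
# ('event', 'area') 3  -> proposed to the engineer, e.g. as rel(event, area)
```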

[Figure 4. Result presentation in Text-To-Onto.]

Pruning
A common theme of modeling in various disciplines is the balance between completeness and domain-model scarcity. Targeting completeness for the domain model appears to be practically unmanageable and computationally intractable, but targeting the scarcest model overly limits expressiveness. Hence, we aim for a balance between the two that works. Our model should capture a rich target-domain conceptualization but exclude the parts out of its focus. Ontology import and reuse as well as ontology extraction put the scale considerably out of balance where out-of-focus concepts reign. Therefore, we appropriately diminish the ontology in the pruning phase.

We can view the problem of pruning in at least two ways. First, we need to clarify how pruning particular parts of the ontology (for example, removing a concept or relation) affects the rest. For instance, Brian Peterson and his colleagues have described strategies that leave the user with a coherent ontology (that is, no dangling or broken links).6 Second, we can consider strategies for proposing ontology items that we should either keep or prune. Given a set of application-specific documents, several strategies exist for pruning the ontology that are based on absolute or relative counts of term frequency combined with the ontology's background knowledge (see the sidebar).
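One such frequency-based strategy can be sketched as follows: a concept whose lexical entries are not markedly more frequent in the application-specific documents than in a generic corpus becomes a pruning candidate. The counts, ratio, and names below are illustrative only.

```python
def pruning_candidates(lexicon, domain_freq, generic_freq, min_ratio=2.0):
    """Propose concepts for pruning by relative term frequency.

    lexicon: concept -> lexical entries (F restricted to concepts)
    domain_freq / generic_freq: term -> occurrence count in each corpus
    """
    candidates = []
    for concept, entries in lexicon.items():
        d = sum(domain_freq.get(t, 0) for t in entries)
        g = sum(generic_freq.get(t, 0) for t in entries)
        relevance = d / g if g else (float("inf") if d else 0.0)
        if relevance < min_ratio:
            candidates.append((concept, relevance))
    return sorted(candidates, key=lambda c: c[1])

lexicon = {"hotel": {"hotel"}, "weather": {"weather", "forecast"}}
print(pruning_candidates(lexicon,
                         domain_freq={"hotel": 120, "weather": 3},
                         generic_freq={"hotel": 10, "weather": 50, "forecast": 20}))
# [('weather', 0.0428...)] -> propose pruning 'weather'
```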

Refinement
Refining plays a similar role to extracting—the difference is on a sliding scale rather than a clear-cut distinction. Although extracting serves mainly for cooperative modeling of the overall ontology (or at least of very significant chunks of it), the refinement phase is about fine-tuning the target ontology and the support of its evolving nature. The refinement phase can use data that comes from a concrete Semantic Web application—for example, log files of user queries or generic user data. Adapting and refining the ontology with respect to user requirements plays a major role in the application's acceptance and its further development.

In principle, we can use the same algorithms for extraction and refinement. However, during refinement, we must consider in detail the existing ontology and its existing connections, while extraction works more often than not practically from scratch.

Udo Hahn and Klemens Schnattinger presented a prototypical approach for refinement (see the sidebar)—although not for extraction! They introduced a methodology for automating the maintenance of domain-specific taxonomies. This incrementally updates an ontology as it acquires new concepts from text. The acquisition process is centered on the linguistic and conceptual "quality" of various forms of evidence underlying concept-hypothesis generation and refinement.