Natural Language Processing: An Introduction


NLP: The Ultimate Goal (1990)

The Ultimate Goal: for computers to use NL as effectively as humans do.

"Natural language, whether spoken, written, or typed, is the most natural means of communication between humans, and the mode of expression of choice for most of the documents they produce. As computers play a larger role in the preparation, acquisition, transmission, monitoring, storage, analysis, and transformation of information, endowing them with the ability to understand and generate information expressed in natural languages becomes more and more necessary."

CIS 521 - Intro to AI

NLP: Grand Challenges (1990)

The Ultimate Goal: for computers to use NL as effectively as humans do.

- Reading and writing text
  - Abstracting
  - Monitoring
  - Extraction into databases
- Interactive dialogue: natural, effective access to computer systems
  - Informal speech input and output
- Translation: input and output in multiple languages

Recent Significant Advances in NLP
- Web-scale information extraction & question answering
  - IBM's Watson
- Interactive dialogue systems
  - Apple's Siri (Microsoft Cortana) (Google Now)

Recent Significant Advances in NLP
- Automatic machine translation

Xinhua story (Chinese): …发展。

Google (English output):
The Hague, March 24 Xinhua (Reporter Chen Zhi Pan Governance) Third Nuclear Security Summit held in The Hague on the 24th. State President Xi Jinping attended and delivered an important speech on China's nuclear safety measures and achievements, elaborated China on the development and security of both the rights and obligations of both, both independently and in collaboration, both temporary and permanent nuclear security concept, called on the international community to work together nuclear achieve lasting security and development.

Early Successes: Human-Machine Interfaces
- SHRDLU (Winograd, 1969)
  - A fragile demonstration of the fundamental vision
- LUNAR (Woods, Webber, Kaplan 1971)
  - Answering geologists' questions about the Apollo 11 moon rocks

Review: SHRDLU: A demonstration proof

LUNAR (William Woods, 1971)
- NLP interface to a database of analyses of Apollo 11 moon rocks
- Examples
  - What is the average concentration of aluminum in high alkali rocks?
  - How many breccias contain olivine?
  - Give me the modal analyses of those samples for all phases.
- Handled 78% of sentences typed by geologists at the 1971 Lunar Rocks conference (12% more with "minor fixes")

The Past: Crucial Flaws in the Paradigm

These and other later systems worked well, BUT
1. Person-years of work to port to new applications
2. Very limited coverage of English

Crucially, they worked well because of a magical fact: people automatically adapt and limit their language given a small set of exemplars if the underlying linguistic generalizations are HABITABLE.

This won't handle pre-existing text!

The State of NLP
- NLP Past (before 1995): Rich Representations
- NLP Present: Powerful Statistical Disambiguation

A Few Core Technologies
1. Named Entity Recognition & Information Extraction
2. Machine Translation
3. Text Summarization

Information Extraction & Named Entity Recognition

Information Extraction
- Information extraction is the identification, in text, of specified classes of
  - Named entities
  - Relations
  - Events
- For relations and events, this includes finding the participants and modifiers (date, time, location, etc.).
- Goal: fill out a database with given relation or event types:
  - people's jobs
  - people's whereabouts
  - merger and acquisition activity
  - disease outbreaks
  - genomics relations

Extraction Example

George Garrick, 40 years old, president of the London-based European Information Services Inc., was appointed chief executive officer of Nielsen Marketing Research, USA.

Person          Position   Company                              Location  Status
George Garrick  President  European Information Services, Inc.  London    Out
George Garrick  CEO        Nielsen Marketing Research USA       -         In
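The filled template in the example above can be held as plain records, one per extracted event. The following is a minimal sketch, not the representation used by any particular extraction system: the `SuccessionEvent` type, its field names, and the `positions_taken` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SuccessionEvent:
    """One row of the filled template (hypothetical record type)."""
    person: str
    position: str
    company: str
    status: str                      # "In" (taking the post) or "Out" (leaving it)
    location: Optional[str] = None   # not every row has a location


# The two rows extracted from the George Garrick sentence.
events = [
    SuccessionEvent("George Garrick", "President",
                    "European Information Services, Inc.", "Out", "London"),
    SuccessionEvent("George Garrick", "CEO",
                    "Nielsen Marketing Research USA", "In"),
]


def positions_taken(rows: List[SuccessionEvent], person: str) -> List[str]:
    """Return the positions a person is moving into."""
    return [r.position for r in rows if r.person == person and r.status == "In"]
```

Once the text has been reduced to records like these, questions such as "which post is this person moving into?" become simple filters over the database rather than text searches.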

Named Entity Recognition

The task: identify atomic elements of information in text
- Flag the who, where, when & how much in text
  - Person names
  - Company/organization names
  - Locations
  - Dates & times
  - Percentages
  - Monetary amounts

Won't simple lists solve the problem?
- Names are:
  - too numerous to include in dictionaries
  - changing constantly
  - appearing in many variant forms
  - often abbreviated in subsequent occurrences
- List search/matching doesn't perform well
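The limits of list lookup can be seen in a few lines. This is a minimal sketch, assuming a tiny hand-made name list: `GAZETTEER` and `gazetteer_matches` are hypothetical names, and the second test sentence (including the abbreviation "EIS") is invented for illustration.

```python
import re

# Hypothetical name list; a real one could never be complete or up to date.
GAZETTEER = {"European Information Services Inc.", "Nielsen Marketing Research"}


def gazetteer_matches(text, names):
    """Return the listed names that occur verbatim in the text."""
    found = []
    for name in sorted(names):
        # Require non-word characters (or string edges) around the match.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(name)
    return found


full_form = ("George Garrick was appointed chief executive officer "
             "of Nielsen Marketing Research, USA.")
short_form = "Garrick joins Nielsen after leaving EIS."

print(gazetteer_matches(full_form, GAZETTEER))   # the full name is found
print(gazetteer_matches(short_form, GAZETTEER))  # abbreviated mentions are missed
```

Exact matching finds the full company name but misses both the shortened "Nielsen" and the abbreviation, and it has no way to recognize a person name that was never listed. That gap is what motivates statistical recognizers.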

Levels of BBN Statistical Analysis (2005)

Yugoslav President Slobodan Milosevic received on Thursday the representatives of the Association of Yugoslav Banks, headed by its president Milos Milosavljevic, who is also the general director of JugoBanka.

[Figure: parse tree of the sentence above, with S, VP, NP, PP and Name nodes]

Information Extraction from Propositions

Propositions are normalized connections from the parse trees.
Entities and relations are extracted statistically from propositions.

[Figure: proposition graph linking Person, ORG and GPE nodes through subj and arg edges for the predicates received, headed, is, president and director]

Extracted relations:
- Person: Slobodan Milosevic; Position: president; Organization: Yugoslavia
- Person: Milos Milosevic; Position: president; Organization: Association of Yugoslav Banks
- Person: Milos Milosevic; Position: general director; Organization: JugoBanka

Machine Translation
(Several slides from Language Weaver)

Statistical Machine Translation Technology

Training data:
- Spanish/English bilingual text -> statistical analysis
- English text -> statistical analysis

Translation:
- Spanish: Que hambre tengo yo
- Broken English: What hunger have I, / Hungry I am so, / I am so hungry, / Have I that hunger
- English: I am so hungry
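The last step of the diagram above, picking fluent English out of several "broken English" candidates, can be illustrated with statistics gathered from English text alone. The sketch below assumes an add-one-smoothed bigram model, which is a stand-in chosen for brevity and not necessarily what Language Weaver used. The four-sentence `english_text` corpus is invented for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count word and word-pair frequencies in monolingual English text."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


# Invented toy corpus standing in for the "English text" box.
english_text = ["I am so hungry", "I am so tired", "I am hungry", "I have that book"]
model = train_bigram_model(english_text)

# Candidate word orders from the slide.
candidates = ["What hunger have I", "Hungry I am so",
              "I am so hungry", "Have I that hunger"]

best = max(candidates, key=lambda c: log_prob(c, model))
print(best)  # I am so hungry
```

All four candidates have the same length, so their scores are directly comparable. The candidate whose word pairs were seen most often in the English text wins, with no grammar rules written by hand.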

How a Statistical MT System Learns

Translating a New Document

Broadcast Monitoring
BBN MAPS & Language Weaver MT


Language Weaver Hybrid Translation Technology

Chinese source text (sample): …销售轿车18.8万辆,同比增长…下降1.1万辆,下降幅度近25%。

Language Weaver Experimental Syntax MT, Sample 1:

The motor show, has always been the' barometer' of a national car consumption and market potential. The Beijing International Auto Show has more than 1,200 exhibitors from 24 countries and 8 days of receiving more than 40 million visitors, setting a new record in China's auto show, are deeply aware of the automobile market signals.

"China is one of the largest automobile market in the world. Over the years, this phrase implies more auto businessmen. But now, more and more facts indicates that it is to become a reality.

Data from the Motor Show is very convincing. The Beijing Qingnian Bao Report on-the-spot investigation showed that about 35 percent of 35-year-old visitors, 62.1 percent of the respondents said that the truck was mainly to buy a car in the near future to collect information, even at the exhibition may purchase or suitable products; 76% of respondents indicated in the past two years to buy private cars.

Since the beginning of this year, the strong growth of the domestic car market. According to the figures released by the National Bureau of Statistics, in the first four months, the country produced 267,900 vehicles, up 27.6 percent; in particular, in April, the production of 90,000 vehicles, an increase of 50.5% over the same period last year, setting a record high for the monthly output growth over the past 10-odd years. In terms of sales in the first quarter, manufacturing enterprises in the country sold 188,000 cars, up 22 percent over the same period of last year, up 10.5 percent; 11,000 vehicles, dropping by nearly 25 percent lower than the beginning of the year.

Text Summarization
(For more on this topic, check out courses taught by Prof. Ani Nenkova)

(Includes work by Prof. Nenkova)
