EUROPE MEDIA MONITOR - Emm.newsbrief.eu


EUROPE MEDIA MONITOR

TABLE OF CONTENTS:

Introduction
Information gathering
Information presentation
- NewsBrief
- MyNews
- Mobile Devices
- Big Screen Map
- EMM Map
Customised domain
- MediSys
Editing Tools
- NewsDesk
- Channel Editor
- Category Editor
Information Analysis
- Trend Impact Analysis
- Media Impact Analysis
Ongoing Research
- Sentiment Analysis
- Event Detection System
- Named Entity Guesser
- JRC Names Resource
- Translation System
- NewsExplorer: Analysis over time and across languages
- EMM OSINT Suite

INTRODUCTION

For those who are not familiar with it, EMM is the Europe Media Monitor, a system for monitoring open source news information. EMM is developed and maintained by the Data & Text Mining Unit, in the Competences Institute of the EC Joint Research Centre (JRC).

EMM was started in 2002 as a project to support the Commission with its media monitoring activities. The main purpose of EMM is to provide monitoring of a large (but selected) set of electronic media, reducing the information flow to manageable proportions by clustering related news, categorising articles and applying Language Technology tools to derive further metadata, such as recognising and disambiguating entities in the text, extracting quotes by and about people, applying sentiment/tonality analysis and more.

The system continuously monitors over 7 000 HTML pages and RSS feeds to find new articles published on the Internet (about 250 000 articles daily). It then reads and analyses these articles and extracts information, like references to people, organisations and places in the news, extracts quotes, groups articles into categories and clusters similar articles. This last process in effect creates a view of the current biggest stories in the news in a certain language.

Highlights:
- New map, allowing easy visualisation of what is going on where.
- New mobile apps and the Category Editor Collaboration Layer.
- Lots of ‘behind the scenes’ developments which are making our systems more reliable tools for daily media monitoring.

We would be very pleased to receive your feedback, and would gladly provide you with further information. Please contact us at emm@jrc.ec.europa.eu

For more information about the JRC please visit the following link: https://ec.europa.eu/jrc/

INFORMATION GATHERING

The Europe Media Monitor is designed as a near real-time monitoring system for new publications. The system analyses publications as they flow through and continuously generates the required information products, without storing a copy of the original publication. It does not rely on (and does not have) a big information archive. Although EMM maintains an index of all retrieved material, allowing for limited historical research, the information products always refer to the original publication, mostly on the Internet.

At the core of EMM there is a chain of lightweight extensible processes, each running independently and chained together using a robust and reliable web service architecture developed in-house. Articles begin their flow through the processing chain as thin RSS (Really Simple Syndication) items that grow as metadata gets added at each stage of the processing chain.

EMM has been expanded with social media monitoring functionality. Currently, we are extracting the most frequent URLs, hashtags and tweets that are related to the most recent disease outbreaks, violent events and disasters.

Data collection (scraper and grabber)

Highlights:
- Article extraction from HTML feeds without the need for custom XSLTs
- Website scraping for unstructured sites
- Access to the Internet using configurable proxy servers and user agents
- Improved handling of badly formatted RSS feeds
- All RSS now have an ‘origin’ tag
- EMM has been expanded with social media monitoring functionality.
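The processing chain described above, in which an article starts as a thin RSS item and gains metadata at each stage, can be illustrated with a minimal Python sketch. This is not EMM's implementation: the `Item` class, the stage functions `tag_origin` and `categorise`, and the toy `CATEGORIES` table are all hypothetical names chosen for illustration, and the stages run in one process rather than as independent web services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set
from urllib.parse import urlparse


@dataclass
class Item:
    """A thin RSS-like item that accumulates metadata as it moves along the chain."""
    title: str
    link: str
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical keyword table; EMM's real category definitions are far richer.
CATEGORIES: Dict[str, Set[str]] = {
    "disaster": {"flood", "earthquake"},
    "health": {"outbreak", "virus"},
}


def tag_origin(item: Item) -> Item:
    # Hypothetical stage: record the host the item was collected from.
    item.metadata["origin"] = urlparse(item.link).netloc
    return item


def categorise(item: Item) -> Item:
    # Hypothetical stage: assign every category whose keywords occur in the text.
    words = set(item.text.lower().split())
    item.metadata["categories"] = sorted(
        name for name, keywords in CATEGORIES.items() if words & keywords
    )
    return item


def run_chain(item: Item, stages: List[Callable[[Item], Item]]) -> Item:
    """Pass the item through each stage in order; each stage only adds metadata."""
    for stage in stages:
        item = stage(item)
    return item
```

Because every stage takes an item and returns an item, new stages can be appended to the list without touching the existing ones, which is the property the text calls "extensible".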

INFORMATION PRESENTATION

The results of the information harvesting and processing can be accessed in a number of ways: a NewsBrief website (e.g. http://emm.newsbrief.eu) that allows for classical data browsing, and a full editorial and publishing system, NewsDesk (not publicly accessible), that allows for the creation and publication of high level information products. EMM delivers emails and RSS feeds and there are (free) mobile applications for iPhone, iPad and Android tablets.

Examples of current applications of the EMM technology can be found in different application domains. EMM is used in a number of traditional media monitoring applications by various EU Institutions and Agencies. MediSys (http://medisys.newsbrief.eu) is an instance of EMM specifically developed for internet bio-surveillance and is used by a number of Health Agencies, including the WHO. Open source intelligence for humanitarian and conflict early warning is also covered by at least 3 instances of the EMM system.

MyNews is a web interface, designed for desktop browsers, for the news items supplied by the EMM engine. It is highly customisable, since it allows users to define their own specific view by selecting the topics they are most interested in. This is achieved, similarly to the EMM mobile apps, by allowing them to tune news channels focused on very specific topics. Users can create as many channels as they like, and they can organise them in sets. There are many different ways they can create new channels, which greatly increases the flexibility of the tool.

EMM comes to you in different views:
- NewsBrief
- NewsDesk
- MyNews
- MediSys
- EMM Mobile Apps

At the moment, the publicly accessible instance of EMM monitors almost 20 000 RSS feeds/HTML pages from over 7 000 media websites and retrieves and processes around 300 000 new news articles per day. These articles are categorised into over 2 000 categories. A selected subset of these categories and the results of the clustering process can be seen on the public EMM website http://emm.newsbrief.eu.

NEWSBRIEF

EMM NewsBrief is a public website that provides many different views on the news published right now.

The NewsBrief pages mostly reflect the categorisation and the topic-based clustering. The typical front page, which is shown when you go to http://emm.newsbrief.eu, is the result of the clustering system. Most pages accessible through the menu system reflect the result of the categorisation process.

The categories are predefined and cannot be modified by the public. On every 'category page' you can click on the '?' icon to see how a particular category is defined. Most of these categories are defined by domain experts and are made available to you. Some of the categories were defined by us, mostly when we were first developing the system and tried to generate some meaningful content.

On every page you can choose to receive the news by e-mail, or you can use the information through the RSS feed. Feel free to include any RSS feed on your website, but please give us credit and include a link to the EMM website as a reference on your page.

You can also search EMM for information, as we maintain an index of all articles that we process. The results include a link to the original article, although in some cases it may no longer be available.

The primary aim of the system is to provide you with frequent and near real-time news updates on topics of interest to you.

MYNEWS

MyNews is a highly customisable web interface that gives access to the news items produced by the EMM engine.

It is user-driven: its main focus is on offering you the possibility to create your own personal view by means of many different customisation options. It is based upon a “TV metaphor”: the users, as if sitting in front of a TV with a remote control, can tune into different channels on the specific topics they are interested in.

There are many different types of channels users can choose from:
- Category channels, associated with the EMM categories.
- Country channels, associated with the source countries, or with the countries the news articles talk about.
- Multilanguage top stories, associated with the EMM most active clusters in any given language.
- Person or Organisation channels, associated with collections of several EMM categories and/or entities.
- Search channels, associated with queries performed on text and metadata extracted from the news articles.

Channels are organised into sets: you can have many sets, each one with as many channels as you like. The structure of sets and channels is easily editable at any time, and is recorded on the server for subsequent access.

When you get into a set, you see the “cover sheets” of its channels, represented by boxes. By clicking on one box you get into the details of the channel: the list of articles with the representation of all the associated metadata (categories, entities, geotags, quotes, etc.). The information is also enriched with several graphical tools: a map with the distribution of the articles, several charts, multi-language word clouds, etc.

Several refinement tools are provided: you can filter the articles based upon sources, countries, attributes (i.e. categories and entities), languages and date/time range.

Top story channels display a list of stories (i.e. clusters of articles by topic), ordered by relevance, which are most active at that very moment or over the last 24 hours. For each of the selected languages, the twenty top stories are displayed. Each story is listed with its main article and the representation of all the associated metadata.

Highlights:
- Highly and easily customisable on a per-user basis
- Many different visual representations of data (charts, maps, word clouds, etc.)
- Newsletters in HTML, PDF or MS-Word format, based on a selection of articles
- Advanced Search channels actively catching new articles that satisfy user-defined queries
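The refinement tools mentioned above (filtering by source, country, language and date/time range) amount to combining optional predicates. The following Python sketch shows one way this could look. It is an assumption-laden illustration, not the MyNews code: the `Article` fields and the `refine` function are hypothetical, and attribute filters (categories and entities) are left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Article:
    """Hypothetical article record carrying the metadata used for filtering."""
    title: str
    source: str
    country: str
    language: str
    published: datetime


def refine(
    articles: Iterable[Article],
    *,
    sources: Optional[Set[str]] = None,
    countries: Optional[Set[str]] = None,
    languages: Optional[Set[str]] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Article]:
    """Keep articles matching every filter that is set; None means 'no filter'."""

    def keep(a: Article) -> bool:
        return (
            (sources is None or a.source in sources)
            and (countries is None or a.country in countries)
            and (languages is None or a.language in languages)
            and (start is None or a.published >= start)
            and (end is None or a.published <= end)
        )

    return [a for a in articles if keep(a)]
```

Filters are combined with a logical AND, so each additional refinement can only narrow the list, which matches how such tools usually behave in a channel view.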

MOBILE DEVICES

The EMM iPhone and Android mobile Apps provide up-to-the-minute results using Automatic Text Analysis of news articles from around the world (over 300 000 new news articles per day). Both the Apps and the EMM desk system support more than 70 different languages. In-line translation to English from Arabic, Czech, Chinese, Danish, French, German, Italian, Polish, Portuguese and Swedish is supported too. The automatic story detection groups the articles reporting on the same subject, tracking the stories as they develop over time.

All apps support automatic detection of people & organisations and produce views of what was said by and about people or organisations.

Highlights:
- Android phone version released
- Customised version for CERT-EU released
- Other customisations for some clients will be released soon

Real-time Notifications

Real-time alerts allow custom notifications based on changes in the specific data set the user has defined. When a logical threshold is activated, the system displays a notification directly on the user’s mobile device.

By merging our notifications with the system’s core notification, we alert the user only when it is appropriate. For example, a notification will wait silently while the user is asleep and will be scheduled for presentation a few minutes after the user has started using the device. This is done without any user intervention or pre-settings.

Supporting Android, iOS, PC, Linux & MacOS
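The behaviour described above (a threshold triggers a notification, which is then held back until shortly after the user becomes active) can be sketched as follows. This is only an illustration under stated assumptions: the `DeferredNotifier` class, its five-minute `delay` and the way user activity is passed in are hypothetical, and the real apps rely on the mobile platform's own notification services.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DeferredNotifier:
    """Queues threshold alerts and releases them only once the user is active."""
    threshold: int
    delay: timedelta = timedelta(minutes=5)  # assumed value for "a few minutes"
    pending: List[str] = field(default_factory=list)

    def observe(self, label: str, count: int) -> None:
        """Queue a notification when the watched count reaches the threshold."""
        if count >= self.threshold:
            self.pending.append(f"{label}: {count} articles")

    def deliver(self, now: datetime, active_since: Optional[datetime]) -> List[str]:
        """Return queued notifications once the user has been active for `delay`.

        `active_since` is None while the device is idle (for example, at night).
        """
        if active_since is None or now - active_since < self.delay:
            return []
        ready, self.pending = self.pending, []
        return ready
```

A caller would invoke `observe` whenever fresh counts arrive and poll `deliver` with the current time and the moment the device was last woken; nothing is shown while `active_since` is `None`.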

BIG SCREEN MAP

The Big Screen Map is an application that automatically loops through the latest news from the EMM system. It is designed to run on large-format screens. The application is fully configurable, providing the ability to select the languages and the categories to be displayed.

New developments in the application are:
- Expanding the list of clients (JRC, APIIPA, Cert, Frontex, OAS, Europol, African Union, FRA)
- Integration with Finder (ability to run any Finder query and loop through the returned articles)
- App updated to work with version 2 of the emmApp API
- Introduction of Configuration Sets (multiple clients can display different data using a single instance of the application)

EMM MAP

The EMM Map is another useful and popular application that shows the geographical distribution of the news in the EMM system. The news can be displayed by top stories, 24-hour stories, country, category and entity. Timelines can be displayed for stories.

New developments include the possibility to set a default configuration when the application starts.

The first production installation of the application was done at the European Laboratory for Structural Assessment Unit, JRC, followed by CERT, EEAS, Frontex and FRA.

CUSTOMISED DOMAIN

MEDISYS

MediSys is an instance of EMM specifically developed for internet bio-surveillance and is used by a number of Health Agencies, including ECDC, EFSA and WHO. A system for the detection of disease-related information published on Twitter was deployed as part of the MediSys website.

The development process is driven by our ongoing collaborations with ECDC on communicable diseases, EFSA on food safety and plant health and EMCDDA on psychoactive substances. We support EU member states in their surveillance efforts and work with international partners such as WHO and G7.

In 2016, we saw intense media interest in the Zika virus epidemic in the Americas. The spread of the virus and reports on birth defects were monitored in news media and Twitter.

We are collaborating with EFSA, the University of Lleida and the Institut d'Investigació de la Generalitat de Catalunya (IRTA) on monitoring plant health threats. More than 150 categories on bacteria, fungi, insects, mollusks, nematodes, oomycetes and viruses that pose a threat to plant health have been added to MedISys. An entire ontology of 350 plant health threats has been developed.

In support of EU member states, we have set up accounts for analysts in Italy for monitoring public health events in Italy during mass gathering events. The NewsDesk tool is used for producing newsletters and sending them to all stakeholders.

MedISys is also used as the processing chain for the WHO HDRAS portal, with functionality for commenting and risk assessments in user groups. A similar portal is routinely used by the G7 countries within the GHSAG EAR project. We are involved in the development of the future WHO EIOS portal, which will allow various user groups to analyze, assess and comment on news jointly.

EDITING TOOLS

NEWSDESK

NewsDesk is a groupware application that allows a community of users organised in workgroups to create reports and newsletters by selecting news items coming from registered sources as well as manually uploaded documents. NewsDesk offers a wide range of tools and features to ease the process of collecting, searching and filtering news items.

Although Open Source monitoring remains the EMM core business (about 300 000 automatically analysed news items per day), the newly developed “PressReview” module addresses another need common to several businesses: the aggregation of human-moderated content. The supported business process is often named “PressReview” because one of the main products is the Daily Press Review report: a daily selection of the most representative news in the press from different countries.

The IT infrastructure needed to support the press review workflow implements a distributed system that allows several groups of analysts to cooperate in the collection, tagging and publishing of regional content. A central workgroup oversees the activities of the others and prepares products that aggregate contents from all the countries. This use case has recently been implemented within the European Parliament Media Monitor Platform (EPMM).

The EP press review involves different actors across all the 28 EU countries. In each country there is an EP Information Office, responsible for coordinating the collection of news items for that country. The cuttings are uploaded manually every day by a contractor company (one for each country) via the Document Upload module of NewsDesk. Each country has a dedicated workgroup in NewsDesk. All the country workgroups are orchestrated and supervised by the Headquarter workgroup located at EP premises.

The Document Upload module lets users define meta-information about a news item, like title and description in two languages, the publication date, whether the EP is mentioned in the title of the news item, the type of the uploaded item, the source, and so forth. Probably one of the most interesting features is the possibility for the user to manually assign one or more categories to the uploaded item. Every uploaded item then flows through the EMM processing chain, where the categorisation system automatically adds further categories to the item (based on alert and filter specifications). So at the end of their journey through the press review system, all the news items are tagged both by analysts and by the automatic categorisation system.

Once all the cuttings have been uploaded in NewsDesk, each workgroup makes a final selection of the most significant items for that specific country by adding them to one or more newsletters/reports. At the end of the selection process each newsletter can still be edited and further refined. After the editing step, the final product (the newsletter in HTML, PDF and DOCX formats) is generated and sent to the subscribers. In the meantime, the Headquarter workgroup also publishes its own newsletters with a selection of items coming from all the countries.

All workgroups can also access the System View module of NewsDesk to retrieve statistics on uploaded and published news items in order to perform the accounting process.

Highlights:
- Items are tagged both by analysts and the automatic categorisation system
- Creation of reports and newsletters

CHANNEL EDITOR

The Channel Editor application allows complete management of the sources monitored by the EMM system. Sources can be easily filtered with the advanced search functionality. The application also features a source validation mechanism to ensure articles can be properly read by the system. The flexible export options allow different sets of sources to be published to various processing chains or to be saved in XML and XLSX format.

Highlights:
- Ability to import a channel directory into the application
- Global source validation (produces an XLS report for all sources)
- Audit functionality (track which users worked on a specific source)
- Integration with Scraper/Grabber logs (being able to monitor from within the application the output and health of a source)
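As an illustration of what a source validation step might check, the Python sketch below parses an RSS document with the standard library and reports problems that would prevent articles from being read. The function name `validate_rss` and the rule that every item needs both a title and a link are assumptions made for this example; the checks actually performed by the Channel Editor are not described in this document.

```python
import xml.etree.ElementTree as ET
from typing import List


def validate_rss(xml_text: str) -> List[str]:
    """Return a list of problems found in an RSS document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Badly formatted feeds fail here before any structural check.
        return [f"not well-formed XML: {exc}"]
    if root.tag != "rss":
        return [f"unexpected root element <{root.tag}>"]
    channel = root.find("channel")
    if channel is None:
        return ["missing <channel> element"]

    problems: List[str] = []
    items = channel.findall("item")
    if not items:
        problems.append("feed contains no <item> elements")
    for index, item in enumerate(items, start=1):
        # Assumed house rule: an article is readable only with a title and a link.
        for required in ("title", "link"):
            node = item.find(required)
            if node is None or not (node.text or "").strip():
                problems.append(f"item {index}: missing <{required}>")
    return problems
```

Running such a function over every configured source and collecting the returned lists would give the raw material for a per-source validation report.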

CATEGORY EDITOR

The Category Editor application manages the definition files used by the EMM system to categorise incoming information. The definition files are kept in a central repository and the application allows multi-user access and locking management for the repository. Through its flexible publishing mechanism, the application can use a single repository to serve multiple processing chains.

Category Editor Collaboration Layer

The Category Editor Collaboration Layer is a completely new concept for the most powerful EMM tool, the Category Editor. The Collaboration Layer allows users from different organisations to work together on category definitions. Each organisation retains complete control over its own category repository while at the same time being able to engage with other partners in order to produce improved definitions that will yield better data. A deep integration with versioning software allows for easy merging, rollback and comparison between category definitions.

A notification system that sends messages to users when a new version is added to the collaboration layer has also been developed.
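The multi-user access and locking management mentioned above can be pictured with a small in-memory Python sketch. Everything here is hypothetical (the `CategoryRepository` class, its method names and the example definition string); the real Category Editor works on definition files in a central repository and integrates with versioning software, which this sketch does not attempt to model.

```python
from typing import Dict, Optional


class CategoryRepository:
    """Toy repository in which a category must be locked before it is edited."""

    def __init__(self) -> None:
        self._definitions: Dict[str, str] = {}
        self._locks: Dict[str, str] = {}  # category name -> user holding the lock

    def acquire(self, category: str, user: str) -> bool:
        """Lock the category for `user`; return False if someone else holds it."""
        holder = self._locks.setdefault(category, user)
        return holder == user

    def save(self, category: str, user: str, definition: str) -> None:
        """Store a new definition; only the current lock holder may do so."""
        if self._locks.get(category) != user:
            raise PermissionError(f"{user} does not hold the lock on {category}")
        self._definitions[category] = definition

    def release(self, category: str, user: str) -> None:
        """Release the lock if `user` holds it; otherwise do nothing."""
        if self._locks.get(category) == user:
            del self._locks[category]

    def get(self, category: str) -> Optional[str]:
        """Return the stored definition, or None if the category is unknown."""
        return self._definitions.get(category)
```

Requiring the lock at save time, rather than only at edit time, is what prevents two editors from silently overwriting each other's definitions in this model.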

INFORMATION ANALYSIS

TREND IMPACT ANALYSIS (TIA)

Trend Impact Analysis (TIA) is a new tool that allows users to explore trends in reporting. They can analyse articles collected by EMM interactively, in multiple dimensions and from multiple perspectives. This includes a broad range of information resulting from EMM's automatic analysis. The supported dimensions are time (day, month, year, epoch), topic of the article, language of the article, country of publisher and reach of publisher (International, National, Regional or Local). These dimensions can
