State of Geospatial BigData

State of Geospatial BigData, Mansour Raad (@mraad), 2016

“WHERE” IS UBIQUITOUS!

Where is the closest ATM? Where is the best location to place my store? Where is UBL? Where is the next Ebola/Zika outbreak?

A BIT OF HISTORY (With Esri, Specifically :-))

1990

http://en.wikipedia.org/wiki/Shapefile

1995

[Diagram: API → (TCP/IP) → Spatial Data Engine → (SQL) → RDBMS; table columns: ID, Name, Address, Lat, Lon]

1996-2005

[Diagram: xDBC → RDBMS; table columns: ID, Name, Address, Lat, Lon]

SPATIAL INDEXING http://en.wikipedia.org/wiki/Spatial_database#Spatial_index

R-TREE http://en.wikipedia.org/wiki/R-tree
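A minimal R-tree sketch may make the idea concrete: bounding boxes are grouped into nodes, each node stores the box enclosing its children, and a query skips every subtree whose box it misses. This one is bulk-loaded with Sort-Tile-Recursive (STR) packing, one common way to build an R-tree from static data; the slide does not say which construction Esri uses, so STR, the node capacity of 4, and all names here are assumptions.

```python
import math
from dataclasses import dataclass

# Boxes are (xmin, ymin, xmax, ymax) tuples; a point is a box with zero width and height.


def union(boxes):
    """Smallest box enclosing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True when boxes a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    box: tuple
    children: list  # leaf: (box, payload) pairs; inner node: child Nodes
    leaf: bool


def str_groups(entries, capacity):
    """Sort-Tile-Recursive grouping of (box, obj) entries into runs of <= capacity."""
    n = len(entries)
    slices = math.ceil(math.sqrt(math.ceil(n / capacity)))
    per_slice = slices * capacity
    # Sort by x center, cut into vertical slices, then sort each slice by y center.
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][2])
    groups = []
    for s in range(0, n, per_slice):
        slab = sorted(by_x[s:s + per_slice], key=lambda e: e[0][1] + e[0][3])
        groups.extend(slab[c:c + capacity] for c in range(0, len(slab), capacity))
    return groups


def build(items, capacity=4):
    """Bulk-load (box, payload) items bottom-up; returns the root Node."""
    if not items or capacity < 2:
        raise ValueError("need at least one item and capacity >= 2")
    nodes = [Node(union([b for b, _ in g]), g, True) for g in str_groups(items, capacity)]
    while len(nodes) > 1:
        entries = [(node.box, node) for node in nodes]
        nodes = [Node(union([b for b, _ in g]), [child for _, child in g], False)
                 for g in str_groups(entries, capacity)]
    return nodes[0]


def search(node, query):
    """Payloads whose boxes intersect query; subtrees whose box misses are skipped."""
    if not intersects(node.box, query):
        return []
    if node.leaf:
        return [p for b, p in node.children if intersects(b, query)]
    return [p for child in node.children for p in search(child, query)]
```

STR suits read-mostly data loaded in bulk; an R-tree that must absorb frequent inserts would normally use an incremental insert/split strategy instead.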

(NOT SO) MODERN DAY

STORY TIME

U.S. Demographic Data

FOR EACH LOCATION, FOR EACH DEMOGRAPHIC: 50-MILE HEAT MAP
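The per-location, per-demographic computation on this slide can be sketched as a brute-force radius sum. The haversine great-circle distance, the 3958.8-mile mean Earth radius, and the block record layout (`lat`, `lon`, and one field per demographic) are assumptions for illustration; the deck does not say how the original heat maps were computed.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def demographic_within(location, blocks, field, radius_miles=50.0):
    """Sum one demographic field over every block within radius_miles of location.
    Hypothetical block layout: {"lat": ..., "lon": ..., <field>: ...}."""
    lat, lon = location
    return sum(b[field] for b in blocks
               if haversine_miles(lat, lon, b["lat"], b["lon"]) <= radius_miles)
```

Every location scans every block, so the cost grows with locations × blocks × demographics; that multiplication is what makes the single-machine approach so slow.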

TRADITIONAL MEANS: 14 Days, 850 GB of Raster Files

BETTER WAY?

BIG DATA?

U R IN BIGDATA SPACE IF: Volume, Velocity, Variety

BUT THEN I’VE SEEN: Vulnerability, Value [Diagram labels: data at rest, data in motion, many types, data in doubt, data that is correct, data in patterns, data at risk, data that is meaningful]

I’M STICKING WITH: Volume, Velocity, Variety

NOSQL (NOT ONLY SQL :-))

GEOJSON http://geojson.org/

{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [125.6, 10.1]},
  "properties": {"name": "Dinagat Islands"}
}

Points, Lines, Polygons, Multipoints, Multilines, Multipolygons, Geometry Collection
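Each of these GeoJSON geometry types nests the same `[x, y]` positions at a different depth (per RFC 7946), so one small walker can handle them all. The sketch below uses only Python's `json` module to parse the feature from the previous slide and compute a bounding box; the function names are illustrative.

```python
import json

# Nesting depth of positions inside "coordinates" for each GeoJSON geometry type.
DEPTH = {"Point": 0, "MultiPoint": 1, "LineString": 1,
         "MultiLineString": 2, "Polygon": 2, "MultiPolygon": 3}


def positions(geometry):
    """Yield every position ([x, y, ...]) contained in a GeoJSON geometry."""
    if geometry["type"] == "GeometryCollection":
        for g in geometry["geometries"]:
            yield from positions(g)
        return

    def walk(coords, depth):
        if depth == 0:
            yield coords
        else:
            for c in coords:
                yield from walk(c, depth - 1)

    yield from walk(geometry["coordinates"], DEPTH[geometry["type"]])


def bbox(geometry):
    """(xmin, ymin, xmax, ymax) of any GeoJSON geometry."""
    xs = [p[0] for p in positions(geometry)]
    ys = [p[1] for p in positions(geometry)]
    return (min(xs), min(ys), max(xs), max(ys))


feature = json.loads('{"type": "Feature", "geometry": {"type": "Point", '
                     '"coordinates": [125.6, 10.1]}, "properties": {"name": "Dinagat Islands"}}')
print(feature["properties"]["name"], bbox(feature["geometry"]))
```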

[Diagram: sorted byte[] keys → byte[] values: Key1 Value1, Key2 Value2, …, KeyN ValueN]

GEOHASH http://en.wikipedia.org/wiki/Geohash

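If left of the vertical center, set the left bit to 0, else 1. If below the horizontal center, set the right bit to 0, else 1.
[Figure: world extent from -180 to 180 longitude and -90 to 90 latitude, split into quadrants 01 (upper left), 11 (upper right), 00 (lower left), 10 (lower right)]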

[Figure: each quadrant is split again, appending two more bits per level, giving 4-bit cells such as 0000, 0010, 0011, 1000]
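The bit rule above is the same interleaving geohash uses: at each level, one longitude bit (0 = west of the current center) and then one latitude bit (0 = south of it). The resulting bit string is packed five bits at a time into geohash's base-32 alphabet. A minimal sketch; the function names are illustrative:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def quadrant_bits(lon, lat, levels):
    """Per level: x bit (0 = left of vertical center), then y bit (0 = below horizontal center)."""
    x0, x1, y0, y1 = -180.0, 180.0, -90.0, 90.0
    bits = []
    for _ in range(levels):
        xm = (x0 + x1) / 2
        if lon < xm:
            bits.append("0")
            x1 = xm
        else:
            bits.append("1")
            x0 = xm
        ym = (y0 + y1) / 2
        if lat < ym:
            bits.append("0")
            y1 = ym
        else:
            bits.append("1")
            y0 = ym
    return "".join(bits)


def geohash(lat, lon, precision=8):
    """Geohash string: precision * 5 interleaved bits, 5 bits per base-32 character."""
    nbits = precision * 5
    bits = quadrant_bits(lon, lat, (nbits + 1) // 2)[:nbits]
    return "".join(BASE32[int(bits[i:i + 5], 2)] for i in range(0, nbits, 5))


def geohash_box(code):
    """Decode a geohash to its cell (xmin, ymin, xmax, ymax)."""
    x0, x1, y0, y1 = -180.0, 180.0, -90.0, 90.0
    bits = "".join(format(BASE32.index(c), "05b") for c in code)
    for i, b in enumerate(bits):
        if i % 2 == 0:  # even positions are longitude bits
            xm = (x0 + x1) / 2
            x0, x1 = (xm, x1) if b == "1" else (x0, xm)
        else:           # odd positions are latitude bits
            ym = (y0 + y1) / 2
            y0, y1 = (ym, y1) if b == "1" else (y0, ym)
    return (x0, y0, x1, y1)
```

A shorter geohash is a prefix of a longer one for the same point, so nearby points usually share a prefix. Points on opposite sides of a cell boundary are the known exception.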

SPACE FILLING CURVES 1880

N DIMENSIONS → 1 DIMENSION

HILBERT CURVE http://en.wikipedia.org/wiki/Space-filling_curve
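A common iterative way to map a cell (x, y) of an n × n grid (n a power of two) to its distance d along the Hilbert curve is the well-known `xy2d` routine, given for example in Wikipedia's Hilbert curve article. Here it is as a Python sketch:

```python
def hilbert_d(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level holds s*s cells; (3*rx) ^ ry orders the quadrants.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-curve in this quadrant has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Consecutive d values are always adjacent cells. This locality is why the Hilbert order tends to keep spatial neighbors closer together in key order than a plain Z-order does.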

SPACE LINEARIZATION
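Linearization lets an ordinary sorted key-value store, like the sorted byte[] layout shown earlier, answer spatial queries. The sketch below builds a Z-order (Morton) key for integer grid cells using the same x-then-y bit interleaving as the geohash rule. It keeps entries sorted by that key and answers a box query with one range scan plus a filter. `SortedStore` is a hypothetical in-memory stand-in for a NoSQL table, not any product's API.

```python
import bisect


def morton(x, y, bits=16):
    """Interleave the bits of cell coordinates, with the x bit above the y bit at each level."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key


class SortedStore:
    """Toy stand-in for a sorted key-value table (keys kept in ascending order)."""

    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key, value):
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def scan(self, lo, hi):
        """All values with lo <= key <= hi, in key order."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.values[i:j]


def box_query(store, xmin, ymin, xmax, ymax):
    """Range-scan between the corner keys, then drop cells outside the box.
    Valid because a Morton key never decreases when x or y increases."""
    return [(x, y, v) for x, y, v in store.scan(morton(xmin, ymin), morton(xmax, ymax))
            if xmin <= x <= xmax and ymin <= y <= ymax]
```

The filter is needed because the key range between the two corners also covers cells that lie outside the box. Production systems usually split the query into several tighter key ranges to reduce that waste.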

SPATIAL SUPPORT: R-Tree

INDIRECT SUPPORT

WHAT IS OLD IS NEW AGAIN!

SPATIAL MIDDLEWARE

[Diagram: http:// and ws:// → Spatial Data Engine → NoSQL API → NoSQL; table columns: ID, Name, Address, Lat, Lon]

NOT A BIGGER OX

HADOOP.APACHE.ORG

WHAT’S IN A NAME? …ing-hadoop-in-5-pictures

WHAT IS HADOOP?
- Library / Framework
- Very, Very Large Un/Structured Datasets
- Multi-Node Distributed Processing
- Resilient to Commodity Hardware Failure

HADOOP BASIC STACK (top to bottom): MapReduce; Yet Another Resource Negotiator (YARN); Hadoop Distributed File System (HDFS); Commodity Servers

OTHER HADOOP PROJECTS
- Avro: Serialization / RPC System
- HBase: Distributed Columnar Database
- Hive: Ad Hoc “SQL” Interface
- Pig: Data Flow Parallel Execution (AML)
- ZooKeeper: Coordination Service
- More...

HDFS
- Distributed File System
- Lots and Lots of Commodity Drives
- Fault Tolerant
- Loves Big Files
- “POSIX”-Like Interface

HDFS [Diagram: HDFS Client → NameNode; three DataNodes]

HDFS Resilience! [Diagram: HDFS with three DataNodes]

Program BigData

…ed tutorial.html

WHAT IS MAPREDUCE?
- Parallel, Fault-Tolerant Framework
- Splits Large Input
- Invokes a User-Defined “Map” Function
- Shuffles and Sorts
- Invokes a User-Defined “Reduce” Function
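The map, shuffle/sort, and reduce phases can be simulated in plain Python to show the data flow of a typical geospatial job: counting points per grid cell. This runs in a single process and only sketches the programming model, not Hadoop's Java API. The 0.01-degree cell size and the function names are assumptions.

```python
import math
from itertools import groupby
from operator import itemgetter

CELL = 0.01  # hypothetical cell size in degrees


def map_fn(record):
    """Map: emit (cell key, 1) for one (lon, lat) point."""
    lon, lat = record
    yield (math.floor(lon / CELL), math.floor(lat / CELL)), 1


def reduce_fn(key, values):
    """Reduce: sum all counts emitted for one cell."""
    yield key, sum(values)


def run_job(records):
    # Map phase: records are independent, so Hadoop runs these in parallel across splits.
    pairs = [kv for r in records for kv in map_fn(r)]
    # Shuffle and sort: bring all pairs with the same key together.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one call per distinct key.
    out = {}
    for key, group in groupby(pairs, key=itemgetter(0)):
        for k, v in reduce_fn(key, (v for _, v in group)):
            out[k] = v
    return out
```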

MAPREDUCE & HDFS [Diagram: .jar → Client; a TaskTracker alongside a DataNode on each worker node]

WRITING MR IS HARD

HOW ABOUT... NO PROGRAMMING?

APACHE HIVE: “SQL” → MapReduce Job

HQL:
drop table if exists logs;
-- external table: dropping it leaves the files under location untouched
create external table if not exists logs (
  ip string,
  method string,
  uri string,
  status string,
  bytes int,
  time_taken int,
  referrer string,
  user_agent string)
partitioned by (year int, month int, day int, hour int)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location 'hdfs://hadoop:8020/logs/';

OTHER AD HOC ENGINES
- Cloudera Impala
- Facebook Presto
- SparkSQL
Bypass MR generation / direct HDFS access

WHAT ABOUT SPATIAL?

GIS TOOLS FOR HADOOP
- Computational Geometry Library
- Hive Spatial UDF Functions
- GeoProcessing Extensions to ArcMap

GEOMETRY LIBRARY
- Points / Lines / Polygons
- I/O (GeoJSON, WKT, WKB, Shape)
- Spatial Relations (inside, touches, intersects, …)
- Spatial Operations (buffer, cut, convex hull, …)
- In-Memory Spatial Index
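Spatial relations such as “inside” ultimately reduce to small geometric predicates. As an illustration only (this is not the Esri Geometry API), here is the classic even-odd ray-casting test for a point inside a simple polygon ring:

```python
def point_in_ring(x, y, ring):
    """Even-odd rule: cast a ray toward +x and count how many ring edges it crosses.
    ring is a list of (x, y) vertices; repeating the first vertex at the end is optional.
    Points exactly on an edge may be classified either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            xc = x1 + (y - y1) * (x2 - x1) / (y2 - y1)  # where the edge meets that line
            if x < xc:
                inside = not inside
    return inside
```

Libraries typically run a cheap envelope (bounding box) check first and fall back to a test like this only when the point is inside the envelope.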

API USAGE IN BIGDATA
- Map-only jobs: GeoEnrichment
- Given a set of locations
- Given demographic areas
- Augment each location with demographic attributes
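A map-only GeoEnrichment job needs no reducer. Each mapper loads the comparatively small set of demographic polygons into memory once, then tags every incoming location independently. Below is a sketch of that per-mapper logic, with an envelope pre-check before the exact point-in-polygon test. The record layouts, attribute names, and the `Enricher` class are hypothetical.

```python
def in_ring(x, y, ring):
    """Even-odd ray-casting point-in-polygon test (ring: list of (x, y) vertices)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def envelope(ring):
    """Bounding box (xmin, ymin, xmax, ymax) of a ring."""
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


class Enricher:
    """Per-mapper state: demographic areas are loaded once and reused for every record."""

    def __init__(self, areas):
        # areas: list of (ring, attributes dict)
        self.areas = [(envelope(ring), ring, attrs) for ring, attrs in areas]

    def map(self, location):
        """location: dict with "x" and "y". Returns a copy augmented with the
        attributes of the first area containing it (unchanged if none does)."""
        x, y = location["x"], location["y"]
        out = dict(location)
        for (xmin, ymin, xmax, ymax), ring, attrs in self.areas:
            if xmin <= x <= xmax and ymin <= y <= ymax and in_ring(x, y, ring):
                out.update(attrs)
                break
        return out
```

With many areas, the linear loop would be replaced by an in-memory spatial index, which is the last item on the geometry library slide.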

TELCO CDR (Call Data Record): DateTime, UUID, LatLon, Duration, Status, etc.
- Dropped Call Emerging HotSpot
- Street Traffic Condition
- Massive Spatial Join (million x million polygons)
- Overlay with Demographic Polygons
- Overlay with Current Weather
- Overlay with Social Media

TELEMATICS
- Feedback to Engineering
- Car-to-Car Communication
- Street Condition Detection
- Best Route Prediction (EV)
- Overlay with Weather

Insurance as you drive!

Observations

A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam
Hadoop Summit, Brussels, 15 April 2015
Frank Cremer (Geomatik), Mansour Raad (ESRI)
Copyright Port of Rotterdam

Access information in three clicks

Usage of ship position data:
- Harbour master
  - Incident analysis
  - Safety checks
- Capacity management
  - Identifying bottlenecks
  - Planning decision support
- Environmental management?
  - Pollution (NOx) calculations
  - Speed measures to reduce pollution

Where is Δ = 0? [Figure labels: (lat, lon), d, ΔD]

… of passenger drop-offs, Turtle Bay (UN)

Shuttle Locations: Pickup, Drop off

Mission: Fast Dynamic Segmentation for Linear Referencing

Event from Mile M to Mile P. [Figure: route marked at Mile L, Mile M, and Mile P; segment (L-M) has Green, Blue, Red attributes]
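Dynamic segmentation comes down to an interval overlay along a route's measure axis. An event spanning Mile M to Mile P is cut wherever it crosses a segment boundary, and each piece picks up that segment's attributes. A minimal sketch, assuming segments are non-overlapping [from, to) mile ranges on one route; names and sample values are hypothetical.

```python
def dyn_seg(event_from, event_to, segments):
    """Split the event [event_from, event_to) against the route's segments.
    segments: list of (seg_from, seg_to, attrs) on the same route.
    Returns (piece_from, piece_to, attrs) for each overlapping part, in mile order."""
    pieces = []
    for seg_from, seg_to, attrs in sorted(segments, key=lambda s: s[0]):
        lo, hi = max(event_from, seg_from), min(event_to, seg_to)
        if lo < hi:  # keep only non-empty overlaps
            pieces.append((lo, hi, attrs))
    return pieces
```

For millions of roads, the segments of each route would be pre-sorted and located with a binary search (or a search index, as in the steps that follow) instead of a full scan.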

Step 1 - Clean and Bulk Load Roads. BulkLoad: Feature → {…te code … road id”, shape: “wkt-string”}

Step 2 - New Dyn-Seg GeoProcess: {json} → Clean Roads (10M) → Search Index (every document/field). Document: {roadid: “xxx”, wkt: “multilinestring((x y, …))”, IRI: 1, F SYSTEM: 2, …}

Step 3 - Query and Display: ArcPy, ArcGIS, ElasticSearch

Data Centers

[Diagram: ArcGIS, Nerdalize, ElasticSearch, ArcPy, sshd, DynSeg GP] nerdalize.com

YOU CAN DO IT TOO!

180 million entries - small :-)

IMPORT BY TIME/SPACE: *.csv → MR Import → hdfs://…/yyyy/mm/dd/hh/uuid.csv
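The import writes each record under a directory derived from its timestamp, which lets Hive prune partitions by year/month/day/hour. With plain `yyyy/mm/dd/hh` directories (rather than `year=…/month=…` names), each directory must be registered as a partition explicitly. A sketch of both steps follows. The timestamp format, the `hdfs://namenode/trips` base path used in testing, and the one-file-per-call naming are assumptions based only on the path pattern on the slide.

```python
import uuid
from datetime import datetime


def partition_dir(t):
    """yyyy/mm/dd/hh directory for a timestamp (zero-padded, as on the slide)."""
    return f"{t:%Y/%m/%d/%H}"


def partition_path(base, pickup, fmt="%Y-%m-%d %H:%M:%S"):
    """Target file for a record: base/yyyy/mm/dd/hh/<random uuid>.csv."""
    t = datetime.strptime(pickup, fmt)
    return f"{base}/{partition_dir(t)}/{uuid.uuid4()}.csv"


def add_partition_sql(table, base, t):
    """HiveQL that registers one hour's directory as a partition of the table."""
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(year={t.year}, month={t.month}, day={t.day}, hour={t.hour}) "
            f"LOCATION '{base}/{partition_dir(t)}'")
```

A real import would write one file per task and hour rather than per record, since HDFS prefers few large files.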

create external table if not exists trips (
  pickupdatetime string,
  dropoffdatetime string,
  pickupx double,
  pickupy double,
  dropoffx double,
  dropoffy double,
  passengercount int,
  triptime int,
  tripdist double,
  rc25 string,
  rc50 string,
  rc100 string,
  rc200 string)
partitioned by (year int, month int, day int, hour int)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

STATISTICAL SIGNIFICANCE?

HOTSPOT …0.1/index.html#//005p00000010000000
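This assumes the linked tool is ArcGIS's Hot Spot Analysis (Getis-Ord Gi*). For cell i, the statistic is a z-score comparing the weighted local sum with what the global mean predicts:

Gi* = (Σj wij·xj − X̄·Σj wij) / (S·√((n·Σj wij² − (Σj wij)²) / (n − 1)))

Here X̄ and S are the mean and (population) standard deviation of all n values. The sketch uses binary weights on a grid of counts: 1 for the cell itself and its 8 neighbors, 0 otherwise. That weighting is an assumption; the ArcGIS tool offers other conceptualizations of spatial relationships.

```python
import math


def gi_star(grid):
    """Getis-Ord Gi* z-score for every cell of a 2-D grid (list of equal-length rows).
    Assumed weights: w_ij = 1 for the cell itself and its (up to 8) neighbours, else 0."""
    rows, cols = len(grid), len(grid[0])
    if rows <= 3 and cols <= 3:
        raise ValueError("grid too small: some window would cover every cell")
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    if s == 0:
        raise ValueError("all values are equal; Gi* is undefined")
    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            w = len(window)      # sum of w_ij; with 0/1 weights also the sum of w_ij^2
            local = sum(window)  # sum of w_ij * x_j
            denom = s * math.sqrt((n * w - w * w) / (n - 1))
            z[r][c] = (local - mean * w) / denom
    return z
```

Under the usual normal approximation, a z-score above about 1.96 (or below -1.96) marks a hot (or cold) spot at roughly 95% confidence. This is what separates a statistically significant cluster from a cell that merely has a high count.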

PROCESSING EVOLUTION
- Transaction: Batch
- Operational: Dashboard
- Analytics: Exploration
- Intelligent: Realtime / Predictive

WHAT IS NEXT?
- In Memory
- Native Spatial Index in NoSQL DB
- Native Spatial Types (Point, Line, …)
- Out-of-the-Box Spatial Operators / Operations
- Distributed / Disconnected
- GPU Integration
- Visualization via Gamification

Q&A
