State of Geospatial Big Data
Mansour Raad @mraad 2016
“WHERE” IS UBIQUITOUS!
Where is the closest ATM? Where is the best location to place my store? Where is UBL? Where is the next Ebola/Zika outbreak?
A BIT OF HISTORY, With Esri Specifically :-)
1990
http://en.wikipedia.org/wiki/Shapefile
1995
[Diagram: API → TCP/IP → Spatial Data Engine → SQL → RDBMS, storing records with ID, Name, Address, Lat, Lon]
1996-2005
[Diagram: xDBC → RDBMS, storing records with ID, Name, Address, Lat, Lon]
SPATIAL INDEXING
http://en.wikipedia.org/wiki/Spatial_database#Spatial_index
R-TREE
http://en.wikipedia.org/wiki/R-tree
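To make the R-tree idea concrete, here is a minimal sketch of a static R-tree bulk-loaded with Sort-Tile-Recursive (STR) packing. Entries are grouped into nodes of at most `capacity` children, each node stores the minimum bounding rectangle (MBR) of its children, and a search only descends into nodes whose MBR intersects the query window. The names `Node`, `build_str_tree`, and `search` are illustrative and not taken from any library. The tree is read-only: it supports neither inserts nor deletes.

```python
import math
from dataclasses import dataclass, field
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(rects):
    """Minimum bounding rectangle (MBR) covering all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: list = field(default_factory=list)  # leaf: (rect, payload) pairs; inner: Nodes
    leaf: bool = True


def str_groups(entries, capacity):
    """Sort-Tile-Recursive grouping of (rect, payload) entries into runs of <= capacity."""
    n_slices = math.ceil(math.sqrt(math.ceil(len(entries) / capacity)))
    slice_size = n_slices * capacity
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][2])  # sort by x-center
    groups = []
    for i in range(0, len(by_x), slice_size):
        # Within each vertical slice, sort by y-center and cut into nodes.
        vslice = sorted(by_x[i:i + slice_size], key=lambda e: e[0][1] + e[0][3])
        groups.extend(vslice[j:j + capacity] for j in range(0, len(vslice), capacity))
    return groups


def build_str_tree(items, capacity=4):
    """items: non-empty list of (rect, payload). Returns the root Node."""
    level = [Node(union([r for r, _ in g]), list(g), True) for g in str_groups(items, capacity)]
    while len(level) > 1:  # pack nodes into parents until a single root remains
        groups = str_groups([(n.mbr, n) for n in level], capacity)
        level = [Node(union([r for r, _ in g]), [n for _, n in g], False) for g in groups]
    return level[0]


def search(node, window):
    """Payloads whose rectangles intersect the query window, pruning by node MBR."""
    if not intersects(node.mbr, window):
        return []
    if node.leaf:
        return [p for r, p in node.children if intersects(r, window)]
    return [p for child in node.children for p in search(child, window)]
```

Bulk loading suits read-mostly reference layers. A dynamic R-tree, as in Guttman's original design, adds insertion with node splitting.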
(NOT SO) MODERN DAY
STORY TIME
U.S. Demographic Data
FOR EACH LOCATION
FOR EACH DEMOGRAPHIC
50 MILE HEAT MAP
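One reading of this task: for every location, total each demographic variable over all demographic points within 50 miles. A brute-force sketch follows. It assumes the demographic data arrives as hypothetical `(lat, lon, {variable: value})` tuples and uses the haversine great-circle distance. The function names and data layout are illustrative only. The nested loop costs O(locations × points), and that scaling problem is what motivates the indexing and distributed approaches later in the deck.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def demographics_within(locations, demo_points, radius_miles=50.0):
    """For each location id, sum every demographic variable of the points within the radius."""
    result = {}
    for loc_id, (lat, lon) in locations.items():
        totals = defaultdict(float)
        for plat, plon, values in demo_points:
            if haversine_miles(lat, lon, plat, plon) <= radius_miles:
                for name, v in values.items():
                    totals[name] += v
        result[loc_id] = dict(totals)
    return result
```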
TRADITIONAL MEANS: 14 Days, 850 GB Raster Files
BETTER WAY?
BIG DATA?
U R IN BIG DATA SPACE IF: Volume, Velocity, Variety
BUT THEN I’VE SEEN: Vulnerability, Value
(data at rest; data in motion; many types; data in doubt; data that is correct; data in patterns; data at risk; data that is meaningful)
I’M STICKING WITH: Volume, Velocity, Variety
NOSQL (NOT ONLY SQL :-)
GEOJSON
http://geojson.org/
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
Points, Lines, Polygons, Multipoints, Multilines, Multipolygons, Geometry Collection
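A GeoJSON feature is plain JSON, so any JSON parser can read it. The sketch below uses only Python's standard `json` module. It loads the feature shown above and extracts its name and coordinates, which GeoJSON orders as longitude first, then latitude. It also builds a new Point feature. The helper names `read_point_feature` and `point_feature` are hypothetical.

```python
import json

FEATURE = '''{"type": "Feature",
  "geometry": {"type": "Point", "coordinates": [125.6, 10.1]},
  "properties": {"name": "Dinagat Islands"}}'''


def read_point_feature(text):
    """Return (name, lon, lat) from a GeoJSON Point feature string."""
    feature = json.loads(text)
    geom = feature["geometry"]
    if geom["type"] != "Point":
        raise ValueError(f"expected Point, got {geom['type']}")
    lon, lat = geom["coordinates"][:2]  # GeoJSON order is [longitude, latitude]
    return feature["properties"].get("name"), lon, lat


def point_feature(name, lon, lat):
    """Build a GeoJSON Point feature as a dict, ready for json.dumps."""
    return {"type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name}}


name, lon, lat = read_point_feature(FEATURE)
print(name, lon, lat)  # Dinagat Islands 125.6 10.1
```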
[Diagram: sorted key/value store, byte[] keys mapped to byte[] values: Key1 → Value1, Key2 → Value2, …, KeyN → ValueN]
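Stores such as HBase keep rows sorted lexicographically by byte-array key, so a range or prefix scan becomes a seek followed by a sequential read. The toy in-memory model below illustrates that property with Python's `bisect` on a sorted list of `bytes` keys. The `SortedKV` class is hypothetical and models only the ordering, not persistence or distribution.

```python
import bisect


class SortedKV:
    """Keys and values are bytes; keys are kept in lexicographic byte order."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def put(self, key: bytes, value: bytes) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) for start <= key < stop."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            yield self._keys[i], self._vals[i]
            i += 1

    def prefix_scan(self, prefix: bytes):
        """Yield every entry whose key starts with prefix (a contiguous run)."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[i]
            i += 1
```

If geohash strings are used as keys, a prefix scan returns every entry inside one geohash cell.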
GEOHASH
http://en.wikipedia.org/wiki/Geohash
If left of the vertical center, set the left bit to 0, else 1.
If below the horizontal center, set the right bit to 0, else 1.
[Figure: the world extent (longitude -180 to 180, latitude -90 to 90) split into quadrants 00 (lower left), 01 (upper left), 10 (lower right), 11 (upper right), then subdivided again into 4-bit cells such as 0000, 0010, 1000, 1111]
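The bit rule above is the core of Geohash. Longitude and latitude bits are interleaved, starting with longitude, and every 5 bits become one character of a base-32 alphabet. The minimal encoder below is written from the public Geohash description, not from any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode (lat, lon) as a Geohash string with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value = [], 0, 0
    even = True  # even bit positions refine longitude, odd ones refine latitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:  # right of the vertical center -> 1
                value = (value << 1) | 1
                lon_lo = mid
            else:           # left of the vertical center -> 0
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:  # above the horizontal center -> 1
                value = (value << 1) | 1
                lat_lo = mid
            else:           # below the horizontal center -> 0
                value <<= 1
                lat_hi = mid
        even = not even
        bits += 1
        if bits == 5:  # every 5 bits -> one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Nearby points usually share a prefix, which makes a geohash a good sorted row key. Points on opposite sides of a cell boundary can have very different hashes, so proximity queries typically also check the neighboring cells.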
SPACE-FILLING CURVES, 1880
N DIM → 1 DIM
HILBERT CURVE
http://en.wikipedia.org/wiki/Space-filling_curve
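A Hilbert curve linearizes a 2-D grid. Each cell gets a 1-D distance `d` along the curve, and cells that are close in `d` are also close in space. The sketch below ports the well-known iterative xy-to-d conversion from the Wikipedia article on the Hilbert curve. It works on an `n`-by-`n` grid, where `n` must be a power of two.

```python
def _rotate(n: int, x: int, y: int, rx: int, ry: int):
    """Flip/transpose a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_xy_to_d(n: int, x: int, y: int) -> int:
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


# Order the 16 cells of a 4x4 grid along the curve.
cells = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda c: hilbert_xy_to_d(4, *c))
print(cells[:4])  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```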
SPACE LINEARIZATION
SPATIAL SUPPORT: R-Tree
INDIRECT SUPPORT
WHAT IS OLD IS NEW AGAIN!
SPATIAL MIDDLEWARE
[Diagram: http:// and ws:// clients → Spatial Data Engine → NoSQL API → NoSQL, storing records with ID, Name, Address, Lat, Lon]
NOT A BIGGER OX
HADOOP.APACHE.ORG
WHAT’S IN A NAME
ing-hadoop-in-5-pictures
WHAT IS HADOOP?
- Library / Framework
- Very, Very Large Un/Structured Datasets
- Multi-Node Distributed Processing
- Resilient to Commodity Hardware Failure
HADOOP BASIC STACK (top to bottom)
- MapReduce
- Yet Another Resource Negotiator (YARN)
- Hadoop Distributed File System (HDFS)
- Commodity Servers
OTHER HADOOP PROJECTS
- Avro - Serialization / RPC System
- HBase - Distributed Columnar Database
- Hive - Ad Hoc “SQL” Interface
- Pig - Data Flow Parallel Execution (AML)
- ZooKeeper - Coordination Service
- More…
HDFS
- Distributed File System
- Lots and Lots of Commodity Drives
- Fault Tolerant
- Loves Big Files
- “POSIX”-Like Interface
[Diagram: HDFS client, NameNode, three DataNodes]
HDFS Resilience!
[Diagram: HDFS with three DataNodes]
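HDFS gets its resilience by splitting each file into large blocks and storing every block on several DataNodes. The replication factor defaults to 3. The toy simulation below assumes simple round-robin placement and ignores HDFS's real rack-aware policy. It checks that a file stays readable after DataNode failures. All names here are hypothetical.

```python
def place_blocks(num_blocks, datanodes, replication=3):
    """Map block id -> set of DataNodes holding a replica (round-robin, no rack awareness)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for b in range(num_blocks):
        # Start each block on a different node so replicas spread evenly.
        placement[b] = {datanodes[(b + r) % len(datanodes)] for r in range(replication)}
    return placement


def readable(placement, failed):
    """A file is readable if every block still has at least one live replica."""
    return all(nodes - failed for nodes in placement.values())


nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = place_blocks(num_blocks=10, datanodes=nodes)
print(all(readable(plan, {n}) for n in nodes))  # True: any single DataNode may fail
```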
Program / Big Data
ed tutorial.html
WHAT IS MAPREDUCE?
- Parallel, Fault-Tolerant Framework
- Splits Large Input
- Invokes User-Defined “Map” Function
- Shuffles and Sorts
- Invokes User-Defined “Reduce” Function
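The split → map → shuffle/sort → reduce flow can be imitated in a single process. The sketch below is spatial: it counts `lon,lat` records per grid cell. The `mapper` and `reducer` follow the usual key/value contract. `CELL_SIZE` and the record format are hypothetical. The code models only the data flow, not Hadoop's parallelism, fault tolerance, or Java API.

```python
import math
from itertools import groupby
from operator import itemgetter

CELL_SIZE = 0.01  # hypothetical grid cell size in degrees


def mapper(line):
    """Map one 'lon,lat' record to (cell, 1)."""
    lon, lat = map(float, line.split(","))
    yield (math.floor(lon / CELL_SIZE), math.floor(lat / CELL_SIZE)), 1


def reducer(cell, counts):
    """Sum all counts emitted for one cell."""
    yield cell, sum(counts)


def run_job(lines, mapper, reducer, num_splits=2):
    # 1. Split the input (Hadoop splits by HDFS block; here, by slicing).
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # 2. Map every record of every split.
    mapped = [kv for split in splits for line in split for kv in mapper(line)]
    # 3. Shuffle and sort: bring equal keys together.
    mapped.sort(key=itemgetter(0))
    # 4. Reduce each key group.
    out = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reducer(key, (v for _, v in group)))
    return dict(out)
```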
MAPREDUCE & HDFS
[Diagram: Client submits a .jar; TaskTrackers run alongside DataNodes]
WRITING MR IS HARD
HOW ABOUT… NO PROGRAMMING?
APACHE HIVE
“SQL” → MapReduce Job
HQL
drop table if exists logs;
create external table if not exists logs (
  ip string,
  method string,
  uri string,
  status string,
  bytes int,
  time_taken int,
  referrer string,
  user_agent string)
partitioned by (year int, month int, day int, hour int)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location 'hdfs://hadoop:8020/logs/';
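The `logs` table expects plain-text rows of eight tab-separated fields, stored under Hive's `column=value` partition directories. The sketch below illustrates both ideas. `parse_log_row` and `partition_dir` are hypothetical helpers, and the sample row is invented. After files are written into such directories, the partitions still have to be registered with Hive, for example with `MSCK REPAIR TABLE logs`.

```python
from datetime import datetime

# Column names and types in the order declared by the `logs` DDL.
LOG_COLUMNS = [("ip", str), ("method", str), ("uri", str), ("status", str),
               ("bytes", int), ("time_taken", int), ("referrer", str), ("user_agent", str)]


def parse_log_row(line: str) -> dict:
    """Split one tab-delimited row and cast each field to its declared column type."""
    fields = line.rstrip("\n").split("\t")
    return {name: cast(value) for (name, cast), value in zip(LOG_COLUMNS, fields)}


def partition_dir(table_location: str, ts: datetime) -> str:
    """Hive-style partition directory (col=value) for the year/month/day/hour partitions."""
    return (f"{table_location.rstrip('/')}/year={ts.year}/month={ts.month}"
            f"/day={ts.day}/hour={ts.hour}")
```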
OTHER AD HOC ENGINES
- Cloudera Impala
- Facebook Presto
- Spark SQL
- Bypass MR Generation / Direct HDFS Access
WHAT ABOUT SPATIAL?
GIS TOOLS FOR HADOOP
- Computational Geometry Library
- Hive Spatial UDF Functions
- GeoProcessing Extensions to ArcMap
GEOMETRY LIBRARY
- Points / Lines / Polygons
- I/O (GeoJSON, WKT, WKB, Shape)
- Spatial Relations (inside, touches, intersects, …)
- Spatial Operations (buffer, cut, convex hull, …)
- In-Memory Spatial Index
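To illustrate one spatial relation from this list, here is the textbook even-odd (ray casting) point-in-polygon test with a bounding-box pre-check. This is a generic sketch, not the Esri geometry library, which provides these relations ready-made. Points lying exactly on an edge may be classified either way.

```python
def bbox(ring):
    """Bounding box (xmin, ymin, xmax, ymax) of a ring [(x, y), ...]."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(x, y, ring):
    """Even-odd rule: cast a ray toward +x and count how many edges it crosses."""
    xmin, ymin, xmax, ymax = bbox(ring)
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return False  # cheap rejection before looping over edges
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```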
API USAGE IN BIG DATA
Map-only jobs: GeoEnrichment
- Given a set of locations
- Given demographic areas
- Augment each location with demographic attributes
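A map-only GeoEnrichment job loads the small set of demographic polygons into every mapper and indexes them. For each incoming location, it looks up the containing polygon and appends that polygon's attributes. No reduce step is needed. The single-process sketch below shows that per-mapper logic under two assumptions: the polygons fit in memory, and a uniform grid over polygon bounding boxes serves as the in-memory index. `GridIndex`, `enrich`, and the sample attributes are hypothetical.

```python
import math
from collections import defaultdict


def point_in_ring(x, y, ring):
    """Even-odd ray casting test against a closed ring [(x, y), ...]."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


class GridIndex:
    """Uniform grid: each cell lists the polygons whose bounding box overlaps it."""

    def __init__(self, polygons, cell=1.0):
        self.cell = cell
        self.polygons = polygons  # list of (ring, attributes)
        self.cells = defaultdict(list)
        for pid, (ring, _) in enumerate(polygons):
            xs, ys = [p[0] for p in ring], [p[1] for p in ring]
            for cx in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
                for cy in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
                    self.cells[(cx, cy)].append(pid)

    def containing(self, x, y):
        """Attributes of the first candidate polygon that contains (x, y), or None."""
        key = (math.floor(x / self.cell), math.floor(y / self.cell))
        for pid in self.cells.get(key, []):
            ring, attrs = self.polygons[pid]
            if point_in_ring(x, y, ring):
                return attrs
        return None


def enrich(records, index):
    """Map-only step: append the containing polygon's attributes to each record."""
    for rec in records:
        attrs = index.containing(rec["x"], rec["y"]) or {}
        yield {**rec, **attrs}
```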
TELCO
CDR (Call Data Record): DateTime, UUID, LatLon, Duration, Status, etc.
- Dropped Call Emerging Hot Spots
- Street Traffic Conditions
- Massive Spatial Join: million × million polygons
- Overlay with Demographic Polygons
- Overlay with Current Weather
- Overlay with Social Media
TELEMATICS
- Feedback to Engineering
- Car-to-Car Communication
- Street Condition Detection
- Best Route Prediction (EV)
- Overlay with Weather
Insurance as you drive!
Observations
A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam
Hadoop Summit, Brussels, 15 April 2015
Frank Cremer (Geomatik), Mansour Raad (ESRI)
Copyright Port of Rotterdam
Access information in three clicks
Usage of ship position data:
- Harbour master
- Incident analysis
- Safety checks
- Capacity management
- Identifying bottlenecks
- Planning decision support
- Environmental management?
- Pollution (NOx) calculations
- Speed measures to reduce pollution
Where is Δ 0?
[Diagram: (Lat, Lon), d, ΔD]
of passenger drop-offs, Turtle Bay – UN
Shuffle Locations: Pickup, Drop off
Mission: Fast Dynamic Segmentation for Linear Referencing
Event from Mile M to P
[Diagram: road with Mile L, Mile M, Mile P; segment (L-M) has Green, Blue, Red attributes]
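Dynamic segmentation locates events by measure along a route instead of storing each event as separate geometry. The minimal sketch below assumes a route's measure is simply the cumulative planar length from its first vertex; real linear-referencing data stores explicit M values, for example in miles. `extract_segment` cuts the route geometry between two measures, such as Mile M to Mile P. `overlay` splits an event range by the attribute segments it crosses. Both function names and the sample data are hypothetical.

```python
import math


def extract_segment(line, m_from, m_to):
    """Cut a polyline [(x, y), ...] between two measures along it.

    Assumption: a point's measure is its cumulative planar distance from line[0].
    """
    if m_from > m_to:
        m_from, m_to = m_to, m_from
    out = []
    start = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        end = start + seg_len
        # Keep segments overlapping [m_from, m_to]; skip zero-length segments.
        if seg_len > 0 and end >= m_from and start <= m_to:
            t0 = max(0.0, (m_from - start) / seg_len)  # fractions along this segment
            t1 = min(1.0, (m_to - start) / seg_len)
            for t in (t0, t1):
                p = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
                if not out or out[-1] != p:  # avoid repeating shared vertices
                    out.append(p)
        start = end
    return out


def overlay(event, attribute_ranges):
    """Split an event range (m_from, m_to) by attribute ranges [(start, end, value), ...]."""
    f, t = event
    return [(max(f, s), min(t, e), v) for s, e, v in attribute_ranges if s < t and e > f]
```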
Step 1 - Clean and Bulk Load Roads
Bulk Load Feature - te code road id”, shape: “wkt-string”}
Step 2 - New Dyn-Seg GeoProcess
{json} Clean Roads 10M; Search; Index Every Document/Field
{roadid “xxx” wkt “multilinestring(((x y, )IRI 1F SYSTEM 2 .}
Step 3 - Query and Display
[Diagram: ArcPy, ArcGIS, Elasticsearch]
Data Centers
[Diagram: ArcGIS, ArcPy, sshd, DynSeg GP, Elasticsearch on Nerdalize]
nerdalize.com
YOU CAN DO IT TOO!
180 million entries - small :-)
IMPORT BY TIME/SPACE
[Diagram: *.csv → MR Import → hdfs:// /yyyy/mm/dd/hh/uuid.csv]
create external table if not exists trips (
  pickupdatetime string,
  dropoffdatetime string,
  pickupx double,
  pickupy double,
  dropoffx double,
  dropoffy double,
  passengercount int,
  triptime int,
  tripdist double,
  rc25 string,
  rc50 string,
  rc100 string,
  rc200 string)
partitioned by (year int, month int, day int, hour int)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
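The importer writes files to `/yyyy/mm/dd/hh/` directories rather than Hive's `year=…/month=…` naming. Because of that, each directory is typically attached to the `trips` table explicitly with `ALTER TABLE … ADD PARTITION … LOCATION`. The sketch below derives both the bucket directory and that statement from a pickup timestamp. It assumes `YYYY-MM-DD HH:MM:SS` timestamps and a hypothetical base path `BASE`, because the real path is not shown on the slide.

```python
from datetime import datetime

BASE = "hdfs://namenode:8020/trips"  # hypothetical base directory
FMT = "%Y-%m-%d %H:%M:%S"            # assumed pickup timestamp format


def bucket_path(pickup: str, base: str = BASE) -> str:
    """Directory /yyyy/mm/dd/hh for one pickup time."""
    ts = datetime.strptime(pickup, FMT)
    return f"{base}/{ts:%Y/%m/%d/%H}"


def add_partition_sql(pickup: str, table: str = "trips", base: str = BASE) -> str:
    """HQL that registers the time bucket of this pickup as a partition of the table."""
    ts = datetime.strptime(pickup, FMT)
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(year={ts.year}, month={ts.month}, day={ts.day}, hour={ts.hour}) "
            f"LOCATION '{bucket_path(pickup, base)}';")
```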
STATISTICAL SIGNIFICANCE?
HOTSPOT
0.1/index.html#//005p00000010000000
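One common way to test a hot spot for statistical significance is the Getis-Ord Gi* statistic, which underlies ArcGIS's Hot Spot Analysis tool. Gi* compares each cell's neighborhood sum with what the global mean would predict and returns a z-score:

Gi* = (Σ_j w_ij x_j − X̄ Σ_j w_ij) / (S √[(n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1)])

Here X̄ and S are the global mean and standard deviation. The minimal sketch below works on a grid of counts larger than 3×3 and assumes binary weights over the 3×3 neighborhood, including the cell itself. A large positive z marks a hot spot, and |z| > 1.96 corresponds to roughly 95% confidence. This is an illustration of the formula, not ArcGIS's implementation, which offers more weighting options.

```python
import math


def getis_ord_gi_star(grid):
    """Gi* z-score per cell of a 2-D grid of counts.

    Weights are binary: 1 for the cell itself and its up-to-8 neighbours, 0 otherwise.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)  # global standard deviation
    if s == 0:
        raise ValueError("all cells are equal; Gi* is undefined")
    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [grid[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            k = len(neigh)  # sum of weights (and of squared weights, since each is 1)
            num = sum(neigh) - mean * k
            den = s * math.sqrt((n * k - k * k) / (n - 1))
            z[r][c] = num / den
    return z
```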
PROCESSING EVOLUTION
- Transaction: Batch
- Operational: Dashboard
- Analytics: Exploration
- Intelligent: Realtime / Predictive
WHAT IS NEXT?
- In-Memory
- Native Spatial Index in NoSQL DB
- Native Spatial Types (Point, Line, …)
- Out-of-the-Box Spatial Operators / Operations
- Distributed / Disconnected
- GPU Integration
- Visualization via Gamification
Q&A
Mansour Raad