Postgres 10 Ways To Load Data Into

Transcription

10 WAYS TO LOAD DATA INTOPOSTGRESREGINA OBE AND LEO HSUhttp://www.paragoncorporation.comBuy our books! at http://postgis.us/page buy bookOUR LATEST BOOKpgRouting: A Practical Guide http://locatepress.com/pgrouting1

CATEGORIES OF LOADING WE'LL COVERServer-SideSQL COPY / COPY FROM PROGRAMLarge Object storage SQL functionsForeign Data Wrappers (FDWs)Client-SidePSQL \copy and \copy FROM PROGRAMPSQL Large Object support functionsOther commandline tools: ogr2ogr, shp2pgsqlNeed not be on same server as Postgres service2

LOADING DELIMITED FILES WITH SQL COPY (SERVERSIDE)postgres daemon account needs to have access to filesUser has to have super user rights to Postgres service3.1

STEP 1: CREATE STAGING TABLEHas to match the structure of the file. Using film locations v?accessType DOWNLOADCREATE TABLE film locations(title text ,release year integer ,locations text ,fun facts text ,production company text ,distributor text ,director text ,writer text ,actor 1 text ,actor 2 text ,actor 3 text );3.2

STEP 2 (FROM FILE): LOAD THE DATA USING SQL COPYCOPY film locationsFROM '/data talk/csvs/Film Locations in San Francisco.csv' HEADER CSV DELIMITER ',';3.3

STEP 2 (OUTPUT FROM PROGRAM): LOAD THE DATAUSING SQL COPY FROM PROGRAMRequires PostgreSQL 9.3 COPY film locationsFROM PROGRAM 'wget -q -O - " @" sv?accessType DOWNLHEADER CSV DELIMITER ',';3.4

LOADING DELIMITED FILES WITH PSQL \COPY (CLIENTSIDE)psql client needs to have access to the filesUser initiating does not need super user rights to database,but needs to have permissions to the filesCould be slow if postgres server is not on same local networkas client.4.1

STEP 1: CREATE STAGING TABLEHas to exactly match the structure of the file. Using filmlocations - ?accessType DOWNLOADCREATE TABLE film locations(title text ,release year integer ,locations text ,fun facts text ,production company text ,distributor text ,director text ,writer text ,actor 1 text ,actor 2 text ,actor 3 text );4.2

STEP 2: LOAD THE DATA WITH \COPY FROM\copy film locations FROM '/data talk/csvs/Film Locations in San Francisco.csv' HEADER CSV DELIMiTER4.3

STEP 2 ALTERNATIVE: LOAD THE DATA USING \COPYFROM PROGRAMRequires psql compiled for PostgreSQL 9.3 \copy film locations FROM PROGRAM 'wget -q -O - " @" 4

SERVER SIDE: LOADING BINARY FILESLoading documents and images into a database table fromserver's file system.Use COPY FROM PROGRAM (PostgreSQL 9.3 ) in conjunctionwith Large Object support (LO)5.1

STEP 1: CREATE STAGING TABLECREATE TABLE tmp docs(file name text PRIMARY KEY);5.2

STEP 2: GET LIST OF FILESPull list from folder with COPY FROM PROGRAMWindowsCOPY tmp docs FROM PROGRAM 'dir C:\data /b /S'WITH (format 'csv');Unix/LinuxCOPY tmp docs FROM PROGRAM 'ls /data/* -R'WITH (format 'csv');5.3

STEP 2: ADD FIELDS TO HOLD FILE LINK ID AND BLOBOF THE FILESALTER TABLE tmp docs ADD COLUMN doc bytea, ADD COLUMN doc oid oid;5.4

STEP 3: LOAD THE BINARY DATA-- add the document to large object storage and return the link idUPDATE tmp docs SET doc oid lo import(filename);-- pull document from large object storageUPDATE tmp docs SET doc lo get(doc oid);-- delete the files from large object storageSELECT lo unlink(doc oid)FROM tmp docs;5.5

CLIENT SIDE: LOADING BINARY FILES USING PSQLLoading documents and images into a database table fromclient's file system.Use PSQL \copy and \lo * functions and SQL to generate aload script6.1

STEP 1: CREATE STAGING TABLENote this is same as what we did for the server side approachCREATE TABLE tmp docs(file name text PRIMARY KEY);6.2

STEP 2: GET LIST OF FILESPull list from folder with PSQL \copy FROM PROGRAM (psqlpackaged with 9.3 )Windows\copy tmp docs FROM PROGRAM 'dir C:\data /b /S'WITH (format 'csv');Unix/Linux\copy tmp docs FROM PROGRAM 'ls /data/*'WITH (format 'csv');6.3

STEP 2: ADD FIELDS TO HOLD FILE LINK ID AND BLOBOF THE FILESALTER TABLE tmp docs ADD COLUMN doc bytea, ADD COLUMN doc oid oid;6.4

STEP 3: GENERATE A LOAD SCRIPT FILE\t on returns only tuples (no header), and \x off turns offexpanded mode, and \a toggles axis align\o /temp/loadscript.psql\t on\x off\aSELECT '\lo import ' quote literal(replace(file name, '\', '/')) 'UPDATE tmp docs SET doc oid :LASTOIDWHERE file name ' quote literal(file name) ';'FROM tmp docs;\o6.5

STEP 4: RUN THE LOAD SCRIPT FILE GENERATED INSTEP 3the load script file will look something like this\lo import '/scans/file1.pdf'UPDATE tmp docs SET doc oid :LASTOIDWHERE file name E'/scans/file1.pdf';\lo import '/scans/file2.pdf'UPDATE tmp docs SET doc oid :LASTOIDWHERE file name E'/scans/file2.pdf';run the load script file generated in step 3\i /temp/loadscript.psql6.6

STEP 5: SAME AS SERVER SIDE, USING SERVER SIDEFUNCTIONS GRAB THE BLOB AND DELETE THE FILEFROM LARGE STORAGE-- pull document from large object storageUPDATE tmp docs SET doc lo get(doc oid);-- delete the files from large object storageSELECT lo unlink(doc oid)FROM tmp docs;6.7

USING FOREIGN DATA WRAPPERS TO LOAD DATAfile fdw: use to read flat files and flat outputs. New inPostgreSQL 10 can read from commandline programspostgres fdw: use to query other postgres serversogr fdw - use to query and load spatial formats and alsoother relational and flat (e.g. spreadsheets, odbc datasources, dbase files, openstreetmap datafile text array - loads each row of data into an array great where number of columns on each row is not the samelike data consisting of orders on one row followed by lineitems.Honorable mentions: multicorn, odbc fdw, mysql fdw,oracle fdw, db2 fdw, tds fdw7

FILE file-fdw.htmlGenerally available with most PostgreSQL packages, mayrequire installing postgresql-contrib if no by default includedRequires super user to create a foreign table, but usermappings control access.New in PostgreSQL 10: can read from output of programssimilar to COPY FROM PROGRAM.8.1

STEP 1: INSTALL EXTENSION AND CREATE FILE FDWFOREIGN SERVERCREATE EXTENSION file fdw;CREATE SERVER svr file FOREIGN DATA WRAPPER file fdw;8.2

STEP 2 (FILE VERSION): CREATE FOREIGN TABLE TO AFILE SYSTEM FILECREATE FOREIGN TABLE fdt film locations(title text ,release year integer ,locations text ,fun facts text ,production company text ,distributor text ,director text ,writer text ,actor 1 text ,actor 2 text ,actor 3 text )SERVER svr fileOPTIONS ( format 'csv', header 'true',filename '/data talk/csvs/Film Locations in San Francisco.csv',delimiter ',',null '');8.3

STEP 2 (PROGRAM VERSION): CREATE FOREIGN TABLEFROM PROGRAM OUTPUTRequires PostgreSQL 10 . This will pull the website data onevery query of table.CREATE FOREIGN TABLE fdt film locations(title text ,release year integer ,locations text ,fun facts text ,production company text ,distributor text ,director text ,writer text ,actor 1 text ,actor 2 text ,actor 3 text )SERVER svr fileOPTIONS ( format 'csv', header 'true',program 'wget -q -O - " @" elimiter ',',null '');8.4

POSTGRES FDW: READ FROM OTHER POSTGRESSERVERSPart of standard extension offering so should already havethe binariesCan read from higher/lower postgres versions, but somefeatures are disabled if both not of same higher version.Requires super user to create a foreign table, but usermappings control access.New in PostgreSQL 10: Aggregates can be pushed down,which means things like COUNT(*), MAX(*) etc are muchfaster across databases. More joins can be pushed to remoteserver thus making cross joins between two databasesfaster.9.1

STEP 1:INSTALL THE EXTENSION IN YOUR DATABASECREATE EXTENSION postgres fdw;9.2

STEP 2:CREATE FOREIGN SERVERCREATE SERVER remote dbFOREIGN DATA WRAPPER postgres fdwOPTIONS (host 'faraway.host.com', dbname 'db', port '5432');9.3

STEP 3:CREATE USER MAPPINGS (CAN BE A GROUP ORUSER)CREATE USER MAPPING FOR public SERVER remote db OPTIONS (user 'pubinfo', password9.4

STEP 4:LINK IN THE TABLESCREATE SCHEMA remote public;-- requires PostgreSQL 9.5IMPORT FOREIGN SCHEMA public FROM SERVER remote db INTO remote public;9.5

POSTGRESQL GDAL (OGR) POSTGIS OGR FDWPOSTGRESQL MORE THAN SPATIAL FOREIGN DATAWRAPPERDoesn't require PostGIS to use, but will expose spatial columnsas PostGIS geometry if PostGIS is installed.10 . 1

USE OGR FDW EXTENSIONIf you have all sorts of data of both a spatial and non-spatialflavor to tame, make sure you have ogr fdw foreign datawrapper in your tool belt.For windows users using EDB distribution, it's part ofPostGIS bundle (versions 2.2 and up) on applicationstackbuilder.For windows/linux/mac desktop users, it's part of the BigSQLPostGIS package.For CentOS/Red Hat/Scientific etc, it's available viayum.postgresql.orgAvailable via debian and apt.postgresql.orgFor others, if you have PostGIS with GDAL support, just needpostgresql dev package to compile. Download the sourcehttps://github.com/pramsey/pgsql-ogr-fdw10 . 2

WHAT CAN OGR FDW READ?You have the combined power of Geospatial Data AbstractionLayer (GDAL), PostgreSQL, and any PostgreSQL extension youwant (including PostGIS) working seamlessly together. Somany kinds of data you can query and take advantage ofPostgreSQL functions and any extension functions and typessuch as PostGIS, hstore, built-in json/jsonb to tame your data.SpreadsheetsODBC datasourcesOther relationalOSM files (OSM, PBF)Dbase filesESRI ShapefilesSpatial web servicesMany more10 . 3

INSTALL BINARIESMake sure to use version for your PostgreSQL, examples beloware for 10Yum (CentOS, RedHat going via yum.postgresql.org) yum install ogr fdw10Debian/Ubuntu (via apt.postgresql.org) apt install postgresql-10-ogr-fdwWindows via application Stackbuilder - included as part ofPostGIS bundle 2.2 and above bundles.BigSQL (linux/windows/Mac) - included as part of thePostGIS install:pgc install postgis24-pg1010 . 4

ENABLE IT IN YOUR DATABASECREATE EXTENSION ogr fdw;10 . 5

LOAD IN FOLDER OF CSV FILESCREATE SCHEMA IF NOT EXISTS staging;CREATE SERVER svr csv FOREIGN DATA WRAPPER ogr fdwOPTIONS (datasource '/fdw data/csvs', format 'CSV');-- requires PostgreSQL 9.5 IMPORT FOREIGN SCHEMA ogr all FROM SERVER svr csv INTO staging;10 . 6

OTHER RELATIONAL DATABASESFormat for SQL Server ODBC'ODBC:your user/your password@yourDSN,table1,table2'.ODBC can be slow with a lot of tables (more than 150) so filterlist if you have over 200 tablesCREATE SERVER svr sqlserver FOREIGN DATA WRAPPER ogr fdwOPTIONS (datasource ssueNotes',format 'ODBC');CREATE SCHEMA IF NOT EXISTS ss;IMPORT FOREIGN SCHEMA "dbo."FROM SERVER svr sqlserver INTO ss;\dE ss.*List of relationsSchema Name Type Owner-------- ---------------- --------------- ---------ss dbo issuelog foreign table postgresss dbo issuenotes foreign table postgres(2 rows)10 . 7

SPREADSHEETSEach workbook is considered a server and each sheet a tableCREATE SERVER svr currency ratesFOREIGN DATA WRAPPER ogr fdwOPTIONS (datasource '/fdw data/ExchangeRates.xlsx',format 'XLSX',config options 'OGR XLSX HEADERS FORCE');CREATE SCHEMA staging;-- link only 2 spreadsheets preserve headers (requires PostgreSQL 9.5 to use IMPOIMPORT FOREIGN SCHEMA ogr all LIMIT TO (EUR, USD)FROM SERVER svr currency rates INTO stagingOPTIONS (launder column names 'false');10 . 8

FILE TEXT ARRAY: LOADING DELIMITED FILES ANDJAGGED FILES WITH FILE TEXT ARRAYUsually not available from distros but fairly easy compileWe have a windows 32/64-bit buildshttps://tinyurl.com/y8bojebkSource code here:https://github.com/adunstan/file text array fdw (notedifferent branch for each version of PostgreSQL)New in PostgreSQL 10: can read from output of programssimilar to COPY FROM PROGRAM.11 . 1

STEP 1: CREATE EXTENSION AND SERVERCREATE EXTENSION file textarray fdw;CREATE SERVER file ta server FOREIGN DATA WRAPPER file textarray fdw;CREATE USER MAPPING FOR public SERVER file ta server;11 . 2

STEP 2 (FILE VERSION): CREATE FOREIGN TABLECREATE FOREIGN TABLE fdt film locations ta( x text[] ) SERVER file ta serverOPTIONS (filename '/data talk/csvs/Film Locations in San Francisco.csv', encoding11 . 3

STEP 2 (PROGRAM VERSION): CREATE FOREIGN TABLERequires PostgreSQL 10 CREATE FOREIGN TABLE fdt film locations ta( x text[] ) SERVER file ta serverOPTIONS (program 'wget -q -O - " @" "https://data.sfgov.org/api/views/yitu-d5am/r11 . 4

COMMANDLINE TOOLSCommonly available Open source command-line when youhave PostgreSQL / PostGIS installed.shp2pgsql- use to load Dbase and ESRI shapefiles, generallypart of based postgis or postgis-gui packageogr2ogr - Binaries and packages available for most serverand Desktop OS inaries ).Use to load any kind of data, specially designed for spatialvector data.12

SHP2PGSQLPart of PostGIS project, often packaged separately as postgisgui. On EDB Windows Stackbuilder, part of PostGIS bundle. OnBigSQL part of postgis package.Pipe to psql for loadexport PGPORT 5432export PGDATABASE pgopen2018export PGUSER postgresexport PGPASSWORD xxxxexportPGHOST localhostshp2pgsql -s 4269 -D -d data\BART Lines\Bart 13 bart lines psqlOutput as an sql scriptshp2pgsql -s 4269 -D -d bart lines.sql13

OGR2OGR: SWISS ARMY KNIFE FOR DATA LOADINGogr2ogr --formatsSupported Formats:JP2ECW -raster,vector- (rov): ERDAS JPEG2000 (SDK 5.3)OCI -vector- (rw ): Oracle SpatialSOSI -vector- (ro): Norwegian SOSI StandardPCIDSK -raster,vector- (rw v): PCIDSK Database FilenetCDF -raster,vector- (rw s): Network Common Data FormatJP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG libraryPDF -raster,vector- (rw vs): Geospatial PDFDB2ODBC -raster,vector- (rw ): IBM DB2 Spatial DatabaseESRI Shapefile -vector- (rw v): ESRI ShapefileMapInfo File -vector- (rw v): MapInfo FileUK .NTF -vector- (ro): UK .NTFOGR SDTS -vector- (ro): SDTSS57 -vector- (rw v): IHO S-57 (ENC)DGN -vector- (rw ): Microstation DGNOGR VRT -vector- (rov): VRT - Virtual DatasourceREC -vector- (ro): EPIInfo .RECMemory -vector- (rw ): MemoryBNA -vector- (rw v): Atlas BNA14 . 1

OGR2OGR LOAD DATA INTO POSTGRESQLCan use psql variables or be specified on commandlineLoad an OpenStreetMap protobuf fileogr2ogr -f "PostgreSQL" \"PG:host localhost user postgres password xxx dbname pgopen2018" sf.osm.pbfLoad a folder of CSV files (folder is called csvs)ogr2ogr -f "PostgreSQL" \"PG:host localhost user postgres password xxx dbname pgopen2018" /data csv14 . 2

FINBUY OUR BOOKSHTTP://WWW.POSTGIS.US15

SQL COPY / COPY FROM PROGRAM Large Object storage SQL functions Foreign Data Wrappers (FDWs) Client-Side PSQL \copy and \copy FROM PROGRAM . \lo_import '/scans/file2.pdf' UPDATE tmp_docs SET doc_oid :LASTOID WHERE file_name E'/scans/file2.pdf'; run the load script file generated in step 3 \i /temp/loadscript.psql.