Case Study: PostgreSQL ( PostGIS) - Europa

Transcription

Case study: PostgreSQL ( PostGIS)ContentsQuerying. 1Exploiting data . 1Internal relationships . 2External relationships . 2Inserting data . 2Multi-user access, concurrency and transactions. 3Altering data structures . 3Copying datasets and snapshots. 3Creating copies. 3Performing backups . 4Security . 4Spatial data . 5Binary data . 5Scalability . 5QueryingPostgreSQL provides a full SQL syntax, and this document assumes familiarity with it. Querying bythe application can be done in optimal ways thanks to this syntax. Regular users can also benefitfrom the ability to write filters in a rather friendly way, like status ’Open’ and date ’2018-03-01’Bottom line: Widely-known syntax and full-featured capabilities. Friendly filter syntax.Score: 5Exploiting dataExternal exploitation of the data is made by EEA end applications. Currently, the main types are Map services that generally use extracts (FGDB) made with FMEDownloadable datasets (CSV, mdb, SQLite, xls, ) also made with FME or custom scriptsTableau dashboards that access the database directlyWeb applications that access the database directlyFME can query PostgreSQL through native readers that support filtering. Complex queries can beperformed via SQLExecutors.If necessary, map servers (including ArcGIS) can use PostgreSQL directly as a live datasource.Tableau supports connetions to PostgreSQL out of the box, with similar capabilities as those of theSQL server connector.

Native drivers exist for all major development platforms, and have been around for quite a longtime, ensuring stability and completeness.Bottom line: Mature client components for all use cases.Score: 5Internal relationshipsA single, standard way of encoding all relation types can be easily agreed upon. 1-1, 1-N and N-Mrelationships, with their variants (required-not required) map to normalized and efficient databasestructures, and the Reportnet application will implement a single algorithm for each.If custom queries are allowed, relationships will expose a normalized structure.Bottom line: need to implement the logic, but underlying structure is standard.Score: 4External relationshipsAn official extension in PostgreSQL (Foreign Data Wrappers) allows querying heterogeneous externaldata sources in a manner similar to SQL Server Polybase. This is also the preferred method to queryother external PostgreSQL databases.Both databases in external servers and those in the same server can be accessed in this unified way.One of the limitations is that, when referencing remote tables, the spec (field names and types)must be defined as a local “proxy” object. This means that, if the remote table changes, the localproxy needs to be changed accordingly. In order to keep everything in sync, the application needs tokeep metadata and re-create proxies as needed. Mappings must also be defined for users acrossservers.Another drawback is that foreign keys cannot be enabled cross-database.Bottom line: very similar implementation to internal relationships. Visible even without the R3application. R3 needs to keep track of connections and maintain them.Score: 3Inserting dataThis process would normally be controlled in all possible input channels FormsREST APIBulk load from filesCustom applications, commercial products or power users will find a familiar sql interface.Bottom line: application wrappers will be easy to make. Familiar interface towards third parties.Score: 4

Multi-user access, concurrency and transactionsPostgreSQL runs as a service that can be accessed by multiple users simultaneously without anyspecial configuration.It provides fully ACID compliant transactions with the standard isolation levels. User code canemploy the usual BEGIN . COMMIT . ROLLBACK cycle.Some sort of nested transaction mechanism is implemented through SAVEPOINTsSo, concurrent operations read-write or write-write should behave as expected, both if they aremade from the application or directly on the database server.Bottom line: transactions are available regardless of R3, and behave “as expected”.Score: 4Altering data structuresData in postgresql is structured in the usual, relational way. This means a list of fields with a fixeddata type. The elastic nature of R3 demands a way for users to add, remove and modify fieldsregardless of whether they are foreign keys, primary keys, required or optional.All the possible interactions will have to be modelled and implemented by R3. In some cases, it willinvolve recreating the table, with all its relations and data. In N-M relations, multiple tables mighthave to be recreated. In other relations, foreign keys in other tables might also be affected.Development of this functionality is likely to be costly, and the effect on user interaction potentiallyserious, when the tables that need recreating contain many records.The application also needs to keep its metadata in sync with the actual database structure, becausethe interface it displays to the user relies on knowing exactly what fields each table contains.Bottom line: Feasible but expensive both in development time and performance impact.Score: 2Copying datasets and snapshotsThis covers all the cases where a copy of the database is made by end users whithin the system, asopposed to automatic backups.Creating copiesTypes Empty: structure refrence dataFull copies of all the dataScenarios At design time, to experiment or share with stakeholders (empty or full)When reporting starts, to create a new “country sandbox” (empty)During reporting, inside each country’s sandbox, to make a new “release” (empty or full)

The standard command line tools provided by postgresql allow highly customizable backups.Combining flags, we can create backups that contain all the table structures and any combination oftable data. Thus, we can copy all the table data, none or just some. Metadata information is neededin order to know which tables are “reference data”.The syntax is complex and will require extensive testing before the application can safely clonedatabases, but once there it’s likely to be very reliable. A minor drawback is that most optionsrequire a temporary dump file to be created on diskBottom line: powerful mechanism for creating copies with minor drawbacks. Metadata is needed.Score: 4Performing backupsThere are multiple ways to perform system backups in PostgreSQL. This section is dedicated to the ITinfrastructure approach to backups, as opposed to the previous one. That is, we will discuss how toperform regular backups for disaster recovery purposes.Dump restoreUsing the same command line tools from the previous section, full backups can be created indifferent formats. Each format has its distinct advantage, like cross-version compatibility, backupsize or restoring speed.In any case, they all perform full, consistent backups of the system at the time they are made, andoperations need not be stopped while the backup runs (except altering structure).Filesystem basedMost of these require the server to be stopped. This can be avoided or reduced by using filesystemlevel snapshot capabilities, but complexity increases. You always have to backup the entire systemand, like in the previous case, incremental backups are not possible.Continuous archivingConsists of making filesystem full copies from time to time, and continuously archive the transactionlogs after that point. The FS copies need not be totally consistent, Provides point in time recoverycapabilities.Full copies are made by pg basebackup, and do not affect the running system. Transaction logs arearchived by defining a command line that needs to be run whenever one 16Mb log file is complete.Bottom line: provides out of the box options for full and incremental backups. Minor manualscripting might be necessaryScore: 4SecurityPostgreSQL can authenticate users through a wide choice of methods that include LDAP andintegrated windows authentication. They should cover any conceivable scenario for Reportnet 3.

In postgresql, both users and groups are actually roles. Some roles can belong to other roles (this ishow “users” belong in “groups”. All authorization is based on roles.Authorization rules can be set as low as the row level.Bottom line: Multiple authentication mechanisms, including LDAP, windows and user pass. Finegrained control down to row levelScore: 4Spatial dataPostGIS is a widely known extension to PostgreSQL that handles spatial data. It is mature, stable andused around the world. Its feature set covers every conceivable use case we might come across,including raster support.Bottom line: PostGIS has everything we might ever need.Score: 5Binary dataTwo types of binary fields are supported by postgresql, one of them aimed at really larger objects.Both have minor drawbacks, like a different syntax to interact or high memory requirements. Sizelimitations, even in the “small” version are quite generous (1Gb/4Tb)Bottom line: Supported, and relatively easy to interact with.Score: 4ScalabilityThe simple alternative described in the mongodb paper applies here as well, with the same setupand challenges for the application.For horizontal scaling, a number of solutions exist, the most promising being Citus Data. Citusprovides sharding over an arbitrary number of nodes, and is an extension to Postgresql. It is opensource, with paid support optional. The shariding key problem is the same as in the mongodb study,with the only obvious automatic choice being countryCode.Bottom line: Horizontal scaling available with extensions. Sharding key difficult to assignautomatically. Simple alternative easy to setup.Score: 3

Tableau dashboards that access the database directly Web applications that access the database directly FME can query PostgreSQL through native readers that support filtering. Complex queries can be performed via SQLExecutors. If necessary, map servers (including ArcGIS) can use PostgreSQL directly as a live datasource.