JSON By Example - PostgreSQL

Transcription

JSON by exampleFOSDEM PostgreSQL Devevoper RoomJanuary 2016Stefanie Janine Stölting@sjstoelting

JSONJavaScript Object NotationDon't have to care about encoding, it is alwaysUnicode, most implemantations use UTF8Used for data exchange in web applicationCurrently two standards RFC 7159 by DouglasCrockford und ECMA-404PostgreSQL impementation is RFC 7159

JSON DatatypesJSONAvailable since 9.2BSONAvailable as extension on GitHub since 2013JSONBAvailable since 9.4Crompessed JSONFully transactionoalUp to 1 GB (uses TOAST)

PerformanceTest done byEnterpriseDB,see the articleby Marc Linster

JSON Functionsrow to json({row})Returns the row as JSONarray to json({array})Returns the array as JSONjsonb to recordsetReturns a recordset from JSONB

JSON OpertatorsArray element- {int}Array element by name- {text}Object element- {text}Value at path# {text}

Index on JSONIndex JSONB content for faster access with indexesGIN index overallCREATE INDEX idx 1 ON jsonb.actor USINGGIN (jsondata);Even unique B-Tree indexes are possibleCREATE UNIQUE INDEX actor id 2 ONjsonb.actor((CAST(jsondata- 'actor id' ASINTEGER)));

New JSON functionsPostgreSQL 9.5 new JSONB functions:jsonb pretty: Formats JSONB human readablejsonb set: Update or add valuesPostgreSQL 9.5 new JSONB operators: : Concatenate two JSONB-: Delete keyAvailable as extions for 9.4 at PGXN: jsonbx

Data sourcesThe Chinook database is availableat chinookdatabase.codeplex.comAmazon book reviews of 1998 areavailable atexamples.citusdata.com/customer reviews nested 1998.json.gz

Chinook Tables

CTECommon Table Expressions will be used in examples Example:WITH RECURSIVE t(n) AS (VALUES (1)UNION ALLSELECT n 1 FROM t WHERE n 100)SELECT sum(n), min(n), max(n) FROM t; Result:

Live ExamplesLet's see, how it does work.

Live with Chinook data-- Step 1: Tracks as JSON with the album identifierWITH tracks AS(SELECT "AlbumId" AS album id, "TrackId" AS track id, "Name" AS track nameFROM "Track")SELECT row to json(tracks) AS tracksFROM tracks;

Live with Chinook data-- Step 2 Abums including tracks with aritst identifierWITH tracks AS(SELECT "AlbumId" AS album id, "TrackId" AS track id, "Name" AS track nameFROM "Track"), json tracks AS(SELECT row to json(tracks) AS tracksFROM tracks), albums AS(SELECT a."ArtistId" AS artist id, a."AlbumId" AS album id, a."Title" AS album title, array agg(t.tracks) AS album tracksFROM "Album" AS aINNER JOIN json tracks AS tON a."AlbumId" (t.tracks- 'album id')::intGROUP BY a."ArtistId", a."AlbumId", a."Title")SELECT artist id, array agg(row to json(albums)) AS albumFROM albumsGROUP BY artist id;

Live with Chinook data

Live with Chinook data-- Step 3 Return one row for an artist with all albums as VIEWCREATE OR REPLACE VIEW v json artist data ASWITH tracks AS(SELECT "AlbumId" AS album id, "TrackId" AS track id, "Name" AS track name, "MediaTypeId" AS media type id, "Milliseconds" As milliseconds, "UnitPrice" AS unit priceFROM "Track"), json tracks AS(SELECT row to json(tracks) AS tracksFROM tracks), albums AS(SELECT a."ArtistId" AS artist id, a."AlbumId" AS album id, a."Title" AS album title, array agg(t.tracks) AS album tracksFROM "Album" AS aINNER JOIN json tracks AS tON a."AlbumId" (t.tracks- 'album id')::intGROUP BY a."ArtistId", a."AlbumId", a."Title"), json albums AS(SELECT artist id, array agg(row to json(albums)) AS albumFROM albumsGROUP BY artist id)-- - Next Page

Live with Chinook data-- Step 3 Return one row for an artist with all albums as VIEW, artists AS(SELECT a."ArtistId" AS artist id, a."Name" AS artist, jsa.album AS albumsFROM "Artist" AS aINNER JOIN json albums AS jsaON a."ArtistId" jsa.artist id)SELECT (row to json(artists))::jsonb AS artist dataFROM artists;

Live with Chinook data-- Select data from the viewSELECT *FROM v json artist data;

Live with Chinook data-- SELECT data from that VIEW, that does queryingSELECT jsonb pretty(artist data)FROM v json artist dataWHERE artist data- 'artist' IN ('Miles Davis', 'AC/DC');

Live with Chinook data-- SELECT some data from that VIEW using JSON methodsSELECT artist data- 'artist' AS artist, artist data# '{albums, 1, album title}' AS album title, jsonb pretty(artist data# '{albums, 1, album tracks}') AS album tracksFROM v json artist dataWHERE artist data- 'albums' @ '[{"album title":"Miles Ahead"}]';

Live with Chinook data-- Array to recordsSELECT artist data- 'artist id' AS artist id, artist data- 'artist' AS artist, jsonb array elements(artist data# '{albums}')- 'album title' AS album title, jsonb array elements(jsonb array elements(artist data# '{albums}')# '{album tracks}')- 'track name' AS song titles, jsonb array elements(jsonb array elements(artist data# '{albums}')# '{album tracks}')- 'track id' AS song idFROM v json artist dataWHERE artist data- 'artist' 'Metallica'ORDER BY album title, song id;

Live with Chinook data-- Convert albums to a recordsetSELECT *FROM jsonb to recordset((SELECT (artist data- 'albums')::jsonbFROM v json artist dataWHERE (artist data- 'artist id')::int 50)) AS x(album id int, artist id int, album title text, album tracks jsonb);

Live with Chinook data-- Convert the tracks to a recordsetSELECT album id, track id, track name, media type id, milliseconds, unit priceFROM jsonb to recordset((SELECT artist data# '{albums, 1, album tracks}'FROM v json artist dataWHERE (artist data- 'artist id')::int 50)) AS x(album id int, track id int, track name text, media type id int, milliseconds int, unit price float);

Live with Chinook data-- Create a function, which will be used for UPDATE on the view v artrist dataCREATE OR REPLACE FUNCTION trigger v json artist data update()RETURNS trigger AS BODY -- Data variablesDECLARE recRECORD;-- Error variablesDECLARE v stateTEXT;DECLARE v msgTEXT;DECLARE v detailTEXT;DECLARE v hintTEXT;DECLARE v contextTEXT;BEGIN-- Update table ArtistIF (OLD.artist data- 'artist')::varchar(120) (NEW.artist data- 'artist')::varchar(120) THENUPDATE "Artist"SET "Name" (NEW.artist data- 'artist')::varchar(120)WHERE "ArtistId" (OLD.artist data- 'artist id')::int;END IF;-- Update table Album with an UPSERT-- Update table Track with an UPSERTRETURN NEW;EXCEPTION WHEN unique violation THENRAISE NOTICE 'Sorry, but the something went wrong while trying to update artist data';RETURN OLD;WHEN others THENGET STACKED DIAGNOSTICSv state RETURNED SQLSTATE,v msg MESSAGE TEXT,v detail PG EXCEPTION DETAIL,v hint PG EXCEPTION HINT,v context PG EXCEPTION CONTEXT;RAISE NOTICE '%', v msg;RETURN OLD;END; BODY LANGUAGE plpgsql;

Live with Chinook data

Live with Chinook data-- The trigger will be fired instead of an UPDATE statement to save dataCREATE TRIGGER v json artist data instead update INSTEAD OF UPDATEON v json artist dataFOR EACH ROWEXECUTE PROCEDURE trigger v json artist data update();

Live with Chinook data-- Manipulate data with jsonb setSELECT artist data- 'artist id' AS artist id, artist data- 'artist' AS artist, jsonb set(artist data, '{artist}', '"Whatever we want, it is just text"'::jsonb)- 'artist' AS new artistFROM v json artist dataWHERE (artist data- 'artist id')::int 50;

Live with Chinook data-- Update a JSONB column with a jsonb set resultUPDATE v json artist dataSET artist data jsonb set(artist data, '{artist}', '"NEW Metallica"'::jsonb)WHERE (artist data- 'artist id')::int 50;

Live with Chinook data-- View the changes done by the UPDATE statementSELECT artist data- 'artist id' AS artist id, artist data- 'artist' AS artistFROM v json artist dataWHERE (artist data- 'artist id')::int 50;

Live with Chinook data-- Lets have a view on the explain plans– SELECT the data from the view

Live with Chinook data-- View the changes in in the table instead of the JSONB view-- The result should be the same, only the column name differSELECT *FROM "Artist"WHERE "ArtistId" 50;

Live with Chinook data-- Lets have a view on the explain plans– SELECT the data from table Artist

Live with Chinook data-- Manipulate data with the concatenating / overwrite operatorSELECT artist data- 'artist id' AS artist id, artist data- 'artist' AS artist, jsonb set(artist data, '{artist}', '"Whatever we want, it is just text"'::jsonb)- 'artist' AS new artist, artist data '{"artist":"Metallica"}'::jsonb- 'artist' AS correct nameFROM v json artist dataWHERE (artist data- 'artist id')::int 50;

Live with Chinook data-- Revert the name change of Metallica with in a different way: With the replace operatorUPDATE v json artist dataSET artist data artist data '{"artist":"Metallica"}'::jsonbWHERE (artist data- 'artist id')::int 50;

Live with Chinook data-- View the changes done by the UPDATE statement with the replace operatorSELECT artist data- 'artist id' AS artist id, artist data- 'artist' AS artistFROM v json artist dataWHERE (artist data- 'artist id')::int 50;

Live with Chinook data-- Remove some data with the - operatorSELECT jsonb pretty(artist data) AS complete, jsonb pretty(artist data - 'albums') AS minus albums, jsonb pretty(artist data) jsonb pretty(artist data - 'albums') AS is differentFROM v json artist dataWHERE artist data- 'artist' IN ('Miles Davis', 'AC/DC');

Live Amazon reviews-- Create a table for JSON data with 1998 Amazon reviewsCREATE TABLE reviews(review jsonb jsonb);

Live Amazon reviews-- Import customer reviews from a fileCOPY reviewsFROM '/var/tmp/customer reviews nested 1998.json';

Live Amazon reviews-- There should be 589.859 records imported into the tableSELECT count(*)FROM reviews;

Live Amazon reviewsSELECT jsonb pretty(review jsonb)FROM reviewsLIMIT 1;

Live Amazon reviews-- Select data with JSONSELECTreview jsonb# '{product,title}' AS title, avg((review jsonb# '{review,rating}')::int) AS average ratingFROM reviewsWHERE review jsonb@ '{"product": {"category": "Sheet Music & Scores"}}'GROUP BY titleORDER BY average rating DESC;Without an Index: 248ms

Live Amazon reviews-- Create a GIN indexCREATE INDEX review review jsonb ON reviews USING GIN (review jsonb);

Live Amazon reviews-- Select data with JSONSELECT review jsonb# '{product,title}' AS title, avg((review jsonb# '{review,rating}')::int) AS average ratingFROM reviewsWHERE review jsonb@ '{"product": {"category": "Sheet Music & Scores"}}'GROUP BY titleORDER BY average rating DESC;The same query as before with the previously created GIN Index: 7ms

Live Amazon reviews-- SELECT some statistics from the JSON dataSELECT review jsonb# '{product,category}' AS category, avg((review jsonb# '{review,rating}')::int) AS average rating, count((review jsonb# '{review,rating}')::int) AS count ratingFROM reviewsGROUP BY category;Without an Index: 9747ms

Live Amazon reviews-- Create a B-Tree index on a JSON expressionCREATE INDEX reviews product category ON reviews ((review jsonb# '{product,category}'));

Live Amazon reviews-- SELECT some statistics from the JSON dataSELECT review jsonb# '{product,category}' AS category, avg((review jsonb# '{review,rating}')::int) AS average rating, count((review jsonb# '{review,rating}')::int) AS count ratingFROM reviewsGROUP BY category;The same query as before with the previously created BTREE Index: 1605ms

JSON by exampleThis document by Stefanie Janine Stölting is covered by theCreative Commons Attribution 4.0 International

Live with Chinook data-- Step 3 Return one row for an artist with all albums as VIEW CREATE OR REPLACE VIEW v_json_artist_data AS WITH tracks AS SELECT "AlbumId" AS album_id, "TrackId" AS track_id, "Name" AS track_name, "MediaTypeId" AS media_type_id, "Milliseconds" As milliseconds, "UnitPrice" AS unit_price FROM "Track", json_tracks AS SELECT row_to_json(tracks) AS tracks