Water Quality Database Database Design And Data Dictionary

Transcription

Water Quality DatabaseDatabase Design and Data DictionaryPrepared For:U.S. Environmental Protection Agency, Region IIIChesapeake Bay Program OfficeJanuary 2004

BACKGROUND .4INTRODUCTION .6WATER QUALITY DATA.6THE RELATIONAL CONCEPT .6THE RELATIONAL DATABASE STRUCTURE .7WATER QUALITY DATABASE STRUCTURE .8PRIMARY TABLES .8WQ CRUISES .8WQ EVENT.8WQ DATA.9WQ DATA BMDL .10WQ CHLOROPHYLL .11WQ KD .13WQ QAQC .13LOOKUP TABLES .15WQ STATIONS .15WQ EVENT lookup tables. 25SOURCE .25AGENCY .25PROGRAM.26PROJECT .27CLOUD COVER .27WIND SPEED.27WIND DIRECTION .28PRECIP TYPE .28WAVE HEIGHT.29TIDE STAGE.29WQ DATA lookup tables . 30PARAMETER .30SAMPLE TYPE .32SAMPLE ID .32LAYER .33UNIT .34QUALIFIER .35METHOD.35PROBLEM .39LAB .40WQ STATIONS lookup tables . 41FIPS .41-2Water Quality Database Design and Data Dictionary

HUC8.45HUC11 .46CBP BASIN .56TS BASIN .57SEGS 1985.57SEGS 1998.59SEGS 2003.61WSM SEGS.63DATA PRESENTATION TABLES .65TIME SERIES.65STATISTICS.65LOADS TABLE .66LOADS .66CONVERSION AUXILIARY TABLES .67AGENCY AKA.67LAB AKA.67PARAMETER AKA .68PARAM METHOD MAP .69WQ STATIONS AKA.71RIM INIT STATIONS.72RIM INIT PARAMETERS .73-3Water Quality Database Design and Data Dictionary

BACKGROUNDSince the formation of the Chesapeake Bay Program in 1983, the Chesapeake Bay ProgramOffice (CBPO) acted as a central repository for environmental data related to the ChesapeakeBay and its tributaries. While this type of data management system provided some level ofcontrol over the timely collection of data and the quality of those data, it also created problemssuch as: Data accessibilityDatabase inconsistenciesDuplication of effortData located at the CBPO were physically stored on several different and disparate computingplatforms running different operating systems. The data stored in numerous formats included: Semi-relational dBase databasesNon-relational and non-normalized SAS data setsASCII text filesSpreadsheets and,Document tablesThis situation made it difficult for data analysts to retrieve different types of data required todetermine possible cause and effect relationshipsThroughout the years, State and federal government agencies and several information technologycompanies processed the different types of environmental data. Depending upon the type of dataand the amount of resources devoted to managing it, at both the State and federal levels, differentlevels of quality assurance (QA) were performed on the of data. In some cases, QA proceduresperformed at the CBPO duplicated efforts made at the State level.A distributed data management system, the Chesapeake Information Management System(CIMS), was developed to combat these problems. In contrast to the old system where allChesapeake Bay environmental data was stored at the CBPO, CIMS will transfer ownership ofthe data back to the agencies that generate the data. These data generating agencies will in turnprovide their own data quality assurance and adhere to mutually agreed upon data standards.These agencies will also have the opportunity to serve their data over the Internet, therebyenabling access to the data to any legitimate user that has access to the Internet. As part of theimplementation of CIMS, a relational database structure for managing Chesapeake Bay relatedenvironmental data was developed. The key to the success of a distributed, relational datamanagement system lies in the willingness of the data generators to take ownership of andresponsibility for their data and in their adherence to the established data standards for publicaccess to the data. This document describes the components of this structure.The benefits of a distributed data management system over the current, centralized datamanagement system are many. The most obvious benefit is that a distributed system empowers-4Water Quality Database Design and Data Dictionary

data generators, allowing for easier access to their data and hopefully increasing the utility of thedata beyond Chesapeake Bay Program activities. The other key benefits of the CIMS structurerelate to the implementation of a relational database structure. In a relational database, data arestored in tables that are related to one another by common fields. These common fields are set asprimary and/or foreign keys. The creation of relationships between tables using key fieldsallows for the enforcement of referential integrity. Referential integrity prohibits the datamanager from entering records into a child table containing a foreign key for which there is notan associated primary key in the parent table. While this document is not intended to provide acomplete discussion of the concepts of a relational database, it is the intention of the CBPO toprovide limited assistance to data managers at distributed data centers.-5Water Quality Database Design and Data Dictionary

INTRODUCTIONWater Quality DataHistorically, water quality data compiled by the Chesapeake Bay Program Office (CBPO) wasprocessed using the Statistical Analysis System (SAS) programming language and stored aspermanent SAS data sets. Each participating agency submitted either a SAS data set or anASCII delimited text file containing ambient water quality monitoring data associated withsamples collected during a particular month or entire year. The data set was then processedusing a quality assurance program called MONITOR, which performed a series of checkspertaining to station name, value ranges, method codes, etc.The resulting data sets were stored in the appropriate subdirectory on the CBPO DEC Alpha.Approximately 400 SAS data sets comprised all of the Chesapeake Bay and tributary waterquality monitoring data from 1983 to 1998. A program named BAYSTATS allowed users withDEC Alpha accounts to retrieve these data using an input form to narrow the data search.This type of data management system was inefficient both in terms of its structure and itsaccessibility to users. Each agency’s data were stored as individual monthly data sets with apredefined horizontal structure that precluded the entry of additional parameters and madequality assurance procedures difficult to execute. Furthermore, there was a significant amount ofredundant data. Each record in every data set contained information that pertained solely to thewater quality station, not to the samples and resultant parameter values. Access to the data wasalso limited to those users with DEC alpha accounts.Because of these weaknesses, the Chesapeake Bay Program decided to develop a relationaldatabase management system (RDBMS) for water quality data. Referential integrity was utilizedto enforce “business rules” related to value qualifiers, method codes, problem codes, weathercodes, etc. Additionally, the database was made available via the World Wide Web to anyonewith access to Internet browser software (e.g. Netscape, Internet Explorer).The Relational ConceptInformation obtained through water quality monitoring programs is most efficiently grouped intosubsets that are related to one another through common elements. In the Chesapeake BayProgram relational database, water quality monitoring information is stored in theWQ CRUISES, WQ EVENT, WQ CHLOROPHYLL, WQ KD, WQ DATA BMDL,WQ DATA, and WQ QAQC tables. Information related specifically to monitoring stations(e.g. latitude, longitude, basin, etc.) is stored in the WQ STATIONS table.When an agency collects water quality samples at a group of stations over a period of one ormore days, the information related to this “cruise” will be stored in the WQ CRUISES table.Information related to sampling events conducted at individual stations during a cruise will be-6Water Quality Database Design and Data Dictionary

stored in the WQ EVENT table. Parameter values obtained either by field measurement orlaboratory analysis will be stored in the WQ CHLOROPHYLL table, the WQ KD table, theWQ DATA BMDL table, the WQ DATA, or the WQ QAQC table depending upon its type.Information contained in the associated look-up tables supports the referential integrity of thedatabase. Additional tables were also developed for future data presentation and the data formatconversion between data collection programs.The Relational Database StructureThe following relational data structure for water quality data contains descriptions of the primarydata tables as well as the numerous lookup tables required to define in detail the codes containedin the primary tables. The table columns in this document used to describe the fields in thedatabase tables are described below. FIELD - This column contains the field name in the database table as well as the designation ofthe field as either a primary key (PK), a foreign key (FK), a not null (NN) field, or an optional (O) field.Primary key fields, by definition, are not null. Foreign key fields are restricted to a set list of values ina lookup table. If the value is not known, it is null. However, if a foreign key field is also part of acombined primary key, it can’t be null. Fields which are neither primary nor foreign key fields, butwhich have been designated as not null are those fields deemed essential to certain applications ofthe database. Optional fields will serve to increase the functionality of the database from a userinterface perspective. It will be up to the individual database manger/administrator as to whether ornot these fields will be included. DESCRIPTION - This column contains a description of the database table field. TYPE (FORMAT) - This column specifies the field type as text, number, small integer, ordate/time; it also includes the format of the field or the precision of the numeric value whereappropriate (primary tables only). LENGTH (BYTES) - This column specifies the maximum length, text or numeric, of a field as wellas the internal database storage requirement (primary tables only). For text fields, the internalstorage requirement is equal to its length.-7Water Quality Database Design and Data Dictionary

WATER QUALITY DATABASE STRUCTUREPrimary TablesWithin the proposed design, the primary tables WQ EVENT and WQ DATA are used to storethe vast majority of the data contained within the database. The WQ EVENT table contains allsampling event data that are not depth-dependent (e.g. weather parameters). It also containsfields specifying both the type and origin of the data. The WQ DATA table contains all depthdependent ambient water quality sampling event data as well as secchi depth data (not depthdependent).WQ CRUISESFIELDDESCRIPTIONCRUISECRUISE ID (PK, FK)SOURCE (PK, FK)START DATECBP cruise IDAgency cruise ID (YYYYMMA)Code identifying agency/contractorthat generated the dataCode identifying agency responsiblefor submitting/serving the dataAgency monitoring program codeCode identifying the project underwhich the monitoring was conducted(MAIN,TRIB)Starting date of cruiseEND DATEEnding date of cruiseCOMMENTSComments related to sampling event(e.g. parameters not sampled)AGENCY (PK, FK)PROGRAM (PK, FK)PROJECT (PK, YYYY)Memo88No limitThe WQ CRUISES table is used to store information pertaining to an agency cruise. TheCOMMENTS field can be used to store information about the cruise that was historicallysubmitted as part of the accompanying documentation file. For example stations not sampledduring a cruise could be noted in this field along with the reason why samples were not collected.WQ EVENTFIELDDESCRIPTIONSOURCE (PK, FK)Code identifying agency/contractorthat generated the dataCode identifying agency responsiblefor submitting/serving the dataAgency monitoring program codeAGENCY (FK)PROGRAM ater Quality Database Design and Data Dictionary

FIELDDESCRIPTIONPROJECT (PK, FK)Code identifying the project underwhich the monitoring was conducted(MAIN,TRIB)Agency cruise IDCBP station nameDate on which the sample wascollectedTime at which the sample wascollectedTotal station depthDepth of upper pycnoclineDepth of lower pycnoclineAir temperature (degrees Celsius)measured at beginning of samplingeventCode identifying wind speed rangeestimated at beginning of samplingeventCode identifying wind directionestimated at beginning of samplingeventCode identifying wave height rangeestimated at beginning of samplingeventCode identifying percent cloud coverrange estimated at beginning ofsampling eventCode identifying type of precipitationat beginning of sampling eventCode identifying tide stage at thebeginning of sampling eventGage height (ft)Barometric pressure (mm Hg)Comments related to sampling event(e.g. parameters not sampled)CRUISE (FK)STATION (PK, FK)SAMPLE DATE (PK)SAMPLE TIME (PK)TOTAL DEPTHUPPER PYCNOCLINELOWER PYCNOCLINEAIR TEMPWIND SPEEDWIND DIRECTIONWAVE HEIGHTCLOUD COVERPRECIP TYPETIDE STAGEGAGE mber (Single)Number (Single)Number (Single)Number (Single)81585,1 (4)5,1 (4)5,1 (4)4,1 (4)Text2Text3Text2Text2Text2Text2Number (Single)Number (Single)Memo5,1 (4)5,1 (4)No /Time(MMDDYYYY)Date/Time(HH24:MM)Number (Single)1588WQ DATAFIELDDESCRIPTIONSOURCE (PK, FK)Code identifying agency or contractorthat generated the dataCode identifying the project underwhich the monitoring was conducted(MAIN,TRIB)CBP station nameDate on which the sample wascollectedTime at which the sample wascollectedDepth at which the sample wasPROJECT (PK, FK)STATION (PK, FK)SAMPLE DATE (PK)SAMPLE TIME (PK)DEPTH (PK)85,1 (4)-9Water Quality Database Design and Data Dictionary

FIELDSAMPLE TYPE(PK, FK)SAMPLE ID(PK, FK)LAYER (PK, FK)PARAMETER(PK, FK)QUALIFIER (FK)VALUEUNITS (FK)METHOD (FK)LAB (FK)PROBLEM (FK)COMMENTSDESCRIPTIONcollectedCode identifying type of samplecollected (e.g. D discrete,C composite, etc.)Code identifying the sampledescription and number (e.g. FS1 firstsubsample of field split sample)Code identifying water column layer atwhich sample was collectedCode identifying parameter nameParameter value qualifier code (e.g. less than)Parameter valueAbbreviation for units of parametervalueMethod code identifyingfield/laboratory test procedureLaboratory code where analysis wasperformedProblem code associated withparameter valueComments related to sampledparameter 0Text5Number (Single)Text12,4 (4)10Text7Text15Text2MemoNo limitBoth these tables are used to store data which is essential to water quality data analysts in theirdaily tasks. The tables are linked or related by their five common fields (i.e. SOURCE,PROJECT, STATION, SAMPLE DATE, and SAMPLE TIME). Creating this one-to-manyrelationship between WQ EVENT and WQ DATA prevents the data manager from enteringparameter values into the WQ DATA table without first entering the associated sampling eventdata in the WQ EVENT table. This provides an automatic layer of quality assurance to thedatabase.WQ DATA BMDLFIELDDESCRIPTIONSOURCE (PK, FK)Code identifying agency or contractorthat generated the dataCode identifying the project underwhich the monitoring was conducted(MAIN,TRIB)CBP station nameDate on which the sample wascollectedTime at which the sample wascollectedDepth at which the sample wascollectedPROJECT (PK, FK)STATION (PK, FK)SAMPLE DATE (PK)SAMPLE TIME (PK)DEPTH Time(MMDDYYYY)Date/Time(HH24:MM)Number (Single)15885,1 (4)- 10 Water Quality Database Design and Data Dictionary

FIELDDESCRIPTIONSAMPLE TYPE(PK, FK)Code identifying type of samplecollected (e.g. D discrete,C composite, etc.)Code identifying the sampledescription and number (e.g. FS1 firstsubsample of field split sample)Code identifying water column layer atwhich sample was collectedCode identifying parameter nameSAMPLE ID(PK, FK)LAYER (PK, FK)PARAMETER(PK, FK)VALUEUNITS (FK)METHOD (FK)LAB (FK)PROBLEM (FK)COMMENTSParameter valueAbbreviation for units of parametervalueMethod code identifyingfield/laboratory test procedureLaboratory code where analysis wasperformedProblem code associated withparameter valueComments related to sampledparameter 0Number (Single)Text12,4 (4)10Text7Text15Text2MemoNo limitThis table is used to store parameter values below the method detection limit (MDL). Becauseof their sensitive nature, these data will be available only by request of the Chesapeake BayProgram Water Quality Data Manager. The table structure is identical to that of the WQ DATAtable except that the QUALIFIER field has been eliminated.WQ CHLOROPHYLLFIELDSOURCE (PK, FK)DESCRIPTIONSTATION (PK, FK)SAMPLE DATE (PK)Code identifying agency or contractorthat generated the dataCode identifying the project under whichthe monitoring was conducted(MAIN,TRIB)CBP station nameDate on which the sample was collectedSAMPLE TIME (PK)Time at which the sample was collectedDEPTH (PK)Depth at which the sample was collected(meters)Code identifying type of sample collected(D discrete sample, ISM in-situmeasurement)Code identifying the sample descriptionand number (e.g. FS1 first sub-sampleof field split sample)Code identifying water column layer atPROJECT (PK, FK)SAMPLE TYPE (PK, FK)SAMPLE ID (PK, FK)LAYER (PK, ime(MM/DD/YYYY)Date/Time(HH24:MM)Number (Single)1585,1 (4)Text5Text7Text38- 11 Water Quality Database Design and Data Dictionary

FIELDSAMPLE VOLUMEEXTRACT VOLUMELIGHT 4BOD665AOD750AOD750BMETHOD (FK)PROBLEM (FK)LAB (FK)COMMENTSDESCRIPTIONwhich sample was collectedChlorophyll sample volume (liters)Chlorophyll extract volume (milliliters)Light path used in analysis (centimeters)Optical density reading (uE/m**2/s) takenat a wavelength of 480 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 510 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 630 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 480 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 647 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 663 nm, afteracidificationOptical density reading (uE/m**2/s) takenat a wavelength of 663 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 664 nm, beforeacidificationOptical density reading (uE/m**2/s) takenat a wavelength of 665 nm, afteracidificationOptical density reading (uE/m**2/s) takenat a wavelength of 750 nm, afteracidification (corrected for turbidity)Optical density reading (uE/m**2/s) takenat a wavelength of 750 nm, beforeacidification (corrected for turbidity)Method code identifying fieldmeasurement procedureProblem code associated withchlorophyll analysisLab code identifying where the analysiswas conductedComments related to chlorophyll analysisTYPE(FORMAT)LENGTH(BYTES)Number (Single)Number (Single)IntegerNumber (Single)4,2 (4)4,2 (4)1 (2)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)Text7Text2Text10MemoNo limitThe WQ CHLOROPHYLL table contains optical density readings used to calculate totalchlorophyll-a, active chlorophyll-a, pheophytin, trichromatic chlorophyll-a, trichromaticchlorophyll-b, and trichromatic chlorophyll-c.- 12 Water Quality Database Design and Data Dictionary

WQ KDFIELDDESCRIPTIONSOURCE (PK, FK)Code identifying agency or contractorthat measured the dataCode identifying the project underwhich the monitoring was conducted(MAIN,TRIB)CBP station nameDate on which the PAR readings weretakenTime at which the PAR readings weretakenDepth at which the PAR readings weretaken (meters)Code identifying the sampledescription and number (e.g. M1 firstmeasurement, M2 secondmeasurement)PAR reading measured from the boatwhile or just before PAR readings weretaken at depthPAR reading taken at depth (upsensor)PAR reading taken at depth (downsensor)Units for PAR (always UM/M**2/S)Method code identifying fieldmeasurement procedureProblem code associated with PARanalysisComments related to the collection ofPAR readingsPROJECT (PK, FK)STATION (PK, FK)SAMPLE DATE (PK)SAMPLE TIME (PK)DEPTH (PK)SAMPLE ID (PK, FK)EPAR SEPARU ZEPARD ZUNITS (FK)METHOD (FK)PROBLEM extDate/Time(MM/DD/YYYY)Date/Time(HH24:MM)Number (Single)158Text85,1 (4)7Number (Single)6,3 (4)Number (Single)6,3 (4)Number (Single)6,3 (4)TextText107Text2MemoNo limitPhotosynthetic active radiation (PAR) readings are taken in order to calculate a light attenuationcoefficient for the water column. Because these data are collected using the same type of meter(i.e. one method for all) and have the same units, they can be stored in one data record.WQ QAQCFIELDDESCRIPTIONPROJECT (PK, FK)Code identifying the project under whichthe monitoring was conducted(MAIN,TRIB)Code identifying agency/contractor thatgenerated the dataMonitoring station nameDate and time the sample was collectedSOURCE (PK, FK)STATION (PK, FK)SAMPLE DATE TIME Time15- 13 Water Quality Database Design and Data Dictionary

TYPE(FORMAT)LENGTH(BYTES)Depth at which the sample was collectedCode identifying water column layer atwhich sample was collectedCode identifying the sample descriptionan

Water Quality Database Design and Data Dictionary . TYPE LENGTH FIELD DESCRIPTION (FORMAT) (BYTES) PROJECT (PK, FK) Code identifying the project under which the monitoring was conducted (MAIN,TRIB) Text 10 CRUISE (FK) Agency cruise ID Text 8 STATION (PK, FK) CBP station name Text 15 Water Quality Database Design and Data Dictionary .