Introduction To SQL - University Of Iowa SAS Users Group

Transcription

Introduction to SQLUniversity of IowaSAS Users Group

Introduction to SQLSee Us on the Web:https://uisug.org.uiowa.edu/Subscribe to the UI SAS ListServ:Via the web:https://list.uiowa.edu/scripts/wa.exe?HOMEVia email:email to: listserv@list.uiowa.edusubject: anythingbody: subscribe sas-users yourfirstname yourlastname

Introduction to SQL1. Introduction and basic uses2. Joins and Views3. Reporting examples

Introduction to BloodTypeJoemale73A 1150Joemale73A Joemale73A 2153Sallyfemale88O-Joemale73A 3151Tommale71B 8O-3126Tommale71B 1163PatientVisit #WeightTommale71B 26Tom1163Tom2165“Old School” Flat File Structure“Really Old School” Flat File StructureRelational File Structure

Introduction to SQL SQL Oracle Database MySQLMicrosoft SQL Server AccessIBM DB2 InformixSAP SybaseTeradataPostgreSQLSQLite

Introduction to SQLSQLPerforms direct data access functionsRetrieves and Updates tablesProc SQL;

Introduction to SQLSASNative SQLSAS SQLPass-throughUses SQL syntax within thelocal SAS environmentEnables you to send DBMS-specificstatements to a DBMS and toretrieve DBMS dataproc sql;connect to oracle as myconn(user smith password secret path 'myoracleserver');create view salary view asselect * from connection to myconn(select empid, lastname, firstname, salaryfrom employees where salary 75000);disconnect from myconn;quit;

Introduction to SQLTable/Data set: 7.566.5Result: outputA Simple “select”proc sql;select name, sex, age from class;quit;---------proc print data class noobs;var name sex 1212151312111412151612151115

Introduction to SQLTable/Data set: 7.566.5Result: outputA Simple “select”proc sql;select * from class;quit;---------proc print data class 67.057.566.5

Introduction to SQLTable/Data set: 7.566.5Result: output“select” with “where”proc sql;select * from classwhere Sex ‘M’;quit;---------proc print data class noobs;where Sex .566.5

Introduction to SQLTable/Data set: 7.566.5Result: output“select” with “where”with nested “select”proc sql;select * from classwhere height (selectmedian(height) from 1156.557.359.862.562.559.051.356.357.5

Introduction to SQLTable/Data set: classNameSexAge .559.051.364.356.366.572.064.867.057.566.5Result: output“select” with “order by”proc sql;select * from classorder by sex, age;quit;---------proc sort data class;by sex age;proc print data class 6772

Introduction to SQLTable/Data set: homasWilliamSexMFFFMMFFMMFFFFMMMMMAge 72.064.867.057.566.5Result: (no output)“create table”proc sql;create table ClassCopy asselect * from class;quit;---------data ClassCopy;set class;run;NameSexAge .559.051.364.356.366.572.064.867.057.566.5

Introduction to SQLTable/Data set: homasWilliamSexMFFFMMFFMMFFFFMMMMM“insert”Age 72.064.867.057.566.5proc sql;insert into class(Name, Sex, Age, Age Height)values (‘Norm’, ‘M’, 15, 54.2);quit;---------data class;set class end eof;output;if eof then do;name ‘Norm’; Sex ‘M’;age 15; height 54.2;output;end;run;Result: (no output)NameSexAge 4.2

Introduction to SQLTable/Data set: classNameSexAge 4.2“delete”proc sql;delete from classwhere name ‘Norm’;quit;---------data class;set class;if name ‘Norm’ then delete;run;Result: (no output)NameSexAge .559.051.364.356.366.572.064.867.057.566.5

Introduction to SQLTable/Data set: homasWilliamSexMFFFMMFFMMFFFFMMMMMAge 72.064.867.057.566.5“update”proc sql;update classset height 69.3where name ‘Alfred’;set height 57.9, age 19where name ‘Alice’;quit;---------data class;set class;if name ‘Alfred’ then height 69.3;if name ‘Alice’ then do;height 57.9;age 19;end;run;Result: (no output)NameSexAge .559.051.364.356.366.572.064.867.057.566.5

Introduction to SQLTable/Data set: classNameSexAge ions”Result: output13.31579proc sql;select avg(age) as MeanAgefrom class;quit;---------proc means data class;var age;run;

Introduction to SQLTable/Data set: classNameSexAge e a macro variable”Result: logMean age is 13.31579proc sql;select avg(age) into :mnagefrom class;quit;%put Mean age is &mnage ;

Proc SQL SyntaxMore fun things we can do with Proc SQL1.2.3.4.5.Create new variablesGroup dataSubset grouped dataConditional ProcessingJoining Datasets (WITHOUT sorting!)

Proc SQL Syntax: Creating new variables using “as”Dataset: Baseball career Dataset: Baseball selectonly variables you pick will be in outputyou can rename variables with the “as” statement

Proc SQL Syntax: Creating new variables using “as”Dataset: Baseball careerDataset: newvar-or-

Proc SQL Syntax: Grouping DataDataset: Baseball careerResult: (no output)

Proc SQL Syntax: Grouping DataDataset: Baseball careerResult: (no output)

Proc SQL Syntax: Grouping DataDataset: Baseball careerResult: (no output)

Proc SQL Syntax: Conditional ProcessingDataset: Baseball careerDataset: baseball HR

Proc SQL Syntax: Joining DatasetsDataset: Baseball career No need for sortingDataset: Baseball 1986

Proc SQL Syntax: Joining DatasetsInner Joins Uses “where”keeps common observationsbased on keywordDataset innerball

Proc SQL Syntax: Joining DatasetsDataset outerballOuter Joins Uses “on”keeps all observations from bothdatasets

Proc SQL Syntax: Joining DatasetsDataset leftballLeft/Right Joins Uses “on”keeps observations in “left” dataset

Introduction to SQLPROC SQL and the Cloud Canvas Data– Star schema database in the AmazonCloud– Fact and dimension tables– Page view table 1 billion rows

Practical Need for PROC SQL Data Reports for Instructors– What are the distributions of scores?– Which students are participating indiscussion forums?

Using the Unizin DataPlatform The Office of Teaching, Learning and Technology configuredconnections to UDP via SAS SAS provides robust data management toolsSAS provides a variety of statistical analyses and tools for creating reportsVery simple to use ODBC connection to connect to RedshiftSQL in SAS can be read by DBAs or other application developersSome downsides of SAS Not able to process Requests table for many courses at onceThe Big Integer is PythonicNot commonly used by developers32

Overcoming some hurdles 33

Reporting the Outcomes ofMandatory Quiz The Department of Art andArt History requires studentsto take a mandatory quizabout safety. An Auditing and ComplianceDepartment must collectidentities of students whohave passed the quiz. We connect to UDP and ourSIS in same programmingsteps.34

Efficiency of assignment dim Ability to find assignments by their title Effective because submission dim does not havecourse id valuesPROC SQL;create table ELEMENTS quiz asSELECT A.TITLE, A.ID ASASSIGNMENT ID,(dbsastype (id 'char(20)'course id 'char(20)')) AINNER JOIN WORK.ELEMENTS BON A.COURSE ID B.IDWHERE title 'SAAH Quiz';QUIT;35

Report for College of MedicineGoal: Report of scores on all major assignments forstudents in the medical school1. Pull all courses for the Med School fromAccount dim and inner join with Course dim Where course dim.workflow state ‘available’ Enrollment term id is current semesterAccount dim Parent account id Course idCourse dim2. Pull all exams from assignment dim tableproc sql noprint;create table ASSIGNMENTS as select *, id asassignment id label "assignment id“,course id from u.ASSIGNMENT DIMAssignment dimwhere course id in (select course id fromMED V2) and prxmatch('m/exam/oi', title) 0and workflow state 'published';quit;Submission dim Course id Enrollment term id Workflow state Course id Assignment id Title Assignment id User id Grade Workflow state36

Report for College of MedicineGoal: Report of scores on all major assignments forstudents in the medical school3. Inner join assignment dim withsubmission dim table Workflow state ‘graded’ User id “ “ Remove % signs using TRANWRD functionAccount dim Parent account id Course idCourse dim4. Inner join with pseudonym dim table to getSIS user idAssignment dimselect TRANWRD(grade, %, “ “)Submission dim Course id Enrollment term id Workflow state Course id Assignment id Title Assignment id User id Grade Workflow state37

Preparation of a Discussion ReportGoal: Report of discussion posts, replies, and ‘views’ for each student User idPseudonym dimGetting Discussion Posts and Replies SIS user id1. Match user id to university idusing Pseudonym dim table Course id2. Pull Discussion entry fact by Discussion entry idDiscussion entry fact Topic idcourse id User id3. Inner join Discussion entry factwith Discussion entry dim If depth 1 output to Replies IDDiscussion entry dim Depth If depth 1 output to Posts4. Sort by User id Count overall user posts & IDDiscussion topic dimreplies Title5. Inner join withDiscussion topic dim6. Sort by User id & Topic id Count topic posts and repliesby user38

Preparation of a Discussion ReportGoal: Report of discussion posts, replies, and ‘views’ for each studentNote: Views were defined as rows in the requests table related to discussions, as long asthe subsequent rows were for different topic ids or 15 minutes apart1. Pull from requests by course id3. Sort by user id, topic id abbr, timestamp2. Extract discussion id from URL4. Count views by user id with conditional statementsand lag functionproc sql noprint;create table requests2 &course number asselect *, input(substr(url, 33, 6), 12.)as topic id abbrlabel 'topic id abbr'from requests &course number;quit;timestamp2 timestamp - (lag(timestamp));if first.sis user id then posts read 1;if sis user id lag(sis user id) andtopic id abbr lag(topic id abbr) and timestamp2 900then posts read posts read 0 ;39

sis user idVIEWSREPLIESPOSTEDuser idSTUDENT FIRST NAMESTUDENT LAST 006161113-3333333333333DukeCornwall40

Introduction to SQL SAS Native SQL Uses SQL syntax within the local SAS environment SAS SQL Pass-through Enables you to send DBMS-specific statements to a DBMS and to retrieve DBMS data proc sql; connect to oracle as myconn (user smith password secret path 'myoracleserver'); create view salary_view as select * from connection to myconn