Daedalus And Icarus, Statistical Expert Systems For .


Daedalus and Icarus, statistical expert systems for agriculture
Leen Nys and Luc Duchateau, former assistants of Prof. Paul Darius

Experimental design: theory versus application

TAXSY

Daedalus and Icarus

The agricultural experimental stations

Chaos and structure

TAXSY: A Rule-based Expert System Shell Developed with SAS Software

TAXSY – A Rule-based Expert System Shell Developed with SAS Software
- Darius, P. (1984) Expert Systems and Statistics, SEAS Proceedings, Spring Meeting
- Darius, P. (1986) Building Expert Systems with the Help of Existing Software, COMPSTAT 1986: Proceedings in Computational Statistics
- Darius, P. (1988) Statistical Expert Systems: Some Implementation and Experimentation Aspects. Osterreichisches Zeitschrift fur Statistik und Informatik
- Darius, P. (1990) A toolbox for adding knowledge-based modules to existing statistical software. Annals of Mathematics and Artificial Intelligence
- Demonstration of TAXSY at International Summer School on Computational Aspects of Model Choice, Charles University Prague, 1-14 July 1991

- Expert systems use explicitly coded knowledge (often in the form of IF-THEN rules) to solve problems for which a (numerical) algorithmic solution is not appropriate
- TAXSY is an expert system shell completely written in SAS
- It consists of a set of SAS programs which, with the addition of datasets with rules and code, form a flexible system for knowledge-based consultation

- The heart of TAXSY is the inference engine
- The inference engine is capable of backward chaining on rules
- One needs to specify an attribute as the goal of the inference process (e.g. name of a test)
- The inference engine will repeatedly invoke the rules to find a value for the goal attribute
[Diagram: TAXSY (AF Application)]

TAXSY needs a rule base in the form of a SAS dataset, with RULES of the following format:
IF (attribute) (operator) (value)
AND (attribute) (operator) (value)
THEN (attribute) (operator) (value)
[Diagram: RULES (SAS Datasets), TAXSY (AF Application)]

To obtain a value that cannot be inferred from rules, TAXSY invokes an appropriate interface. The PROMPTS dataset should contain, for each such attribute, the name of the AF application TAXSY has to start:
- Simple menu
- Sophisticated applications involving the construction of SAS programs based on information previously obtained and processing of their results
[Diagram: RULES (SAS Datasets), TAXSY (AF Application), PROMPTS (SAS Datasets)]
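The interplay between rules, a goal attribute and prompts can be sketched in a few lines. The sketch below is in Python rather than SAS, and everything in it is an illustrative assumption: the rule tuples mimic the IF ... AND ... THEN format of the RULES dataset, the `prompts` mapping stands in for the PROMPTS dataset, and the attribute names and rule contents are invented for the example. It is not TAXSY's actual code.

```python
# Each rule: (list of (attribute, operator, value) conditions, (attribute, value) conclusion).
# The rules themselves are hypothetical examples.
RULES = [
    ([("response type", "is", "continuous"), ("number of groups", "is", "two")],
     ("name of test", "t-test")),
    ([("response type", "is", "continuous"), ("number of groups", "is", "more than two")],
     ("name of test", "F-test")),
]


def holds(actual, operator, expected):
    """Evaluate one (attribute) (operator) (value) condition."""
    if operator == "is":
        return actual == expected
    if operator == "is not":
        return actual != expected
    raise ValueError(f"unknown operator: {operator}")


def find_value(goal, rules, prompts, facts):
    """Backward chaining: find a value for `goal`, prompting only when rules cannot help."""
    if goal in facts:
        return facts[goal]
    for conditions, (attribute, value) in rules:
        if attribute != goal:
            continue
        # Each condition becomes a sub-goal that is resolved recursively.
        if all(holds(find_value(a, rules, prompts, facts), op, v)
               for a, op, v in conditions):
            facts[goal] = value
            return value
    if goal in prompts:
        # Not inferable from rules: invoke the interface registered for this attribute.
        facts[goal] = prompts[goal]()
        return facts[goal]
    return None


# Stand-in for the PROMPTS dataset: attribute -> function that "asks the user".
prompts = {
    "response type": lambda: "continuous",
    "number of groups": lambda: "more than two",
}
facts = {}
print(find_value("name of test", RULES, prompts, facts))  # F-test
```

Because the conditions are resolved lazily, a prompt is started only when a rule under consideration actually needs that attribute, and each answer is cached in `facts` so it is asked once.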

STRUCTURE dataset
- Contains metadata
- Stores information about variables and observations and about relations between variables
[Diagram: RULES (SAS Datasets), TAXSY (AF Application), PROMPTS (SAS Datasets), STRUCTURE (SAS Datasets)]

STRATEGY dataset
- At any given stage, the inference process generally needs only a limited number of rules
- The rule dataset is split into a number of modules through the STRATEGY dataset to speed up the search process
[Diagram: RULES (SAS Datasets), TAXSY (AF Application), PROMPTS (SAS Datasets), STRATEGY (SAS Datasets), STRUCTURE (SAS Datasets)]
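One way to picture the speed-up is as a filter applied before the inference engine starts searching. The Python sketch below is only an illustration under stated assumptions: the `module` field and the helper names are hypothetical, the module names are borrowed from the high level strategy table, and the real STRATEGY dataset may be organised differently.

```python
from collections import defaultdict

# Hypothetical rule base in which every rule is tagged with the module it belongs to.
RULES = [
    {"module": "specdes",
     "if": [("filled dataset", "is", "present"),
            ("design structure for analysis", "is", "specified")],
     "then": ("analysis specification", "is", "done")},
    {"module": "specdes",
     "if": [("filled dataset", "is", "absent"),
            ("message for missing dataset", "is", "given")],
     "then": ("analysis specification", "is", "abandoned")},
    {"module": "desianal",
     "if": [("set of variance components", "is", "estimable")],
     "then": ("design for analysis", "is", "allowed")},
]


def split_into_modules(rules):
    """Group the rules by module once, so each stage searches only its own rules."""
    modules = defaultdict(list)
    for rule in rules:
        modules[rule["module"]].append(rule)
    return dict(modules)


MODULES = split_into_modules(RULES)


def rules_for_stage(stage):
    """Return the rules relevant at this stage (an empty list if the stage has none)."""
    return MODULES.get(stage, [])


print(len(RULES), len(rules_for_stage("desianal")))  # 3 1
```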

DAEDALUS: Description, Analysis and Experimental Design for AgricuLtUral Systems

Innovative aspects

Innovative aspects - OOPS

Innovative aspects – No name approach

Innovative aspects – Mixed model

Object “Experiment”

Instance variables

Instance variables and methods

High level strategy

Task                      Rules-Prompts  Attribute                 Value
Analysis specification    specdes        Analysis specification    done
Response variable         respvar        Response variable         known
Response type             Resptype       Response type             known
Design for analysis       desianal       Design for analysis       allowed
Fit model                 Fitmodel       Fit model                 done
Check outliers            Outlier        Check outliers            done
Variance function         varfunc        Variance function         correct
Density function          Densityf       Density function          correct
Independence assumption   Independ       Independence assumption   correct
Full analysis             analysis

Low level strategy - Analysis specification

Part  Attribute                        Operator  Value
if    filled dataset                   is        present
and   design structure for analysis    is        specified
then  analysis specification           is        done
if    filled dataset                   is        absent
and   message for missing dataset      is        given
then  analysis specification           is        abandoned
if    design structure for analysis    is        not specified
and   message for non specification    is        given
then  analysis specification           is        abandoned

Item name                        Item value (method-name)
filled dataset                   Check-existence of dataset
design structure for analysis    Determine design
message for missing dataset      Show message missing dataset
message for non specification    Show message non specification
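The three rules and the item-to-method table can be read as a small decision procedure. The Python sketch below encodes the rules exactly as listed, but the method implementations are hypothetical stand-ins (simple functions that return a value or record a message), and the evaluation order is an assumption, not a description of how DAEDALUS executes them.

```python
# The three low level rules for "analysis specification": (conditions, outcome).
RULES = [
    ([("filled dataset", "present"),
      ("design structure for analysis", "specified")], "done"),
    ([("filled dataset", "absent"),
      ("message for missing dataset", "given")], "abandoned"),
    ([("design structure for analysis", "not specified"),
      ("message for non specification", "given")], "abandoned"),
]


def make_methods(dataset_present, design_specified, shown):
    """Hypothetical stand-ins for the methods named in the item table."""
    def show(text):
        def method():
            shown.append(text)  # "showing" a message just records it here
            return "given"
        return method

    return {
        "filled dataset": lambda: "present" if dataset_present else "absent",
        "design structure for analysis":
            lambda: "specified" if design_specified else "not specified",
        "message for missing dataset": show("dataset is missing"),
        "message for non specification": show("design structure is not specified"),
    }


def analysis_specification(methods):
    """Try the rules in order; an item's method runs only when a rule needs its value."""
    values = {}

    def value_of(item):
        if item not in values:
            values[item] = methods[item]()
        return values[item]

    for conditions, outcome in RULES:
        if all(value_of(item) == wanted for item, wanted in conditions):
            return outcome
    return None


shown = []
print(analysis_specification(make_methods(True, True, shown)))   # done
print(analysis_specification(make_methods(False, True, shown)))  # abandoned
print(shown)  # ['dataset is missing']
```

Since `all` stops at the first failing condition, a message method is called only when the preceding condition of its rule already holds, so the user sees just the message that applies.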

Determine design method: three options
- Already available through ICARUS
- Choose from a list of designs and assign variables
- Construct a design

Variables type

Variables relationship

Low level strategy – Design for analysis

Part  Attribute                           Operator  Value
if    set of variance components          is        estimable
then  design for analysis                 is        allowed
if    set of variance components          is        partially estimable
and   message for partial estimability    is        given
then  design for analysis                 is        allowed
if    set of variance components          is        not estimable
and   message for non estimability        is        given
then  design for analysis                 is        not allowed

Item name                             Item value (method-name)
set of variance components dataset    Check estimability of variance components
message for partial estimability      Show message partial estimability
message for non estimability          Show message non estimability

Check estimability of variance components
- Non estimability due to
  – Specific missing value pattern
  – Misspecification of the design structure
- Method based on stratum concept and REML
  – Stratum needs to have at least rank one after projection on the space spanned by the treatment factors, otherwise no degrees of freedom remaining to estimate residual variance

Implementation
- SAS did a poor job of estimating variance components
- IML is used instead
- The obtained rank for the different strata is stored in the cov parms list

Updating the experiment object

Beyond Daedalus

ICARUS: Prototype to ‘explore’ experimental designs

ICARUS: Prototype to ‘Explore’ Experimental Designs
- Nys, M., P. Darius and M. Marasinghe. An interactive window-based environment for experimental design. COMPSTAT, 1992, Neuchatel
- Nys, M., P. Darius and M. Marasinghe. An interactive window-based environment to explore design of an experiment. SOFTSTAT, 1993, Heidelberg
- Invited talk given by P. Darius and M. Nys at the University of Augsburg, 1994
- Invited talk for researchers in chemistry given by P. Darius and M. Nys at Research Center of Boehringer Mannheim, Tutzing, July 1994

- Tool to assist statisticians and experimenters to ‘explore’ experimental designs during the design phase
- ‘Design’ as an object
- Interactive
- Graphical
- Window-based
- Strategy based: incorporates statistical knowledge
- Implementation in Objectworks/Smalltalk on SUN workstations

Representation of a Design
- Text: Fields were chosen in 5 countries. Each field was divided into 4 plots, ...
- Name: Factorial 2^4, Latin Square, Split-plot, ... (e.g. Kirk: SPF-p.qr)
- Model (design matrix): $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
- Dataset: columns Block and Treatment
- Others: graph of crossing/nesting relationship of factors (e.g. VARIETY*, PLANT)

Hasse Diagrams
- Nesting is a partial ordering, which can be drawn as a Hasse diagram
- A is nested in B (each level of A occurs only within 1 level of B)
- B*: B is a fixed factor
Example diagrams:
- Two-way Completely Randomized Design: DIET*, HORMONE*, CHICKEN
- Nested (hierarchical) design: BATCH, SAMPLE
- Split-Plot Design / Repeated Measures Design: AGE GROUP, SUBJECT, TREATMENT*, TIME*, SUBJECT-TIME

- 3-way factorial: SEX*, DIET*, HORMONE*, CHICKEN
- Latin square: ROW, COLUMN, TREATMENT*, PLOT
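The nesting relation that these diagrams encode can be checked directly on a dataset. The Python sketch below is a minimal illustration, not ICARUS code: the function name `is_nested`, the row format and the toy level labels are assumptions, while the factor names BATCH, SAMPLE, DIET and HORMONE come from the example diagrams.

```python
def is_nested(rows, inner, outer):
    """True if each level of `inner` occurs within only one level of `outer`."""
    owner = {}  # level of inner -> the level of outer it was first seen in
    for row in rows:
        if owner.setdefault(row[inner], row[outer]) != row[outer]:
            return False
    return True


# Toy data: samples s1, s2 are taken from batch 1 and s3, s4 from batch 2.
nested_rows = [
    {"batch": 1, "sample": "s1"}, {"batch": 1, "sample": "s2"},
    {"batch": 2, "sample": "s3"}, {"batch": 2, "sample": "s4"},
]
# Toy data: every diet is combined with every hormone (crossed factors).
crossed_rows = [{"diet": d, "hormone": h}
                for d in ("d1", "d2") for h in ("h1", "h2")]

print(is_nested(nested_rows, "sample", "batch"))   # True
print(is_nested(nested_rows, "batch", "sample"))   # False
print(is_nested(crossed_rows, "diet", "hormone"))  # False
```

Running the check for every ordered pair of factors would give the partial ordering from which a Hasse diagram can be drawn.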

A model for the experimental design process
One-shot approach? Interactive process?
Our model:
- Defining/re-defining the problem
- Entering/re-entering the design
- Design representation
- Design evaluation: practical aspects, analysis aspects

Defining the problem
An engineer wants to study the stretch of a piece of metal punched in a die. Three factors are considered:
- Lubricant (L): none, mill oil and added lubricant
- Thickness of the steel (T): 8 and 10 mm
- Steel type (S): standard and AK steel
For a given combination of L, T and S it is easy to punch 3 pieces, one after the other. The experimenter plans to repeat the entire experiment a week later. (Based on Lorenzen and Anderson, 1993)
DESIGN? ANALYSIS?

Entering the design
Problem description (abstract), computer representation (formal)
Two options are proposed
Option 1
- Describe all elements (treatment factors, blocking design factors, experimental units, ...) of the design
- ‘Easy’ questions are asked to find the relationship between all the elements

Description of the elements

Name         Total number of levels   Random/fixed
lubricant    3                        fixed
thickness    2                        fixed
steel type   2                        fixed
week         2                        random
piece        72                       random

‘Easy’ questions to find relationship

Option 2
- Treatment factor (name) applied on (experimental unit)
- Treatment factor random/fixed
- Design factor (name)
- Response variables (name) measured on (observational unit)
- ‘More difficult’ questions are asked to find the relationship between all the elements

Design representation (see later)
- Hasse diagram
- Model
- Dataset

Design evaluation:
- analysis aspects (see later)
- practical aspects
Problems:
- Very difficult to run exp. in random order
Example: randomized order?
Solutions:
- If # treatm. 1 then group exp. units

Re-defining the problem
- Since it takes a long time to wipe down a die, one lubricant was selected and all combinations of thickness and steel type were run before another lubricant was used
- All 3 pieces are punched before going to the next thickness and steel type
Re-entering the design
- Piece is not the experimental unit of lubricant, steel type or thickness, but it is the observational unit on which the stretch is measured
- Die is the experimental unit for lubricant
- A group of 3 pieces is the experimental unit for thickness and steel type

Design representation
- Hasse diagram of factors
- Model
- Dataset

(I) For designs with a ‘(balanced) complete response structure’ (Taylor and Hilton)
- An interaction term for factors not connected by a line in a Hasse diagram is added to the model (maximal model)
- The degrees of freedom are calculated by the rules given in Taylor and Hilton (and in most textbooks)
[Table: effects of the maximal model (Week, Lubricant, Week*lubricant, Die, Steel ...) with their df; total df 47]
Maximal model: some effects can have 0 df

(II) Effects with 0 df are confounded with effects with df > 0
[Table: effects of the maximal model with their df after confounding; total df 47. Surviving entries: Week; Lubricant; Week*lubricant (confounded with die); Steel ...; Week*type*thickness; Week*lubricant*type (confounded with die*type); Week*lubricant*thickness (confounded with die*thickness); Week*lubricant*type*thickness (confounded with die*type*thickness and group); Piece]
Based on EMS, e.g. EMS (die) EMS(week*lubricant) x variance(die)

(III) The interactions between the treatment and design (blocking) factors are pooled
Error strata: week, die, group, piece
[Table: effects of the maximal model with their df after pooling; total df 47]

Up to now: balanced complete response structure
- Model I (maximal model): rules of Taylor and Hilton (effects of model and df)
- Confounding: Model II
- Pooling: Model III
For orthogonal structure:
- Hasse diagram of ‘relevant’ effects (factors of cross-classifications)
- Add μ on top of the Hasse diagram
- For each effect: total number of levels
- For df, start at the top:
  – df for μ is 1
  – df for a specific effect = total number of levels for the specific effect minus the sum of df of the effects on top of the specific effect
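The df rule for an orthogonal structure is easy to try out on the lubricant example. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the `parents` mapping is one reading of the Hasse diagram of effects for model III, and the level counts for die (6), group (24) and piece (48) are derived here from the design description with 2 pieces per group rather than copied from the slides.

```python
def ancestors(effect, parents):
    """All effects that sit above `effect` in the Hasse diagram."""
    seen, stack = set(), list(parents[effect])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def hasse_df(levels, parents):
    """df(effect) = number of levels minus the sum of df of all effects above it."""
    above = {e: ancestors(e, parents) for e in levels}
    df = {}
    # An effect always has more ancestors than any of its ancestors,
    # so this ordering guarantees that the effects above are computed first.
    for e in sorted(levels, key=lambda e: len(above[e])):
        df[e] = levels[e] - sum(df[a] for a in above[e])
    return df


# Assumed level counts: 2 weeks, 3 lubricants, 2 steel types, 2 thicknesses,
# 2 pieces per group (as in the remark on the model III diagram).
levels = {
    "mu": 1, "week": 2, "lubricant": 3, "type": 2, "thickness": 2,
    "die": 6, "type*thickness": 4, "lubricant*type": 6,
    "lubricant*thickness": 6, "lubricant*type*thickness": 12,
    "group": 24, "piece": 48,
}
# Assumed Hasse diagram: each effect lists the effects directly above it.
parents = {
    "mu": [], "week": ["mu"], "lubricant": ["mu"], "type": ["mu"], "thickness": ["mu"],
    "die": ["week", "lubricant"],
    "type*thickness": ["type", "thickness"],
    "lubricant*type": ["lubricant", "type"],
    "lubricant*thickness": ["lubricant", "thickness"],
    "lubricant*type*thickness": ["lubricant*type", "lubricant*thickness", "type*thickness"],
    "group": ["die", "lubricant*type*thickness"],
    "piece": ["group"],
}

df = hasse_df(levels, parents)
for effect, value in df.items():
    print(f"{effect:26s} {value}")
print("total (without mu):", sum(df.values()) - df["mu"])
```

Under these assumptions the four error strata get 1 (week), 2 (die), 9 (group) and 24 (piece) df, and the total without μ is 47, which agrees with the total shown in the df tables.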

Hasse diagram of effects (model III)
- 4 error strata
- No interaction between week (design factor) and treatment factors
Remark: 2 pieces per group instead of 3 (in original design description)

Evaluation of design: analysis aspects
Design evaluation:
- Practical aspects
- Analysis aspects

Evaluation of design: analysis aspects - power
[Figure: power for lubricant (if # weeks = 2); labels: standardized minimal detectable difference, α]

Re-entering of the design
- 3 weeks instead of 2 weeks: significant increase in power
Conclusion: an efficient way for researchers and statistical consultants to explore the experimental design during the design phase

A Personal Note to End .
