Transcription
Data Science TutorialEliezer Kanal – Technical Manager, CERTDaniel DeCapria – Data Scientist, ETCSoftware Engineering InstituteCarnegie Mellon UniversityPittsburgh, PA nDistributionis isUnlimitedUnlimited1
About usEliezer KanalDaniel DeCapriaTechnical Manager, CERTData Scientist, ETCRecent projects: ML-based Malware Classifier Network traffic analysis Cybersecurity questionnaireoptimizationRecent projects: Cyber risk situationaldashboard Big Learning benchmarksData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited2
Today’s presentation – a tale of two rolesThe call center managerThe master carpenterIntroduction todata science capabilitiesOverview of thedata science toolkitData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited3
Call center managerFirst day on job welcome!Goal: Reduce costsTask: Keep calls short!Data:Average call time:5.14 minutes (5:08) very long!Number of employees:300Average calls per day: 28,000Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited4
Call center manager – Gather dataGet the data! Where is it? What will you use to analyze it? How accurate it is? How complete is it? Is it too big to easily read?Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited5
Data cleaning 90% of the work2 weeks (10 days) 9 cleaning, 1 analyzingData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited6
Cleaning the Data – Structuring the DataGoal: Organize data in a table, where Columns descriptor (age, weight, height)Row individual, complete recordsHow can you get data out of these documents?Less structureMore structureData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited7
Cleaning the DataEven when you think your data should be clean, it might not be Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited8
Cleaning the Data – Call Center ExampleNameMgrDirBeth JonesDan ThomasAnne KimBeth Jones❶Jones, BethDan ThomasAnne KimDan ThomasPhoneLineProblemsolved?1:301Y3YAnne Kim1:52❶ 90❺ 2Y Mark RyanTim Pike882Mark Ryan❷Tom Keane Kevin WoodTim Pike1443Tim Pike200 ❹Tom KeaneKevin WoodTim Pike❻ Tom KeaneKevin WoodTom KeaneTom KeaneStringCallLength94511❸Tim Pike421CommentN❶No Yes 2No 2Yes IntegerData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University“Nominal” Unstructured2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited9
Call Center manager – Exploring dataData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited10
Exploratory Data Analysis (EDA) Mean Median Standard deviation Histograms!Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited11
Distributions The majority of data willfollow SOME distributiono Weight of all Americans:Gaussiano phone call length:Exponential Determining distribution is acommon Data Science task Multidimensional outliers:Insider Threat exampleImage Copyright 2001-2016 The Apache Software Foundation. See Copyright slide for more details.Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited12
EDA – Smart visualizationsData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited13
Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited14
Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited15
Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited16
Brief interruptionSkeptics inthe audienceData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited17
Brief interruptionData Science helps youuse data to get results.This is it.Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited18
Call center manager – call duration histogramAverage (5:08)Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited19
Call Center manager – Insights!Strategy update: Goodbye “reduce call time” Hello “reduce callbacks”How to measure?“callbacks” isn’t currently capturedData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited20
Feature EngineeringNeed more useful data?Create it yourself!Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited21
Feature Engineering Feature Engineering: coming up with new, useful (i.e.,informative) datao mean, sums, medians, etc.o x2, xy, sqrt(xy), etc. Our case:o # of callbackso Call during peak time?o Overall agent performance? (combination of factors)Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited22
The role of Listening in Data ScienceData science finds hidden patterns in dataExperts know what data & patterns are importantTalk to subject matter expertsData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited23
Call Center manager – Predictive analyticsCan we predict staffing levels one day ahead? one week ahead? one month ahead?Can we determine what types of calls to expect for a product we haven’t had before? for a market we’ve never seen before?Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited24
Example Predictive Analytics QuestionsPredicting Current UnknownsOnline:Which ads are malicious?Security:Is the bank transaction fraudulent?IC:Which names map to the same person (entity resolution)?Predicting Future EventsRetail:What will be the new trend of merchandise that a company should stock?Security:Where will a hacker next attack our network?IC:Who will become the next insider threat?Determining Future ActionsSales:How can a company increase sales revenues?Health:What actions can be taken to prevent the spread of flu?IC:How will a vulnerability patch affect our knowledge/preparedness for future attacks?Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited25
Call Center manager – Predictive analyticsMany techniques available, explored in next sectionData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited26
Call Center manager – ReviewGet dataClean dataEDA, VisualizationInterpretationAction!PredictionData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited27
Because we know our data, we can ask more intelligent questions action-oriented questions questions that can be answeredData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited28
This slide intentionally left blankData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited29
The master carpenter“The right tool for the job”Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited30
Feature Engineering – Part 2”With the wrong wood, I can make nothing”The fuel of data science is dataData preparation is criticalData quality algorithm choiceThat will come up Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited31
Types of Machine Learning AlgorithmsClassification Naïve Bayes Logistic Regression Decision Trees K-Nearest Neighbors Support Vector MachinesRegression Linear Regression Support Vector MachinesClustering K-Means ClusteringData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited32
Types of Machine Learning AlgorithmsApplications: Everywhere Banking Weather Sports scores Economics Environmental science CybersecurityData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited33
Linear Regression – PredictionProblem:If I have examples of X and Y,when I learn a new X, can Ipredict Y?Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited34
Linear Regression – PredictionSolution: Find the line that isclosest to every pointSaid differently: Find the line thatthe SUM of all errors is smallestData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited35
Linear Regression – PredictionThree dimensions,same conceptHUNDREDS of dimensions,same conceptData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited36
Linear RegressionVery widely used Simple to implement Quick to run Easy to interpret Works for many problems First identified in early 1800’s; very well studiedWhen applicable: Works best with numeric data (usually) Works for predicting specific numeric outcomeData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited37
Logistic Regression – ClassificationIdea: Classification using a discriminative model Predict future behavior based on existing labeled data Draws a line to assign labelsMainly used for binary classification: either “red” or “blue”Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited38
Logistic Regression – ClassificationLook at distribution, what’s likely based on current dataProbably blueProbably redData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited39
Logistic RegressionThree dimensions,same conceptHUNDREDS ofdimensions,same conceptData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited40
Classification: Support Vector MachineBarometric PressureIdea: The optimal classifier is the one that is the farthest from bothclassesDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited41
Classification: Support Vector MachineBarometric PressureIdea: The optimal classifier is the one that is the farthest from bothclassesDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited42
Classification: Support Vector MachineAlgorithm: Find lines like beforeBarometric Pressure Assign a cost to misclassified data points based on distancefrom the classification line𝝃𝟐𝝃𝟏Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited43
Classification: Decision TreesIdea: Instead of drawing a single complicated line through the data,draw many simpler lines.Algorithm: Scan through all values of all features to find the one that “helpsthe most” to determine what data gets what label.Barometric Pressure Divide the data based on that value, and then repeat recursivelyon each part.Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited44
Classification: Decision TreesIdea: Instead of drawing a single complicated line through the data,draw many simpler lines.Algorithm: Scan through all values of all features to find the one that “helpsthe most” to determine what data gets what label. Divide the data based on that value, and then repeat recursivelyon each part.Barometric Pressure𝐷𝑃 60Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited45
Classification: Decision TreesIdea: Instead of drawing a single complicated line through the data,draw many simpler lines.Algorithm: Scan through all values of all features to find the one that “helpsthe most” to determine what data gets what label (“informationgain”).Barometric Pressure Divide the data based on that value, and then repeat recursively𝐷𝑃 60on each part.𝐵 760Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited46
Classification: Decision TreesBenefits: Works well when small. Very easy to understand!Challenges: Trees overfit easily Very sensitive to data; Random ForestsBarometric Pressure𝐷𝑃 60𝐵 760𝐷𝑃 55Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited47
Classification: K-Nearest NeighborsIdea: A new point is likely to share the same label as points aroundit.Algorithm: Pick constant k as number of neighbors to look at.Barometric Pressure For each new point, vote on new label using the k neighborlabels.Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited48
Classification: K-Nearest NeighborsIdea: A new point is likely to share the same label as points aroundit.Algorithm: Pick constant k as number of neighbors to look at.Barometric Pressure For each new point, vote on new label using the k neighborlabels.Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited49
Classification: K-Nearest NeighborsIdea: A new point is likely to share the same label as points aroundit.Algorithm: Pick constant k as number of neighbors to look at.Barometric Pressure For each new point, vote on new label using the k neighborlabels.Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited50
Classification: K-Nearest NeighborsWorks well when there is a good distance metric and weighting function to voteon classificationChallenges: Not a smooth classifier; points near each other may getclassified differently Must search all your data every time you want to classify a newpoint When k is small (1,2,3,4), essentially it is overfitting to the datapointsData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited51
Clustering Unsupervised learning Structure of un-labled data Organize records into groups based on some similarity measure Cluster is the collection of records which are similarData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited52
Clustering: K-meansIdea: Find the clusters by minimizing distances of cluster centers to data.Algorithm: Instantiate k distinct random guesses 𝜇- of the cluster centers Each data point classifies itself as the 𝜇- it is closest to it Each 𝜇- finds the centroid of the points that were closest to it andjumps thereBarometric Pressure Repeat until centroids don’t moveDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited53
Clustering: K-meansIdea: Find the clusters by minimizing distances of cluster centers to data.Algorithm: Instantiate k distinct random guesses 𝜇- of the cluster centers Each data point classifies itself as the 𝜇- it is closest to it Each 𝜇- finds the centroid of the points that were closest to it andjumps thereBarometric Pressure Repeat until centroids don’t moveDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited54
Clustering: K-meansIdea: Find the clusters by minimizing distances of cluster centers to data.Algorithm: Instantiate k distinct random guesses 𝜇- of the cluster centers Each data point classifies itself as the 𝜇- it is closest to it Each 𝜇- finds the centroid of the points that were closest to it andjumps thereBarometric Pressure Repeat until centroids don’t moveDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited55
Clustering: K-meansIdea: Find the clusters by minimizing distances of cluster centers to data.Algorithm: Instantiate k distinct random guesses 𝜇- of the cluster centers Each data point classifies itself as the 𝜇- it is closest to it Each 𝜇- finds the centroid of the points that were closest to it andjumps thereBarometric Pressure Repeat until centroids don’t moveDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited56
Clustering: K-meansIdea: Find the clusters by minimizing distances of cluster centers to data.Algorithm: Instantiate k random guesses 𝜇- of the clusters Each data point classifies itself as the 𝜇- it is closest to it Each 𝜇- finds the centroid of the points that were closest to it andjumps thereBarometric Pressure Repeat until centroids don’t moveDew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited57
Clustering: K-meansWorks well when there is a good distance metric between the points the number of clusters is known in advanceChallenges:Barometric Pressure Clusters that overlap or are not separable are difficult to clustercorrectly.Dew PointData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited58
InfluencersGoal: Detect the people who control or distribute informationthrough a network.Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited59
Influencers: Degree CentralityIdea: Influential people have a lot of people watching them.Equation Degree centrality number of directed edges to the node- High degree centrality people are those with large numbers offollowers. If undirected graph, transform to bi-directional and computeData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited60
Influencers: Degree CentralityIdea: Influential people have a lot of people watching them.Equation Degree centrality number of directed edges to the node- High degree centrality people are those with large numbers offollowers. If undirected graph, transform to bi-directional and compute3111213Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University12017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited61
Influencers: Betweenness CentralityIdea: Influential people are “information brokers” who connectdifferent groups of people.Algorithm Find all shortest paths from all nodes to all other nodes in thegraph. Betweenness centrality for a node sum over all start and endnodes of the number of shortest paths in the graph that includethe nodeData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited62
Influencers: Betweenness CentralityIdea: Influential people are “information brokers” who connectdifferent groups of people.Algorithm Find all shortest paths from all nodes to all other nodes in thegraph. Betweenness centrality for a node sum over all start and endnodes of the number of shortest paths in the graph that includethe node33314721211Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University52017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited63
Indicator communitiesBut what if we aren’t startingwith a reference indicator?We assume that indicatorsgenerated by a coherent realworld process will be more likelyto co-occur in tickets thanarbitrary pairs of indicators.Find groups of highly similarindicators in complete indicatorticket graph.Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited64
Indicator-ticket graphA subset of the ticket-indicator graph(for a small set of selected indicators) Tickets are grey triangles Indicators are black circles Edges connect tickets to the indicators theycontainData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited65
Machine Learning Is GrowingPreferred approach for many problems Speech recognition Natural language processing Medical diagnosis Robot control Sensor networks Computer vision Weather prediction Social network analysis AlphaGO, Watson Jeopardy!Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited66
This slide also intentionally left blank, just like the earlier oneData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited67
What we did todayNameMgrDirBeth JonesDan Thomas Anne KimBeth JonesDan Thomas Anne KimJones, BethDan Thomas Anne Kim❶Tom KeaneMark RyanTim PikeLength LineSolved?Comment1:301Y❺ 1:523Y 902Y 882N ❶Average (5:08)Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited68
What we did todayData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited69
Data Science helps youuse data to get results.Data Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited70
Eliezer KanalTechnical Manager(412) 268-5204ekanal@sei.cmu.eduDaniel DeCapriaData Scientist(412) 268-2457djdecapria@sei.cmu.eduData Science TutorialAugust 10, 2017 2017 Carnegie Mellon University2017 SEI Data Science in Cybersecurity SymposiumApproved for Public Release; Distribution is Unlimited71
2017 SEI Data Science in Cybersecurity Symposium Approved for Public Release; Distribution is Unlimited About us Eliezer Kanal Technical Manager, CERT Recent projects: ML-based Malware Classifier Network traffic analysis Cybersecurity questionnaire optimization Daniel DeCapria Data