Forensic Investigation of Event Logs by Automatic Anomaly Detection


Forensic Investigation of Event Logs by Automatic Anomaly Detection

A thesis submitted for the degree of Doctor of Philosophy

by Hudan Studiawan

Murdoch University
2020

Copyright by Hudan Studiawan, 2020

Declaration

I declare that:
a) The thesis is my own account of my research, except where other sources are acknowledged,
b) All co-authors, where stated and certified by my principal Supervisor or Executive Author, have agreed that the works presented in this thesis represent substantial contributions from myself, and
c) The thesis contains as its main content, work that has not been previously submitted for a degree at any other university.

Hudan Studiawan
September, 2020

Statement of Acknowledgement

The research work and contribution in each chapter has been undertaken by the student. He was responsible for doing the research practically and preparing the papers with advice, suggestions, and corrections by the co-authors. The student is the principal author of each paper.

Student: Hudan Studiawan
Principal supervisor: A/Prof. Ferdous Sohel

Abstract

Attacks on an operating system have become a significant and increasingly common problem. This type of security incident is recorded in forensic artifacts, such as log files. Forensic investigators will generally examine the logs to analyze such incidents. An anomaly is highly correlated with an attacker's attempts to compromise the system. This thesis proposes a novel framework to automatically detect anomalies in a forensic timeline constructed from log files. Before identifying anomalies, an automatic log parser is built so that investigators do not need to define a rule-based parser. Parsing is modeled as a named entity recognition problem, and a deep learning technique, namely the bidirectional long short-term memory, is exploited to parse log entries.

This thesis proposes three major methods as the base of the framework. First, a method for automatic cluster-based anomaly detection is proposed. The anomaly decision is made based on an estimated threshold derived from the clustering results, which considers several statistical properties, including frequency and inter-arrival rate. Second, anomalies are identified by establishing a baseline model of normal activities from log files. Another deep learning technique, namely the deep autoencoder, is employed to construct the baseline. Third, this research proposes anomaly detection using sentiment analysis of log messages: a negative sentiment means that the investigated log entry is an anomaly. Two methods, specifically attention-based deep learning and the gated recurrent unit, are proposed to perform the sentiment analysis. This work also addresses the class imbalance issue in the log data using the Tomek link method.

Finally, a fusion technique is applied to combine the aforementioned major methods. Weighted majority voting is used for the final anomaly decision. The detection results are then displayed in a forensic timeline to assist the investigators.
Experiments on various public datasets indicate that the proposed framework achieves superior performance compared to other log anomaly detection methods.
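The fusion step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of weighted majority voting over the three detectors, not the thesis's implementation; the detector ordering and the weight values are assumed for the example only.

```python
def weighted_majority_vote(votes, weights):
    """Return 1 (anomaly) if the weighted anomaly votes reach a majority.

    votes   -- per-detector decisions, 1 for anomaly and 0 for normal
    weights -- per-detector weights (illustrative values, not tuned ones)
    """
    assert len(votes) == len(weights)
    # Sum the weights of the detectors that voted "anomaly".
    anomaly_mass = sum(w for v, w in zip(votes, weights) if v == 1)
    # Declare an anomaly when that mass reaches half of the total weight.
    return 1 if anomaly_mass >= sum(weights) / 2 else 0


# Example: suppose the autoencoder-based and sentiment-based detectors flag
# a log entry while the cluster-based detector does not; with equal weights
# the fused verdict is "anomaly".
weights = [1.0, 1.0, 1.0]  # cluster-based, autoencoder, sentiment (assumed order)
print(weighted_majority_vote([0, 1, 1], weights))  # -> 1 (anomaly)
print(weighted_majority_vote([0, 1, 0], weights))  # -> 0 (normal)
```

With unequal weights, a single trusted detector can outvote the others, which is the usual motivation for weighting the majority vote rather than counting detectors equally.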

Acknowledgements

I thank God almighty for giving me His blessings during my studies for a doctoral degree. I would like to thank my supervisors, A/Prof. Ferdous Sohel and Dr. Christian Payne. A/Prof. Ferdous has provided incredible support, assisting me to manage the research, to understand how to propose a novel idea in an academic paper, and to write papers that would be accepted for publication. Dr. Christian helped me to develop the critical thinking skills required for research. Back in 2016, when no one else was interested in my research proposal, Dr. Christian welcomed and assisted me to enrol at Murdoch University.

I also acknowledge my sponsor, the Indonesia Endowment Fund of Education (Lembaga Pengelolaan Dana Pendidikan (LPDP)), which enabled me to study under the BUDI (Beasiswa Unggulan Dosen Indonesia) Scholarship scheme.

I am immensely grateful to my parents, Mr and Mrs Tambeh, who gave me their wonderful support throughout this journey. Finally, I would not have been able to accomplish any of this without the consistent encouragement of my wife Amalia and my daughter Athiyya.

Publications

Refereed journal articles

1. H. Studiawan, C. Payne, and F. Sohel, "Graph clustering and anomaly detection of access control log for forensic purposes," Digital Investigation, vol. 21, pp. 76-87, 2017.
2. H. Studiawan, F. Sohel, and C. Payne, "A survey on forensic investigation of operating system logs," Digital Investigation, vol. 29, pp. 1-20, 2019.
3. H. Studiawan, F. Sohel, and C. Payne, "Sentiment analysis in a forensic timeline with deep learning," IEEE Access (Special Issue on Deep Learning: Security and Forensics Research Advances and Challenges), vol. 8, pp. 60664-60675, 2020.
4. H. Studiawan, F. Sohel, and C. Payne, "Anomaly detection in operating system logs with deep learning-based sentiment analysis," IEEE Transactions on Dependable and Secure Computing (Special Issue on AI/ML for Secure Computing), 2020.
5. H. Studiawan and F. Sohel, "Anomaly detection in a forensic timeline with deep autoencoders," under second review in Journal of Information Security and Applications.

Refereed international conference papers

1. H. Studiawan, F. Sohel, and C. Payne, "Automatic log parser to support forensic analysis," Proceedings of the 16th Australian Digital Forensics Conference, pp. 1-10, 2018. (Best Paper Award)
2. H. Studiawan, C. Payne, and F. Sohel, "Automatic graph-based clustering for security logs," Proceedings of the 33rd International Conference on Advanced Information Networking and Applications, pp. 914-926, 2019.
3. H. Studiawan, F. Sohel, and C. Payne, "Automatic event log abstraction to support forensic investigation," Proceedings of the Australasian Computer Science Week Multiconference (Session: Australasian Information Security Conference), pp. 1-9, 2020. (CORE Student Travel Award)
4. H. Studiawan and F. Sohel, "Performance evaluation of anomaly detection in imbalanced system log data," Proceedings of the World Conference on Smart Trends in Systems, Security and Sustainability, pp. 239-246, 2020.

Contents

Declaration
Statement of Acknowledgement
Abstract
Acknowledgements
Publications
List of Tables
List of Figures

1 Introduction
  1.1 Background
  1.2 Research aims
  1.3 Scope of the problem
  1.4 Contributions of the thesis
  1.5 Structure of the thesis

2 A survey on forensic investigation of operating system logs
  2.1 Introduction
  2.2 Relevant surveys and our contributions
  2.3 Terminology
  2.4 Survey methodology
    2.4.1 Forensic framework for classifying studies
    2.4.2 Inclusion and exclusion criteria for literature
    2.4.3 Paper collection
  2.5 Pre-processing step as forensic readiness of OS logs
    2.5.1 OS log security
    2.5.2 OS logs as digital evidence
  2.6 Acquisition of OS logs
  2.7 Main analysis of OS log investigation
    2.7.1 OS log retrieval
    2.7.2 Tamper detection of OS logs
    2.7.3 Event correlation and reconstruction
    2.7.4 Anomaly detection
    2.7.5 Event log abstraction
  2.8 Visualization of OS logs
    2.8.1 Forensic timeline
    2.8.2 Tree-based log visualization
    2.8.3 Graph-based log visualization
  2.9 Post-process of OS log investigation
  2.10 Tools for OS log forensics
    2.10.1 General tools
    2.10.2 Libraries
  2.11 Public datasets for OS log forensics
    2.11.1 Digital Corpora
    2.11.2 Digital Forensic Research Workshop (DFRWS) Challenge
    2.11.3 Computer Forensic Reference Data Sets (CFReDS) Project
    2.11.4 The Honeynet Project
    2.11.5 SecRepo
  2.12 Open issues and future directions
    2.12.1 Pre-processing step as forensic readiness of OS logs
    2.12.2 Acquisition of OS logs
    2.12.3 Main analysis of OS log investigation
    2.12.4 Visualization of OS logs
    2.12.5 Post-process of OS log investigation
    2.12.6 Tools for OS log forensics
    2.12.7 Public datasets for OS log forensics
  2.13 Conclusion

3 Automatic log parser to support forensic analysis
  3.1 Introduction
  3.2 Terminology
  3.3 The proposed event log parser: nerlogparser
  3.4 Log parsing as named entity recognition problem
  3.5 Word and character embedding as input representation
  3.6 Bidirectional long short-term memory (BLSTM) as the main architecture
  3.7 Experimental results
    3.7.1 Datasets
    3.7.2 Training of nerlogparser
    3.7.3 Performance evaluation
  3.8 Conclusion

4 Automatic graph-based clustering for security logs
  4.1 Introduction
  4.2 Related work
  4.3 Proposed method: ASLoC
    4.3.1 Log preprocessing and the graph model
    4.3.2 Weighted maximal clique percolation as the clusters
    4.3.3 Finding the optimal parameters using simulated annealing
  4.4 Experimental results and analysis
    4.4.1 Description of the security log datasets
    4.4.2 Experimental results
  4.5 Conclusion

5 Graph clustering and anomaly detection of access control log for forensic purposes
  5.1 Introduction
  5.2 Proposed method
    5.2.1 Log preprocessing and proposed graph model
    5.2.2 Event log clustering based on improved MajorClust
    5.2.3 Anomaly detection of possible attacks
    5.2.4 Visualization of access control anomaly
  5.3 Experimental results and discussions
    5.3.1 Functionality testing for SecRepo dataset
    5.3.2 Evaluation metrics
    5.3.3 Comparison with existing methods
    5.3.4 Experiment on the Kippo log
  5.4 Conclusion

6 Automatic event log abstraction to support forensic investigation
  6.1 Introduction
  6.2 Related work and motivation
    6.2.1 Related work
    6.2.2 Motivating example
  6.3 Problem description
  6.4 The proposed method
    6.4.1 Event log preprocessing
    6.4.2 Grouping based on word count
    6.4.3 Graph model for log messages
    6.4.4 Grouping with automatic graph clustering
    6.4.5 Extraction of event log abstraction
  6.5 Experimental results
    6.5.1 Public digital forensic datasets
    6.5.2 Experimental settings
    6.5.3 Comparison with existing methods
    6.5.4 Over-clustering and under-clustering
  6.6 Conclusion

7 Anomaly detection in a forensic timeline with deep autoencoders
  7.1 Introduction
  7.2 Related work
    7.2.1 Forensic timeline analysis
    7.2.2 Anomaly detection with autoencoders
    7.2.3 Anomaly detection in event log data
    7.2.4 Anomaly detection in a forensic investigation
  7.3 Threat model
  7.4 The proposed method
    7.4.1 Preprocessing of log files
    7.4.2 Generating events
    7.4.3 Extracting features
    7.4.4 Building baseline with autoencoders
    7.4.5 Detect anomalies from reconstruction errors
    7.4.6 Building a forensic timeline
  7.5 Experimental results and analysis
    7.5.1 Digital forensic datasets
    7.5.2 Experiment settings
    7.5.3 Choosing the best threshold
    7.5.4 Comparison with other log anomaly detection methods
    7.5.5 Anomaly analysis in a forensic timeline
    7.5.6 Anomaly detection on untrained datasets
  7.6 Conclusion

8 Sentiment analysis in a forensic timeline with deep learning
  8.1 Introduction
  8.2 Related work
    8.2.1 Forensic timeline analysis
    8.2.2 Sentiment analysis in event logs
    8.2.3 Anomaly detection in a forensic timeline
  8.3 The proposed method
    8.3.1 Event log preprocessing
    8.3.2 Word embedding layer
    8.3.3 Context attention layer
    8.3.4 Content attention layer
    8.3.5 Softmax layer
    8.3.6 Building a forensic timeline
  8.4 Experimental results and analysis
    8.4.1 Public forensic datasets
    8.4.2 Building ground truth for sentiment analysis
    8.4.3 Experiment settings
    8.4.4 Comparison with other methods
    8.4.5 Displaying negative sentiments on a forensic timeline
  8.5 Conclusion

9 Anomaly detection in operating system logs with deep learning-based sentiment analysis
  9.1 Introduction
  9.2 Related work
    9.2.1 Anomaly detection in event log data
    9.2.2 Deep learning for sentiment analysis
    9.2.3 Sentiment analysis in event log data
    9.2.4 Class imbalance in sentiment analysis
  9.3 Threat model and assumptions
  9.4 The proposed method: pylogsentiment
    9.4.1 Preprocessing of operating system logs
    9.4.2 Word embedding as input layer
    9.4.3 Solving class imbalance with the Tomek link
    9.4.4 Gated Recurrent Unit (GRU) layer
    9.4.5 Softmax as output layer
  9.5 Experimental results and analysis
    9.5.1 Operating system log datasets
    9.5.2 Experiment settings
    9.5.3 Comparison with other log anomaly detection methods
    9.5.4 Comparison of the Tomek link with other class balancing methods
    9.5.5 Comparison with other deep learning-based sentiment analysis
    9.5.6 Detecting anomalies on unseen datasets
    9.5.7 Limitations
  9.6 Conclusion

10 Fusion method for anomaly detection in a forensic timeline
  10.1 Introduction
  10.2 Related work
    10.2.1 Fusion methods for anomaly detection
    10.2.2 Anomaly detection in a forensic investigation
  10.3 The proposed fusion method
    10.3.1 Anomaly detection based on normal baseline
    10.3.2 Aspect-based sentiment analysis for anomaly detection
    10.3.3 Anomaly detection with data balancing and sentiment analysis
    10.3.4 Classifier fusion for anomaly detection
    10.3.5 Constructing a forensic timeline
  10.4 Experimental results and analyses
    10.4.1 Datasets
    10.4.2 Experiments and analyses
    10.4.3 Displaying anomalous events in a forensic timeline
  10.5 Conclusion

11 Conclusion and future work
  11.1 Summary of contributions
  11.2 Future work

A Sample list of anomalous log entries
B List of datasets used in this thesis
Bibliography

List of Tables

2.1 A summary of key publications in OS log security
2.2 A summary of key publications in OS log retrieval for forensic purposes
2.3 A summary of key publications in OS log correlation and reconstruction
2.4 A summary of key publications in anomaly detection in event log
2.5 List of event log forensics tools
2.6 OS logs from public forensic case studies and datasets
3.1 List of event log datasets
3.2 Training with 15 epochs for the nerlogparser
3.3 Comparison of performance metrics in percent (%) for traditional methods and two settings of the nerlogparser
4.1 Comparison of CH (the higher, the better) and DB (the lower, the better) for all datasets. Note that the bold values indicate the best results
5.1 Parameter tuning -s for slct and --support for LogCluster
5.2 Comparison of proposed technique and the other methods
5.3 The most frequent events in the authentication log (testing dataset)
6.1 List of public forensic datasets
6.2 Parameter settings for experiments
6.3 F-measure value comparison (in %) of the proposed method and five other methods
7.1 A list of public forensic datasets used for the experiments
7.2 Performance comparison (%) on the Casper dataset
7.3 Performance comparison (%) on the Jhuisi dataset
7.4 Performance comparison (%) on the Nssal dataset
7.5 Performance comparison (%) on the Honey7 dataset
7.6 Performance of the proposed method (%) on untrained datasets
8.1 A list of public system logs datasets used in this chapter
8.2 Comparison with other methods on the Casper dataset (%)
8.3 Comparison with other methods on the Jhuisi dataset (%)
8.4 Comparison with other methods on the Nssal dataset (%)
8.5 Comparison with other methods on the Honey dataset (%)
9.1 A list of public OS logs datasets used in this work
9.2 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the Casper dataset
9.3 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the Jhuisi dataset
9.4 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the Nssal dataset
9.5 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the Honey7 dataset
9.6 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the Zookeeper dataset
9.7 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the Hadoop dataset
9.8 Performance comparison (%) of pylogsentiment with other anomaly detection techniques on the BlueGene/L dataset
9.9 Performance comparison (%) of pylogsentiment with other deep learning techniques on the Casper dataset
9.10 Performance comparison (%) of pylogsentiment with other deep learning techniques on the Jhuisi dataset
9.11 Performance comparison (%) of pylogsentiment with other deep learning techniques on the Nssal dataset
9.12 Performance comparison (%) of pylogsentiment with other deep learning techniques on the Honey7 dataset
9.13 Performance comparison (%) of pylogsentiment with other deep learning techniques on the Zookeeper dataset
9.14 Performance comparison (%) of pylogsentiment with other deep learning techniques on the Hadoop dataset
9.15 Performance comparison (%) of pylogsentiment with other deep learning techniques on the BlueGene/L dataset
9.16 Comparison of detected anomalies by GRU and pylogsentiment
9.17 Performance of pylogsentiment (%) on unseen datasets
10.1 Attributes of the two public log datasets used in the experiments
10.2 Performance evaluation on the Windows logs dataset
10.3 Performance evaluation on the Linux logs dataset

List of Figures

2.1 Proposed framework of OS log forensics based on Generic Computer Forensic Investigation Model [1]
2.2 A taxonomy of OS log forensics literature
2.3 A typical model for OS log security using centralization and cryptographic approach
2.4 A typical model for OS log retrieval using XML-based approach
2.5 A typical model for event correlation and reconstruction with timestamp-based approach
2.6 A typical model for OS log anomaly with profiling and machine learning approach
2.7 Graph-based visualization of Windows event logs from Timesketch
3.1 An example of event log parsing from a syslog file [2]
3.2 Log parsing position in a typical forensic analysis
3.3 IOB tag for named entity recognition from Figure 3.1
3.4 Character and word embedding for word init: based on a model by Lample et al. [3]
3.5 An illustration of named entity recognition with BLSTM for a log entry excluding the main message
4.1 A block diagram of ASLoC as the proposed clustering method
4.2 The example of CH for an authentication log (red-triangle dot is the optimum value and the green crosses are the trial values in ASLoC)
5.1 Block diagram of the proposed method
5.2 Proposed graph model for authentication log
5.3 The drawback of MajorClust with more vertices
5.4 The improvement of MajorClust algorithm
5.5 Overfitting cluster produced by improved MajorClust
5.6 The result of refine cluster phase
5.7 The final clustering of improved MajorClust algorithm
5.8 Initial clustering result of the first day in the dataset (November 30, 2014)
5.9 The result of refine cluster phase from Figure 5.8
5.10 Anomaly detection in refine cluster phase
5.11 Final result of the forensic analysis in the first day of authentication log (November 30, 2014)
