
International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 12, Number 1 (2019), pp. 8-15
International Research Publication House. http://www.irphouse.com

A Review of Intrusion Detection System using Machine Learning Approach

SH Kok¹, Azween Abdullah², NZ Jhanjhi³, Mahadevan Supramaniam⁴
¹,²,³ School of Computer and IT (SoCIT), Taylor’s University, Malaysia.
⁴ Research & Innovation Management Centre, SEGI University, Malaysia.
ORCIDs: ¹(0000-0001-9477-8988), ²(0000-0003-4425-8604), ³(0000-0001-8116-4733), ⁴(0000-0002-3734-0899)

Abstract

Intrusion Detection System (IDS) is an important tool used in cyber security to monitor and determine intrusion attacks. This study aims to analyse recent research in IDS using the Machine Learning (ML) approach, with specific interest in datasets, ML algorithms and metrics. Dataset selection is very important to ensure the model built is suitable for IDS use. In addition, dataset structure can affect the effectiveness of an ML algorithm. Thus, ML algorithm selection depends on the structure of the selected dataset. Metrics then provide a quantitative evaluation of ML algorithms on a specific dataset. This study found that soft computing techniques are getting considerable attention, as many researchers have applied them to IDS. In addition, many researchers are focusing on classification, which is beneficial in determining known intrusion attacks. However, it may pose a problem in detecting anomalous intrusions, which may include new or modified intrusion attacks. For datasets, many researchers were still using KDDCup99 and its variant NSL-KDD, although they are almost 20 years old. This continuing trend could result in stagnant progress in IDS, while intrusion attacks continue to evolve together with new technologies and user behaviours. Ultimately, this situation will make IDS obsolete as a cyber security tool. The three most used metrics for performance evaluation of IDS are accuracy, True Positive Rate (TPR) and False Positive Rate (FPR). This is expected, because these metrics provide important indications that are very relevant to IDS functionality.

Keywords - Computation Intelligence, Dataset, Intrusion Detection System, Machine Learning, Soft Computing.

I. INTRODUCTION

Intrusion Detection System (IDS) is an important tool used in cyber security to monitor and determine intrusion attacks. There are three types of IDS: network IDS, host IDS, and application IDS. A network IDS monitors network packets to detect intrusion attacks, while a host IDS monitors a single host (server or computer). Lastly, an application IDS monitors several known high-risk applications.

To determine whether an intrusion attack has occurred or not, an IDS depends on a few approaches. The first is the signature-based approach, where known intrusion attack signatures are stored in the IDS database and matched against current system data. When the IDS finds a match, it recognises it as an intrusion. This approach provides fast and accurate detection. However, its drawback is that the signature database needs periodic updates. In addition, the system could be compromised before the newest intrusion attack signature has been added.

The second approach is anomaly-based, or behaviour-based, where the IDS flags an attack when the system operates outside the norm. This approach can detect both known and unknown attacks. However, its drawback is low accuracy with a high false alarm rate.

Lastly, the hybrid-based approach uses both signature-based and anomaly-based approaches. It uses the signature-based approach to detect known attacks, and the anomaly-based approach to detect unknown attacks. Combining both approaches can ensure more effective detection, but may increase computational cost.

Machine Learning (ML) uses a statistical modelling approach to learn patterns from past data, and then predicts the most likely outcome for new data. Therefore, ML algorithms have been applied to IDS using the anomaly-based approach. As stated above, the challenge here is to build a model that gives high accuracy with a low false alarm rate.

Therefore, this study aims to analyse recent research in IDS using the ML approach, with specific interest in datasets, ML algorithms and metrics. Dataset selection is very important to ensure the model built is suitable for IDS use. In addition, dataset structure can affect the effectiveness of an ML algorithm. Thus, ML algorithm selection depends on the structure of the selected dataset. Metrics then provide a quantitative evaluation of ML algorithms on a specific dataset.

II. APPROACH

In order to ensure we review only research of interest, we preset some important criteria. Firstly, the article must be published in 2015 or later. This is to ensure we get only the most recent research, so that our study is relevant and not outdated.

Secondly, the article must be published in a scientific journal or conference. This is to ensure the validity of the content, which has been peer reviewed and approved.

Thirdly, the article must use ML for IDS. This is the objective of this study, so we must work within its scope.

III. MACHINE LEARNING

ML algorithms can be grouped into 11 categories, as shown in Fig. 1. The Bayesian category uses Bayes' Theorem of probability, which determines the probability that a specific outcome will occur. The most popular algorithm in this category is Naïve Bayes.

A decision tree has a tree-like structure that starts from the root node, which is the best predictor. It then progresses through its branches until it reaches a leaf node, which is the decision outcome.
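The root-to-leaf walk described for decision trees can be illustrated with a small hand-built tree. This is only a sketch: the feature names (`failed_logins`, `connections_per_second`), the thresholds and the labels are hypothetical and are not taken from any of the reviewed papers or datasets. A real tree would be learned from training data rather than written by hand.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: holds the decision outcome."""
    label: str


@dataclass
class Node:
    """Internal node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when value <= threshold
    right: Union["Node", Leaf]  # branch taken when value > threshold


def classify(tree, record):
    """Walk from the root through the branches until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        value = record[node.feature]
        node = node.left if value <= node.threshold else node.right
    return node.label


# Hypothetical tree: the root tests the (assumed) strongest predictor first.
TREE = Node(
    feature="failed_logins",
    threshold=3,
    left=Node("connections_per_second", 100, Leaf("normal"), Leaf("attack")),
    right=Leaf("attack"),
)

if __name__ == "__main__":
    sample = {"failed_logins": 1, "connections_per_second": 20}
    print(classify(TREE, sample))
```

Each record follows exactly one path, so the sequence of tests along that path doubles as a human-readable explanation of the decision.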

Dimensional reduction finds the features that are important to the outcome and removes irrelevant and redundant features. It is mostly performed during the pre-processing phase. The most popular algorithm is Principal Component Analysis (PCA).

Regression algorithms try to build a model that represents the relationship between variables. They are derived from statistical analysis. The most popular algorithm in this category is Logistic Regression.

Neural networks are inspired by the brain cell called the neuron, which forms the biological neural network. This category finds patterns in the data to make its prediction. Normally it requires a large amount of data to produce a good prediction. The most popular algorithm in this category is Perceptron.

Instance-based learning is also known as memory-based learning. This category of algorithm finds the instances in the training data that are most similar to the new data to make its prediction. The most popular algorithm in this category is k-Nearest Neighbour (kNN).

Ensemble is a method of combining the results of several algorithms before producing the final outcome. There are typically two methods, bagging and boosting.

Clustering groups data points that are close together into their own group. This category of algorithm works well in the unsupervised learning approach, which does not require labelled data. The most popular algorithm in this category is k-Means.

Table 1 summarises the research articles found in this study. The information extracted and summarised in this table is the dataset, the method (or algorithm) and the accuracy reported in each study.

Table 1. List of recent research in IDS from 2015 to 2018

Reference | Dataset | Method | Accuracy (%)
[2] | TRAbID (Probe, DoS) | Decision Tree (DT) and Naïve Bayes (NB) | Probe: DT (98.42), NB (97.29); DoS: DT (99.90), NB (99.66)
[3] | CIDDS-001 | k-Nearest Neighbour (kNN) and k-means | kNN (99.0), k-means (99.7)
[4] | ISCX 2012 | Recursive Feature Addition (RFA) with SVM | 92.90
[5] | 10% KDD, KDD DoS, NSL-KDD, UNSW-NB15 | Constrained-optimization-based Extreme Learning Machines (cELM) | Binary (98.90), multi-class (99.90)
[6] | Real network traffic | Principal Component Analysis (PCA) and Ant Colony Optimization | 96.00
[7] | Real network traffic | Fuzzy Logic | 96.50
[8] | KDDCup99 | Multi-level hybrid Support Vector Machine (SVM) and ELM | 95.80
[9] | NSL-KDD | SVM-Radial Basis Function (RBF) | 98.10
[10] | NSL-KDD | Single hidden layer feed-forward neural network (SLFN) | 84.10
[11] | KDDCup99 | Hybrid k-means and SVM-RBF | 88.70
[12] | NSL-KDD, UNSW-NB15 | Hybrid Artificial Bee Colony (ABC) and Artificial Fish Swarm (AFS) | NSL-KDD (99.00), UNSW-NB15 (98.90)
[13] | KDDCup99, UNSW-NB15 | Genetic Algorithm (GA) as search and Logistic Regression (LR) as learning algorithm | KDDCup99 (99.90), UNSW-NB15 (81.40)
[14] | NSL-KDD | Hypergraph based Genetic Algorithm (HG-GA) | 97.10
[15] | Self-generated SCADA network | Hierarchical Neuron Architecture based Neural Network (HNA-NN) | 93.10
[16] | NSL-KDD | Time-varying chaos particle swarm optimization (TVCPSO) | 97.20
[17] | NSL-KDD | Marginal density ratio | 99.20
[18] | NSL-KDD | Clustering ELM (Clus-ELM) | 77.00

Table 1 (continued)

Reference | Dataset | Method | Accuracy (%)
[19] | KDDCup99, Kyoto University Benchmark Dataset (KUBD) | Anomaly-detection method based on the change of cluster centres (ADBCC), then k-NN | KDDCup99 (93.30), KUBD (95.80)
[20] | NSL-KDD (exclude U2R) | Discrete wavelet transform (DWT) | 96.70
[21] | NSL-KDD | Weighted one-against-rest SVM (WOAR-SVM) | 80.70
[22] | NSL-KDD | Two-layer classification, Genetic Algorithm for Detectors Generation (GADG), Random Forest Tree | 98.60
[23] | NSL-KDD, ISCX 2012 | Hybrid Artificial Bee Colony (ABC) and AdaBoost | 98.90
[24] | Gure-KDD, KDDCup99 | Improved many-objective optimization (I-NSGA-III) | Gure-KDD (99.60), KDDCup99 (99.40)
[25] | KDDCup99 | Cluster center and nearest neighbor (CANN) | 99.90
[26] | NSL-KDD | Hybrid J48, Meta Pagging, RandomTree, REPTree, AdaBoostM1, Decision Stump, Naïve Bayes | Binary (99.80), Multiclass (98.60)
[27] | KDDCup99, NSL-KDD, UNSW-NB15 | Dendron (DT and GA) | KDDCup99 (98.90), NSL-KDD (97.60), UNSW-NB15 (84.30)

Further analysis in this study found that 65% of recent research focuses on classification, utilizing supervised machine learning techniques (as shown in Fig. 2). Therefore, only labelled datasets were used in this research. However, this focus may pose a problem in detecting anomalous intrusions, which include new or modified intrusion attacks.

Fig. 2 Research focus area of IDS from 2015-2018

In addition, this study also found that 44% of recent research used the soft-computing (ensemble and hybrid) approach to tackle the IDS problem, as shown in Fig. 3. This shows that soft-computing techniques are getting considerable attention from researchers in IDS.

Fig. 3 Approach used in IDS research from 2015-2018

IV. DATASET

The dataset is the key component in training machine learning to detect anomalous threats. However, the analysis from this study shows that many researchers are still relying on outdated datasets, KDDCup99 and NSL-KDD (a variant of the KDD99 dataset), which have been criticized by many as outdated and not relevant to current network infrastructure. KDDCup99 was produced in 1999, which makes it almost 20 years old. Rapid development and changes in Information Technology such as cloud computing, social media and the Internet of Things are changing the landscape of network infrastructure. These changes are a driving force in changing the threats themselves. Therefore, many research results that demonstrate high accuracy are viewed as overstated, because the dataset being used does not represent the current threats or infrastructure.

The KDDCup99 dataset is a popular dataset and was used for the Third International Knowledge Discovery and Data Mining Tools Competition. Each connection instance is described by 41 attributes (38 continuous or discrete numerical attributes and 3 symbolic attributes). Each instance is labelled as either normal or a specific type of attack. These attacks fall under one of four categories: Probe, DoS, U2R, and R2L [9], as described below.

Probing: This type of attack collects information about the target system prior to initiating an actual attack.

Denial of Service (DoS): This type of attack makes network resources unavailable to legitimate requests by exhausting the bandwidth or by overloading computational resources.

User to Root (U2R): In this case, an attacker starts out with access to a normal user account on the system and is able to exploit the system’s vulnerabilities to gain root access to the system.

Remote to Local (R2L): In this case, an attacker who does not have an account on a remote machine sends a packet to that machine over a network and exploits some vulnerabilities to gain local access as a user of that machine.

Fig. 4 Dataset being used in IDS research from 2015-2018

The NSL-KDD dataset was developed in 2009, but it is actually an improved version of the KDDCup99 dataset. NSL-KDD tries to improve the KDDCup99 dataset by removing redundant records and by addressing the imbalanced number of instances and the variety of attack classes [2]. However, it still inherits the fundamental limitations of the original dataset.

KDDCup99 has many drawbacks. Firstly, this dataset was developed in 1999 using a Solaris-based operating system to collect a wide range of data, due to its easy deployment. However, today's operating systems differ significantly and barely resemble Solaris. In this age of Ubuntu, Windows and Mac, Solaris has almost no market share.

Secondly, the traffic collector used in the KDD datasets, TCPdump, is very likely to become overloaded and drop packets under a heavy traffic load. More importantly, there is some confusion about the attack distributions of these datasets. According to an attack analysis, Probe is not an attack unless the number of iterations exceeds a specific threshold, and label inconsistency has been reported [26].

Thirdly, the emergence of new technologies such as cloud computing, social media and the Internet of Things has changed the network infrastructure drastically. These changes will also result in new types of threat.

The other two popular datasets are ISCX 2012 and UNSW-NB15. ISCX 2012 is a dataset created by the Information Security Centre of Excellence (ISCX) at the University of New Brunswick in 2012. This dataset consists of seven days of data, with each record labelled as normal (one) or attack (two). The dataset does not classify the types of attack, thus it only supports binary classification. However, this dataset is no longer available, because the centre has created a new dataset called CICIDS2017 [28]. The centre has also changed its name to the Canadian Institute for Cybersecurity (CIC). Unfortunately, no article was found using this new dataset at the time of this study.

Another popular dataset is UNSW-NB15, which was created by the Australian Centre for Cyber Security (ACCS) using IXIA PerfectStorm to generate nine types of attack: fuzzers, analysis, backdoors, DoS, exploits, generic, reconnaissance, shellcode, and worms. The dataset has a total of 47 features with two labels. The first is named ‘Label’, where zero indicates normal and one indicates an attack. The second label is named ‘attack_cat’, which provides the type of attack [29].

V. METRIC

A metric is a quantitative evaluation of ML algorithm performance on a specific dataset. It provides a basis for comparison, to determine which model performs better and by how much. Most metrics can be derived from a confusion matrix table, as shown in Table 2.

Accuracy is the most often used metric. This metric is the ratio of correctly predicted outcomes to the total observed outcomes [15]. Therefore it is used as the primary metric for comparison in this study. The formula is shown in equation 1:

$$\text{Accuracy} = \frac{TN + TP}{TN + TP + FP + FN} \tag{1}$$

True Positive Rate (TPR) has three other names, all of which use the same formula: recall, sensitivity, and detection rate. This metric is the ratio of correctly predicted positive outcomes to the actually positive observations [15]. The formula is shown in equation 2:

$$\text{TPR} = \frac{TP}{TP + FN} \tag{2}$$
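Equations 1 and 2 can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function names and the example counts are illustrative assumptions and do not come from any of the reviewed studies.

```python
def accuracy(tp, tn, fp, fn):
    """Equation 1: correctly predicted outcomes over all observed outcomes."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tn + tp) / total


def true_positive_rate(tp, fn):
    """Equation 2: detected attacks over all actual attacks."""
    actual_positive = tp + fn
    if actual_positive == 0:
        raise ValueError("no actual positive observations")
    return tp / actual_positive


if __name__ == "__main__":
    # Hypothetical counts: 100 actual attacks and 900 normal records.
    tp, fn, tn, fp = 90, 10, 880, 20
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"TPR      = {true_positive_rate(tp, fn):.3f}")
```

Keeping the two functions separate makes it visible that accuracy depends on all four counts, while TPR depends only on the actual attacks (TP and FN).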

Table 2. Confusion matrix table

Actual Class | Predicted Class: Negative (Normal) | Predicted Class: Positive (Attack)
Negative (Normal) | True Negative (TN) | False Positive (FP)
Positive (Attack) | False Negative (FN) | True Positive (TP)

False Positive Rate (FPR) is also called false alarm rate (FAR) or fall-out. This metric is the ratio of wrongly predicted positive outcomes to the actually negative observations [15]. The formula is shown in equation 3:

$$\text{FPR} = \frac{FP}{FP + TN} \tag{3}$$

True Negative Rate (TNR) is also called specificity. This metric is the ratio of correctly predicted negative outcomes to the actually negative observations [15]. The formula is shown in equation 4:

$$\text{TNR} = \frac{TN}{TN + FP} \tag{4}$$

False Negative Rate (FNR) is also called miss rate. This metric is the ratio of wrongly predicted negative outcomes to the actually positive observations [21]. The formula is shown in equation 5:

$$\text{FNR} = \frac{FN}{FN + TP} \tag{5}$$

Precision is the ratio of correctly predicted positive outcomes to all positive predictions [15]. The formula is shown in equation 6:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

F-measure is also called F-score. This metric provides a performance evaluation based on precision and recall [22]. The formula is shown in equation 7:

$$\text{F-measure} = \frac{2TP}{2TP + FP + FN} \tag{7}$$

Time is the measurement of efficiency. It can be measured in two phases: once during the training phase, and once during the testing phase. Other metrics were found in this study, but they are less common and are not discussed here.

This study found that two metrics were used in more than 70% of the research: accuracy and TPR. Accuracy provides a good indication of how well the algorithm can predict the correct outcome. This is important, because it shows how much the result can be trusted to be correct.

TPR, better known as detection rate, provides an indication of how well the algorithm can detect an intrusion attack. The purpose of an IDS is to detect attacks, thus this metric is important.

Another metric, used in more than 50% of the research, is FPR. Another name for this metric is False Alarm Rate (FAR). This metric indicates whether the algorithm will produce many false alarms. This is important, because it shows how much more work is needed after the IDS to filter out these false alarms, a task most probably performed by a human expert.

Fig. 5 Percentage of metric being used in IDS research from 2015-2018

VI. CONCLUSION

Soft computing techniques are getting considerable attention from researchers in IDS. This is because these techniques are easy to apply and often produce better results than a single algorithm. Proper combination of multiple algorithms is the way forward. Most researchers are focusing on classification, which is beneficial in determining known intrusion attacks. However, it may pose a problem in detecting anomalous intrusions, which may include new or modified intrusion attacks. Therefore, to produce a more robust IDS, clustering algorithms should be considered for future development. KDDCup99 and its variant NSL-KDD are the two most widely used datasets, although they are almost 20 years old. This continuing trend could result in stagnant progress in IDS, while intrusion attacks continue to evolve together with new technologies and user behaviours. Ultimately, this situation will make IDS obsolete as a cyber security tool. Therefore, a new dataset that represents the current environment setup, both software and hardware, is important. The latest publicly available dataset, CICIDS2017, should be explored.

The three most used metrics for performance evaluation of IDS are accuracy, TPR and FPR. This is expected, because these metrics provide important indications that are very relevant to IDS functionality. In order to simplify the evaluation process, it may be possible to develop a single metric that combines all three.

REFERENCES

[1] J. Brownlee, “A Tour of Machine Learning Algorithms,” e-learning-algorithms/, 2013.
[2] E. K. Viegas, A. O. Santin, and L. S. Oliveira, “Toward a reliable anomaly-based intrusion detection in real-world environments,” Comput. Networks, vol. 127, pp. 200–216, 2017.
[3] A. Verma and V. Ranga, “Statistical analysis of CIDDS-001 dataset for Network Intrusion Detection Systems using Distance-based Machine Learning,” Procedia Comput. Sci., vol. 125, pp. 709–716, 2018.
[4] T. Hamed, R. Dara, and S. C. Kremer, “Network intrusion detection system based on recursive feature addition and bigram technique,” Comput. Secur., vol. 73, pp. 137–155, 2018.
[5] C. R. Wang, R. F. Xu, S. J. Lee, and C. H. Lee, “Network intrusion detection using equality constrained-optimization-based extreme learning machines,” Knowledge-Based Syst., vol. 147, pp. 68–80, 2018.
[6] G. Fernandes, L. F. Carvalho, J. J. P. C. Rodrigues, and M. L. Proença, “Network anomaly detection using IP flows with Principal Component Analysis and Ant Colony Optimization,” J. Netw. Comput. Appl., vol. 64, pp. 1–11, 2016.
[7] A. H. Hamamoto, L. F. Carvalho, L. D. H. Sampaio, T. Abrão, and M. L. Proença, “Network Anomaly Detection System using Genetic Algorithm and Fuzzy Logic,” Expert Syst. Appl., vol. 92, pp. 390–402, 2018.
[8] W. L. Al-Yaseen, Z. A. Othman, and M. Z. A. Nazri, “Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system,” Expert Syst. Appl., vol. 67, pp. 296–303, 2017.
[9] I. Sumaiya Thaseen and C. Aswani Kumar, “Intrusion detection model using fusion of chi-square feature selection and multi class SVM,” J. King Saud Univ. - Comput. Inf. Sci., vol. 29, no. 4, pp. 462–472, 2017.
[10] R. A. R. Ashfaq, X. Z. Wang, J. Z. Huang, H. Abbas, and Y. L. He, “Fuzziness based semi-supervised learning approach for intrusion detection system,” Inf. Sci. (Ny)., vol. 378, pp. 484–497, 2017.
[11] U. Ravale, N. Marathe, and P. Padiya, “Feature selection based hybrid anomaly intrusion detection system using K Means and RBF kernel function,” Procedia Comput. Sci., vol. 45, no. C, pp. 428–435, 2015.
[12] V. Hajisalem and S. Babaie, “A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection,” Comput. Networks, vol. 136, pp. 37–50, 2018.
[13] C. Khammassi and S. Krichen, “A GA-LR wrapper approach for feature selection in network intrusion detection,” Comput. Secur., vol. 70, pp. 255–277, 2017.
[14] M. R. Gauthama Raman, N. Somu, K. Kirthivasan, R. Liscano, and V. S. Shankar Sriram, “An efficient intrusion detection system based on hypergraph Genetic algorithm for parameter optimization and feature selection in support vector machine,” Knowledge-Based Syst., vol. 134, pp. 1–12, 2017.
[15] S. Shitharth and D. Prince Winston, “An enhanced optimization based algorithm for intrusion detection in SCADA network,” Comput. Secur., vol. 70, pp. 16–26, 2017.
[16] S. M. Hosseini Bamakan, H. Wang, T. Yingjie, and Y. Shi, “An effective intrusion detection framework based on MCLP/SVM optimized by time-varying chaos particle swarm optimization,” Neurocomputing, vol. 199, pp. 90–102, 2016.
[17] H. Wang, J. Gu, and S. Wang, “An effective intrusion detection framework based on SVM with feature augmentation,” Knowledge-Based Syst., vol. 136, pp. 130–139, 2017.
[18] S. Roshan, Y. Miche, A. Akusok, and A. Lendasse, “Adaptive and online network intrusion detection system using clustering and Extreme Learning Machines,” J. Franklin Inst., vol. 355, no. 4, pp. 1752–1779, 2018.
[19] C. Guo, Y. Ping, N. Liu, and S. S. Luo, “A …,” Neurocomputing, vol. 214, pp. 391–400, 2016.
[20] S. Y. Ji, B. K. Jeong, S. Choi, and D. H. Jeong, “A multi-level intrusion detection method for abnormal network behaviors,” J. Netw. Comput. Appl., vol. 62, pp. 9–17, 2016.
[21] A. A. Aburomman and M. Bin Ibne Reaz, “A novel weighted support vector machines multiclass classifier based on differential evolution for intrusion detection systems,” Inf. Sci. (Ny)., vol. 414, pp. 225–246, 2017.
[22] A. S. Amira, S. E. O. Hanafi, and A. E. Hassanien, “Comparison of classification techniques applied for network intrusion detection and classification,” J. Appl. Log., vol. 24, pp. 109–118, 2017.

[23] M. Mazini, B. Shirazi, and I. Mahdavi, “Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms,” J. King Saud Univ. - Comput. Inf. Sci., 2018.
[24] Y. Zhu, J. Liang, J. Chen, and Z. Ming, “An improved NSGA-III algorithm for feature selection used in intrusion detection,” Knowledge-Based Syst., vol. 116, pp. 74–85, 2017.
[25] W. C. Lin, S. W. Ke, and C. F. Tsai, “CANN: An intrusion detection system based on combining cluster centers and nearest neighbors,” Knowledge-Based Syst., vol. 78, no. 1, pp. 13–21, 2015.
[26] S. Aljawarneh, M. Aldwairi, and M. B. Yassein, “Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model,” J. Comput. Sci., vol. 25, pp. 152–160, 2018.
[27] D. Papamartzivanos, F. Gómez Mármol, and G. Kambourakis, “Dendron: Genetic trees driven rule induction for network intrusion detection systems,” Futur. Gener. Comput. Syst., vol. 79, pp. 558–574, 2018.
[28] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization,” Proc. 4th Int. Conf. Inf. Syst. Secur. Priv., no. Cic, pp. 108–116, 2018.
[29] N. Moustafa and J. Slay, “UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” 2015 Mil. Commun. Inf. Syst. Conf., no. November, pp. 1–6, 2015.
[30] A. A. Aburomman and M. B. I. Reaz, “A survey of intrusion detection systems based on ensemble and hybrid classifiers,” Comput. Secur., vol. 65, pp. 135–152, 2017.
