Evaluation of Classifier for Efficient Intrusion Detection System Implementation


International Journal of Information Communication Technology and Digital Convergence, Vol. 4, No. 1, June 2019, pp. 1-7

Krishna Pandey
ESC Pvt Ltd, Kathmandu, Nepal
E-mail: kp.krispandey@gmail.com

Abstract

Security breaches have been recorded in high volume and have compromised several information systems and critical applications. Intrusion detection is the process of analyzing the events occurring in an information system in order to detect security threats and vulnerabilities. Research and development communities are putting extra effort into optimizing Intrusion Detection System performance, because network traffic, including the vulnerabilities it carries, is complex and dynamic. The question of whether certain classifiers perform better for certain attack classes is the motivation for this research work. In this research, the performance of a comprehensive set of candidate classifiers has been evaluated using the Knowledge Discovery and Data Mining (KDD99) dataset. Based on the evaluated results, the most accurate classifier, with a high attack detection rate and a low false alarm rate, has been chosen and proposed. The comparison of simulation results indicates that a noticeable performance improvement can be achieved with the proposed classifier in detecting different kinds of network attacks and security vulnerabilities.

Keywords: Intrusion Detection System (IDS), KDD99 dataset, Classifier selection, Security.

1. Introduction

Due to the remarkable growth in networked computer resources, a variety of network-based applications have been developed to provide services in many different areas, e.g., e-commerce services, e-procurement services, and entertainment.
The increase in the number of networked machines has led to an increase in unauthorized activity, not only from outside attackers but also from inside attackers, such as discontented employees and people abusing their privileges for personal gain. An intrusion is defined as any set of events that compromises the integrity, confidentiality, or availability of a resource. If a system can assure that these three security properties hold, it is considered secure. Intrusion detection (ID) is a security management approach for computers and networks: it is the process of monitoring and analyzing the actions occurring in a computer system in order to detect signs of security problems [1].

Intrusion Detection Systems (IDSs) focus primarily on identifying probable incidents, logging information about them, attempting to stop them, and reporting them to security administrators. Some IDSs do this in real time, while others process audit data with some delay (non-real-time). The latter approach in turn delays the moment of detection.

Manuscript Received: 21 Jan. 2019 / Revised: 28 Mar. 2019 / Accepted: 10 Jun. 2019
Corresponding Author: Krishna Pandey
Author’s affiliation: ESC Pvt Ltd
E-mail: kp.krispandey@gmail.com
ISSN: 2466-0094
Copyright IJICTDC

In addition, organizations apply IDSs for other reasons, such as identifying problems with security policies, documenting existing attacks, and preventing individuals from violating security policies. IDSs have become a basic addition to the security infrastructure of almost every organization. A typical Intrusion Detection System is shown in Figure 1.

Figure 1. Simple Intrusion Detection

Intrusion Detection Systems are broadly classified into two types: host-based and network-based. A host-based IDS uses audit logs and system calls as its data source, whereas a network-based IDS uses network traffic as its data source. A host-based IDS consists of an agent on a host that identifies intrusions by analyzing audit logs, system calls, file system changes (binaries, password files, etc.), and other related host activities. In a network-based IDS, sensors are placed at strategic positions within the network to capture all incoming traffic flows and analyze the contents of the individual packets for intrusive activities such as denial of service attacks, buffer overflow attacks, etc. Each approach has its own strengths and weaknesses. Some attacks can be detected only by a host-based IDS or only by a network-based IDS [2].

There are two main strategies of ID: misuse detection and anomaly detection. Misuse detection attempts to match patterns and signatures of already known attacks in the network traffic. A continuously updated database is usually used to store the signatures of known attacks. Misuse detection cannot identify a novel attack until a signature for it has been added. Anomaly detection attempts to recognize behavior that does not conform to normal behavior. This technique is based on the detection of traffic anomalies.
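The contrast between the two strategies can be illustrated with a minimal Python sketch. It is not taken from the paper: the signature table, the payload strings, the baseline values, and the z-score threshold of 3.0 are all hypothetical choices made for illustration. The misuse detector returns the name of a matched known attack, while the anomaly detector only reports that an observation deviates from a baseline of normal behavior.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence

# Hypothetical signature database: payload substring -> attack name.
SIGNATURES: Dict[str, str] = {
    "/cgi-bin/phf?": "phf",
    "/etc/passwd": "password-file-read",
}


def misuse_detect(payload: str, signatures: Dict[str, str]) -> Optional[str]:
    """Return the name of the first known signature found in the payload."""
    for pattern, attack_name in signatures.items():
        if pattern in payload:
            return attack_name
    return None  # no known signature matched; novel attacks pass unnoticed


def anomaly_detect(value: float, baseline: Sequence[float],
                   threshold: float = 3.0) -> bool:
    """Flag a value whose z-score against the normal baseline exceeds threshold."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    # Assumed baseline: connections per second observed during normal operation.
    normal_rates = [10, 12, 11, 9, 13]
    print(misuse_detect("GET /cgi-bin/phf?Qalias=x", SIGNATURES))
    print(misuse_detect("GET /index.html", SIGNATURES))
    print(anomaly_detect(500, normal_rates))
    print(anomaly_detect(11, normal_rates))
```

In this sketch the anomaly detector flags an unusual rate without saying which attack caused it, and the result depends entirely on how the baseline and the threshold are chosen.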
Anomaly detection systems are adaptive in nature: they can deal with new attacks, but they cannot identify the specific type of attack.

The main disadvantage of this method is that there is no clear-cut way to define normal behavior. Therefore, this type of IDS can report an intrusion even when the activity is legitimate. One of the major problems encountered by IDSs is the large number of false positive alerts, that is, alerts that mistakenly flag normal traffic as security violations. An ideal IDS does not produce false or inappropriate alarms. In practice, signature-based IDSs are found to produce more false alarms than expected. This is due to overly general signatures and the lack of a built-in verification tool to confirm the success of an attack. The large number of false positives in the alert logs makes taking corrective action for the true positives (i.e., successful attacks) delayed and labor-intensive.

The ideal goal of an efficient IDS is to detect novel attacks by unauthorized users in network traffic. We consider an attack to be novel if the vulnerability is unknown to the target's owner or administrator, even if the attack is generally known and patches and detection tests are available. There are basically four types of remotely launched attacks: denial of service (DoS), remote to local (R2L), user to root (U2R), and probes.

A DoS (Denial of Service) attack is one in which the attacker makes a computing or memory resource too busy or too full to handle legitimate networking requests, thereby denying users access to a machine; ping of death, neptune, back, smurf, apache, UDP storm, and mail bomb are all DoS attacks. A remote to local (R2L) attack is one in which a user sends packets over the network to a machine on which he/she has no right of access, in order to expose the machine's vulnerabilities and gain the privileges a local user would have on that machine, e.g. guest, xlock, xsnoop, sendmail, dictionary, and phf. A user to root (U2R) attack is an exploit in which the attacker starts with a normal user account on the system and abuses vulnerabilities in the system to gain superuser access rights, e.g. xterm and perl. Probing is an attack in which the attacker scans a machine or a networking device to find weaknesses or vulnerabilities that may later be exploited to compromise the system, e.g. portsweep, saint, mscan, and nmap [2].

1.1 Problem Statement

An IDS is a rule-based monitoring and controlling system; therefore, selecting the algorithm used to define the standard rule base is a major challenge. Selecting an improper algorithm or model can increase the false alarm rate and resource consumption, lower the intrusion detection rate, make the entire system inefficient, and may even lead to security vulnerabilities.
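The detection rate and false alarm rate named in the problem statement can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the paper's own procedure: the label strings "anomaly" and "normal", the `ConfusionCounts` type, and the helper names are hypothetical. The detection rate is taken as TP / (TP + FN) and the false alarm rate as FP / (FP + TN), which are the usual definitions of the true positive rate and false positive rate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # attacks correctly flagged as attacks
    fp: int  # normal traffic wrongly flagged (false alarms)
    tn: int  # normal traffic correctly passed
    fn: int  # attacks that were missed


def count_outcomes(actual: Sequence[str], predicted: Sequence[str],
                   positive: str = "anomaly") -> ConfusionCounts:
    """Tally the four confusion-matrix cells for a two-class labeling."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of dividing by zero when a class is absent.
    return numerator / denominator if denominator else 0.0


def detection_rate(c: ConfusionCounts) -> float:
    """True positive rate: share of real attacks that were detected."""
    return _ratio(c.tp, c.tp + c.fn)


def false_alarm_rate(c: ConfusionCounts) -> float:
    """False positive rate: share of normal traffic that raised an alert."""
    return _ratio(c.fp, c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all instances that were classified correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
```

For example, with actual labels (anomaly, anomaly, normal, normal, normal) and predictions (anomaly, normal, normal, anomaly, normal), the counts are TP = 1, FN = 1, TN = 2, FP = 1, so the detection rate is 1/2, the false alarm rate is 1/3, and the accuracy is 3/5.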
Proper selection of the classifier algorithm increases the efficiency of the IDS being implemented.

1.2 Research Purpose

The main objective of this research work is to recommend the best classification algorithm, based on a performance assessment of different rule-based classification techniques, for detecting intrusions and anomalies in an efficient and reliable IDS implementation. The recommendation is based on a comparative assessment of the algorithms that were studied within the time frame of this project.

Several research works have already been carried out, and many research papers have been published, on the improvement of intrusion detection systems (IDS). Each of these papers has focused on a different algorithmic technique implemented in an IDS, together with its results in simulation tools. However, comparative analysis is very rare, and the proposed research is therefore important for arriving at a de facto standard for efficient IDS implementation. G. Kalyani and A. Jaya Lakshmi published a research work in November 2012 entitled “Performance Assessment of Different Classification Techniques for Intrusion Detection”. The paper compares different classification techniques for detecting and classifying intrusions into normal and abnormal behaviours using the WEKA tool. The algorithms tested are Naive Bayes, J48, OneR, PART, and the RBF Network algorithm. A dataset of 2747 rows and 42 columns was used to test and compare the performance and accuracy of these classification methods [1].

The research work by Hassan [3], “Current Studies on Intrusion Detection System, Genetic Algorithm and Fuzzy Logic”, reports that fuzzy logic can reduce the false alarm rate in identifying intrusive activities. A set of efficient fuzzy rules can be used to define normal and abnormal behaviors in a computer network [2].
Janakiraman and Vasudevan [4] performed a research work entitled “An Intelligent Distributed Intrusion Detection System using Genetic Algorithm”, which argues that distributed intrusion detection and prevention play an increasingly important role in securing computer networks. To overcome the limitations of conventional intrusion detection systems, alerts in a distributed intrusion detection system are exchanged and correlated in a cooperative fashion. The paper presents an intelligent learning approach using a Genetic Algorithm (GA) for a distributed Intrusion Detection System (DIDS) [3].

Abdullah et al. [5] performed a research work entitled “Performance Evaluation of a Genetic Algorithm Based Approach to Network Intrusion Detection System”. The purpose of the work was to apply a genetic algorithm (GA) to a network intrusion detection system. Markey [6] performed the research work entitled “Using Decision Tree Analysis for Intrusion Detection: A How-To Guide”. The work suggested that data mining techniques, such as decision tree analysis, offer a semi-automated approach to detecting adversarial threats. This approach allows corporations or security teams to quickly, easily, and inexpensively implement decision tree analysis and gain unique security insights based on the corporation’s network data [5].

2. Methodology

The methodology followed for this research work is shown in Figure 2.

Figure 2. Implementation and Comparison Model of the System

2.1 Implementation and Comparison Model

As per the requirements of an intrusion detection system, the implemented system consists of four major components, as shown in Figure 2: feature extraction, instance labeling, classification using different classifiers, and identification of the strong classifier.

2.2 Experimentation Modality

The focus of the work is to improve the network attack detection rate and to reduce the false alarm rate to a minimum level. The choice of data mining tool plays a considerable role and depends on ease of use, cost, and the number of data mining algorithms supported. Popular open source data mining packages include WEKA, YALE, TANAGRA, KNIME, etc. [5] This experiment has been conducted using the open source data mining tool WEKA (Waikato Environment for Knowledge Analysis), which comprises a group of machine learning packages for classification of samples.

Using WEKA, the measurable features have been extracted from the test data set and the relevant attributes have been filtered. The instances of the dataset have been labeled either as normal or as one of the anomaly categories. Various classifiers have been tested with the provided KDD Cup 99 dataset to obtain the required output and measurable quantities. Based on the comparison result, the appropriate classifier has been suggested for implementation as a rule in Snort IDS.

3. Performance Evaluation

To evaluate the performance of the classifiers, various comparisons have been made and a comparison chart has been presented. The comparison covers the False Alarm rate, Correctly Classified Instances, RMS error, True Positive Rate, and Receiver Operating Characteristics (ROC) Area, i.e. the area under the curve.

Table 1. Performance evaluation table.
[Table: columns are Classifier, Dataset Type (Train/Test), Correctly classified instances (%), TP Rate, FP Rate, RMS Error, and ROC Area; the classifier rows include Naïve Bayes, BayesNet, and One R.]

4. Analysis of WEKA results and output

The WEKA output is broken down into run information, the classifier model, and cross-validation results. This experiment focuses mainly on the classifier model section of the WEKA output, which is the most pertinent for intrusion detection. When the WEKA decision table classifier is used, this section shows the set of rules identified to determine whether or not a connection is malicious. This decision table listed above states that connections with a 24488 and b 704 are anomalies.

For the provided KDD Cup ’99 data set, the Table 1 summary shows that the model had 97% accuracy in differentiating anomalies from normal traffic, along with some other error statistics. The detailed accuracy by class section presents a number of statistics for use in data mining; however, these statistics can be difficult to interpret. The confusion matrix shows how the decision table algorithm classified the data compared with the actual category of the data. The decision table rule correctly classifies 97% of the network data examined. After this model is built, the rule is incorporated into SNORT to identify malicious activity in real time.

4.1. IDS implementation

SNORT IDS is implemented in a Linux (Ubuntu) environment with the following steps: installation of Linux Ubuntu, installation of Apache 2, installation of MySQL 5.5, installation of PHP, configuration of SNORT in the snort.conf file, creation of the Snort DB for capturing the data traffic, feeding of data into the MySQL DB as the signature DB, and finally, configuration of BASE for reporting alerts.

5. Conclusion and Future Works

All anomaly-based intrusion detection systems work on the assumption that normal activities differ from anomalies (intrusions). In this research, the best classifier for intrusion detection has been identified by experiment, using the available standard data set for training and testing. The experiments have been done with the KDD ’99 audit data and have shown that the approach effectively detects intrusive program behavior.
Also, different classifiers have been compared using various attributes of normal and anomalous traffic. Experiments with the Naïve Bayes, Bayes Net, OneR, and Decision Table classifiers have been carried out, and the results are satisfactory in classifying normal and abnormal traffic over various rounds of iteration.

Different algorithms and rules have been tested with the training data set and verified with the testing data set, including the decision table, Naïve Bayes, OneR, and Bayes Net classifiers. The results obtained from these experiments have been compared, and the best observed classifier has been recommended for implementation in the IDS.

In the evaluation of the various classifiers, the decision table showed the best accuracy in classifying the dataset. This decision table rule set has been configured in SNORT and implemented in the IDS. The configured rule set has classified the intrusions and has been tested in a real environment setup. The SNORT database has been created in MySQL, and the data captured by SNORT in real time is stored in that database as a signature DB for further packet analysis. Additionally, BASE has been configured to generate reports of the traced alerts with respect to various protocol types.

Thus, the objective of the research work entitled “Evaluation of Classifier for Efficient Intrusion Detection System Implementation” has been accomplished. The research work followed these steps to achieve its goal: dataset analysis, feature extraction, instance labeling, classification using different classifiers, evaluation and identification of the strong classifier, implementation of the rule in the IDS, creation of the signature DB for effective IDS implementation, and configuration of BASE to generate reports of the traced alerts with respect to various protocol types.

In this study, we carried out the experiment with a selection of 5 attributes, and with this particular combination the decision table classifier gives the highest number of correctly classified instances. However, a different combination of attributes may produce different results and may show another classifier to be the better one. Experimenting with various combinations of attributes to maximize the classification result, and to exceed 97% accuracy, can be another research work and is part of future work. The suitable mapping between the selected attributes and the classifier needs to be considered when performing this future work.

References

[1] Y. Z. Wang and M. L. Her, Compact microstrip bandstop filters using stepped-impedance resonator (SIR) and spur-line sections, IEE Proceedings-Microwaves, Antennas and Propagation, vol. 153(5), pp. 435-440.
[2] A. J. Lakshmi and G. Kalyani, Performance Assessment of Different Classification Techniques, IOSR Journal of Computer Engineering (IOSRJCE), (2012), pp. 1-5.
[3] M. M. M. Hassan, Current Studies on Intrusion Detection System, Genetic Algorithms and Fuzzy Logic, International Journal of Distributed and Parallel Systems (IJDPS), vol. 4(2), (2013), pp. 1-13.
[4] V. Vasudevan and S. Janakiraman, An Intelligent Distributed Intrusion Detection System using Genetic Algorithm, Journal of Convergence Information Technology, vol. 4(1), pp. 70-76.
[5] B. Abdullah, I. Abd-Alghafar, G. I. Salama, and A. Abd-Alhafez, Performance evaluation of a genetic algorithm-based approach to network intrusion detection system, in Proceedings of the International Conference on Aerospace Sciences and Aviation Technology, (2009, May).
[6] A. A. J. Markey, Using Decision Tree Analysis for Intrusion Detection: A How-To Guide, SANS Institute, Global Information Assurance Certification Paper, (2011).
[7] P. P. Balasubramanie and G. G. Natesan, Improving the Attack Detection Rate in Network Intrusion Detection using Adaboost Algorithm, Journal of Computer Science, vol. 8(7), pp. 1-8.
