Leopard: Understanding the Threat of Blockchain Domain Name Based Malware
Zhangrong Huang (1,2), Ji Huang (1,2), and Tianning Zang (2)
1. School of Cyber Security, UCAS
2. Institute of Information Engineering, CAS

Existing Techniques Used by Malware
- IP Flux: a technique that enables malware to change the IP addresses of its C&C servers.
- Domain Flux (Domain Generation Algorithm): another way for malware to evade detection, by generating pseudorandom or dictionary-based domains.
[Figure: example domain names and the IP address 172.16.10.5]

New Threat: Blockchain Domain Name Based Malware
- Blockchain domain name based malware (BDN-based malware) is a new type of malware that leverages Blockchain DNS (BDNS).
- Some malware authors have offered updated variants that include support for blockchain domains.
- More than 140K domains are registered across Namecoin and Emercoin, the pioneers of Blockchain DNS.
(Figure is from the FireEye report [1].)
[1] FireEye report: ucture-use.html

Related Works
- Patsakis C. et al. analyzed the security issues raised by blockchain-based DNS and offered advice on mitigating the corresponding threats.
- Pleiades, FANCI, Error-Sensor, and BotMiner are prior works that detect malware (botnets) based on error information, DNS traffic, or HTTPS traffic.
- Drawback: none of these is a suitable solution for detecting malicious blockchain domains, because of the special mechanism of BDNS.

Our Contributions
- Leopard: the first prototype for automatic detection of malicious blockchain domain names (BDNs).
- Strong performance: the system reaches an AUC of 0.9980 on the real-world datasets and discovered 286 previously unknown malicious BDNs.
- Two datasets: the set of malicious BDNs and the list of DNS servers that provide BDN resolution service.

Outline
1. Background
2. Automatic Detection
3. Evaluation
4. Limitations
5. Conclusion

Blockchain Domains
- Blockchain domains have special TLDs that differ from generic TLDs and country-code TLDs.
Organization | TLDs                      | DNS servers
Namecoin     | .bit                      |
Emercoin     | .coin, .emc, .lib, .bazar | seed1.emercoin.com
- Blockchain domains have inherent properties: anonymity and censorship resistance.
[1] Block 103341: https://explorer.emercoin.com/block/103341

Blockchain DNS (Architecture)
[Figure labels: Root Servers, TLD Servers, Authoritative Servers, Recursive Servers]
- Users can issue a BDN query to any server that holds blockchain domain resource records.

Blockchain DNS (Workflow)
- Third-party BDNS: a proxy or browser plugin forwards DNS requests to a third-party BDNS.
- Local BDNS: if users download the chains in advance, requests can be resolved locally by looking up the local blockchain resource records.
[Figure: a domain resolution request passes through TLD analysis; .com and .org names go to the DNS resolver (traditional procedure), while .bit and .coin names go to the Blockchain DNS resolver.]
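The TLD analysis step in the workflow can be illustrated with a minimal sketch. It assumes the blockchain TLDs are the ones listed on the Blockchain Domains slide; the function name and the returned labels are hypothetical and are not part of Leopard.

```python
# Hypothetical sketch of TLD analysis: decide which resolver handles a query.
# The TLD set is taken from the Blockchain Domains slide (.bit, .coin, .emc, .lib, .bazar).
BLOCKCHAIN_TLDS = {"bit", "coin", "emc", "lib", "bazar"}


def choose_resolver(domain: str) -> str:
    """Return "blockchain" for blockchain TLDs, otherwise "traditional"."""
    # Drop a trailing root dot, then take the last label as the TLD.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return "blockchain" if tld in BLOCKCHAIN_TLDS else "traditional"
```

A proxy or browser plugin would apply a check of this kind before forwarding a request to a third-party BDNS or to a local copy of the chain.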

Overview of Leopard
[Figure: system overview. Labels: Data Collection (third-party DNS traffic, DNS traffic database, DNS logs); filter and aggregate; supplement missing values; extract features; dataset; model; validation; Malicious BDNs Discovery.]

Module (Data Collection)
[Figure labels: ThreatBook Cloud Sandbox reports; 400 samples; 169 BDNs (malicious); 152 name servers (NS-list); dig (DNS lookup utility); Internet; ISP router; captured traffic files; DNS packets; transform; DNS logs.]

Module (Data Processing)
- ODNs stands for ordinary domain names with generic TLDs or country-code TLDs.
[Figure labels: DNS logs; filter; Alexa list; ODNs; BDNs; VirusTotal; 169 rs; aggregation; dataset.]

Module (Malicious BDNs Discovery)
Four types of algorithm:
- L2 Logistic Regression
- Linear Support Vector Machine
- Random Forest
- Neural Network
[Figure labels: training set; train; test; classification; unknown set; report.]

Goals of the System
- Q1: Is the system able to distinguish malicious BDNs in real-world network traffic?
- Q2: Is the system able to detect unknown malicious BDNs (ones not yet discovered by a vendor such as VirusTotal)?

Summary of Datasets
- We collected nine days of traffic (about 59 GB of raw data) and observed a total of 13,035 IPs.
- Aggregation format:
  (domain name, request IP) : src list, rdata set
  src list: [(IP1, port1, time1), (IP2, port2, time2), ...]
  rdata set: {(record1, ttl1), (record2, ttl2), ...}
- The aggregated data were divided into three sets. Dunknown contains only the records of unknown BDNs.
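The aggregation format can be sketched as a grouping step over parsed DNS log entries. The input field names below (`domain`, `request_ip`, `src_ip`, `src_port`, `time`, `answers`) are assumptions made for illustration, not the authors' schema, and "request IP" is assumed here to mean the IP the query was sent to.

```python
from collections import defaultdict


def aggregate(logs):
    """Group DNS log entries by (domain name, request IP).

    Each entry is assumed to be a dict with the hypothetical keys used below.
    Returns {(domain, request_ip): {"src_list": [...], "rdata_set": {...}}}.
    """
    table = defaultdict(lambda: {"src_list": [], "rdata_set": set()})
    for entry in logs:
        key = (entry["domain"].lower(), entry["request_ip"])
        # src list keeps every request: (source IP, source port, timestamp).
        table[key]["src_list"].append(
            (entry["src_ip"], entry["src_port"], entry["time"])
        )
        # rdata set keeps each distinct (record, TTL) pair once.
        for record, ttl in entry["answers"]:
            table[key]["rdata_set"].add((record, ttl))
    return dict(table)
```

Keeping the source list as a list preserves repeated requests and their timestamps, while the set removes duplicate answers.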

Feature Engineering
Three categories of features:
- Time Sequence feature set
- Source IP feature set
- Resource Records feature set
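The slide names the three feature categories but not the individual features. The sketch below shows one plausible feature per category computed from an aggregated record (a source list of `(ip, port, time)` tuples and a set of `(record, ttl)` pairs); every feature name here is a hypothetical example, not a feature confirmed by the authors.

```python
from statistics import mean


def extract_features(src_list, rdata_set):
    """Compute illustrative features from one aggregated record.

    src_list:  list of (source IP, source port, timestamp) tuples
    rdata_set: set of (record, TTL) pairs
    """
    times = sorted(t for _, _, t in src_list)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    ttls = [ttl for _, ttl in rdata_set]
    return {
        # Time sequence: how many requests, and how far apart on average.
        "n_requests": len(src_list),
        "mean_gap": mean(gaps) if gaps else 0.0,
        # Source IP: how many distinct clients asked for this name.
        "n_src_ips": len({ip for ip, _, _ in src_list}),
        # Resource records: how many distinct answers, and their mean TTL.
        "n_records": len(rdata_set),
        "mean_ttl": mean(ttls) if ttls else 0.0,
    }
```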

Cross-Validation on Training Set
- The metric used to evaluate the classifiers is AUC ROC (the area under the receiver operating characteristic curve).
- The random forest classifier outperforms the other classifiers and reaches an AUC of 0.9941.
- Linear models are not well suited to this difficult problem.
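For readers unfamiliar with the metric, AUC ROC equals the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one. The sketch below computes it directly from that definition; it is a small reference implementation for illustration, not the evaluation code used for Leopard.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for malicious, 0 for benign; scores: higher means more suspicious.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; library implementations use a sort-based equivalent.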

Feature Analysis (1)
- We assessed the importance of each feature with the mean decrease in impurity, the measure that the random forest algorithm uses to rank features.
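Mean decrease in impurity is built from the impurity drop at each tree split. The sketch below computes that drop for a single binary split using Gini impurity; a random forest accumulates such drops, weighted by node size, over every node that splits on a feature and averages across trees. Gini is assumed here as the impurity criterion, which the slide does not specify.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2


def impurity_decrease(parent, left, right):
    """Impurity drop when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini(left) + len(right) / n * gini(right)
    )
    return gini(parent) - weighted_children
```

A split that separates the classes perfectly yields the largest drop, and a split that leaves both children as mixed as the parent yields zero.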

Feature Analysis (2)
- We also assessed different combinations of the feature sets by training the same classifier on each combination.

Evaluation on Dtest
- Leopard achieves an AUC of 0.9980.
- When the detection rate reaches 0.98125, the false positive rate is only 0.1010.
- Q1: Is the system able to distinguish malicious BDNs in real-world network traffic?
  Answer: Leopard can accurately detect malicious BDNs.
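The detection rate (true positive rate) and the false positive rate quoted above are both functions of the decision threshold applied to the classifier score. A minimal sketch of how the two rates are computed at one threshold, with hypothetical function and parameter names:

```python
def detection_and_fp_rate(labels, scores, threshold):
    """Return (detection rate, false positive rate) at a score threshold.

    labels: 1 for malicious, 0 for benign.
    A sample is flagged as malicious when its score is >= threshold.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), fp / (fp + tn)
```

Lowering the threshold raises both rates; sweeping it across all scores traces the ROC curve whose area is the reported AUC.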

Evaluation on Dunknown
- Leopard reported 309 of the 403 records as malicious; the reported records cover 286 unique BDNs and 23 server IPs.
- Rules used to verify the result:
  - Any of the historical IPs of the BDN is malicious.
  - Any of the client IPs of the BDN is compromised.
  - Any threat intelligence related to the BDN exists.
- Result: all of the BDNs are malicious.
- Q2: Is the system able to detect unknown malicious BDNs?
  Answer: Leopard can successfully detect unknown malicious BDNs.
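The verification rules can be read as a disjunction: a reported BDN counts as confirmed if at least one rule holds. That reading is an assumption, since the slide does not say how the rules combine, and the function and argument names below are hypothetical.

```python
def is_confirmed_malicious(historical_ip_flags, client_ip_flags, has_threat_intel):
    """Apply the three verification rules, assuming any single rule suffices.

    historical_ip_flags: iterable of bools, True if that historical IP is malicious
    client_ip_flags:     iterable of bools, True if that client IP is compromised
    has_threat_intel:    True if any related threat intelligence exists
    """
    return (
        any(historical_ip_flags)
        or any(client_ip_flags)
        or bool(has_threat_intel)
    )
```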

Insight into Dunknown
- Phenomenon: 271 BDNs that come from 87.98.175.85 are meaningless and look randomly generated. The remaining 15 BDNs are readable.
- It seems that cybercriminals may be trying to combine the domain generation algorithm (DGA) technique with BDNs. Using DGArchive, we confirmed that the BDNs from 87.98.175.85 were generated by Necurs.
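One common heuristic for quantifying "looks randomly generated" is the Shannon entropy of the characters in the leftmost label. The sketch below is offered only as an illustration of that heuristic; the slide reports confirmation through DGArchive and does not say that entropy was used.

```python
import math
from collections import Counter


def label_entropy(domain: str) -> float:
    """Shannon entropy (bits per character) of the leftmost domain label."""
    label = domain.split(".")[0].lower()
    if not label:
        return 0.0
    n = len(label)
    counts = Counter(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

Labels with many distinct characters score higher than repetitive or dictionary-like labels, although entropy alone cannot separate dictionary-based DGAs from legitimate names.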

Limitations
Design:
- Relies on feature engineering and expert knowledge.
- The system is easily bypassed if attackers know the features.
- Relies on “clean” data.
- Deals only with BDN-based malware.
Evaluation:
- The dataset is somewhat biased because the top 5K Alexa domains were selected in the training phase.
- There is no effective method for correctly labeling benign BDNs.

Conclusion
- We call on researchers to pay attention to this new threat.
- We are the first to propose automatic detection of malicious blockchain domain names and to evaluate it on real-world traffic.
- We gained insight into the detected BDNs and discovered a malware variant that combines DGA and BDN techniques.
- We present two datasets related to the study of BDN-based malware.

Thanks!
huangzhangrong@iie.ac.cn
Data available at: https://drive.google.com/open?id=1YzVB7cZiMspnTAERBATyvqWKGj0CqGT-
