A Look At Targeted Attacks Through The Lense Of An NGO


Stevens Le Blond, Adina Uritesc, and Cédric Gilbert, Max Planck Institute for Software Systems (MPI-SWS); Zheng Leong Chua and Prateek Saxena, National University of Singapore; Engin Kirda, Northeastern University

This paper is included in the Proceedings of the 23rd USENIX Security Symposium, August 20–22, 2014, San Diego, CA. ISBN 978-1-931971-15-7

Open access to the Proceedings of the 23rd USENIX Security Symposium is sponsored by USENIX.

Abstract

We present an empirical analysis of targeted attacks against a human-rights Non-Governmental Organization (NGO) representing a minority living in China. In particular, we analyze the social engineering techniques, attack vectors, and malware employed in malicious emails received by two members of the NGO over a four-year period. We find that both the language and topic of the emails were highly tailored to the victims, and that sender impersonation was commonly used to lure them into opening malicious attachments. We also show that the majority of attacks employed malicious documents with recent but disclosed vulnerabilities that tend to evade common defenses. Finally, we find that the NGO received malware from different families and that over a quarter of the malware can be linked to entities that have been reported to engage in targeted attacks against political and industrial organizations, and Tibetan NGOs.

1 Introduction

In the last few years, a new class of cyber attacks has emerged that is more targeted at individuals and organizations. Unlike their opportunistic, large-scale counterparts, targeted attacks aim to compromise a handful of specific, high-value victims. These attacks have received substantial media attention, and have successfully compromised a wide range of targets including critical national infrastructures [19], Fortune 500 companies [23], news agencies [20], and political dissidents [10, 11, 16].

Despite the high stakes involved in these attacks, the ecosystem sustaining them remains poorly understood. The main reason for this lack of understanding is that victims rarely share the details of a high-profile compromise with the public, and they typically do not disclose what sensitive information has been lost to the attackers. According to folk wisdom, attackers carrying out targeted attacks are generally thought to be state-sponsored. Examples of national organizations that have been reported to be engaged in targeted attacks include the NSA’s Office of Tailored Access Operations (TAO) [3] and the People’s Liberation Army’s Unit 61398 [15]. Recently, researchers also attributed attacks in the Middle East to the governments of Bahrain, Syria, and the United Arab Emirates [16].

There now exists public evidence that virtually every computer system connected to the internet is susceptible to targeted attacks. The Stuxnet attack even successfully compromised air-gapped Iranian power plants [19] and was able to damage the centrifuges in the facility. More recently, Google, Facebook, the New York Times, and many other global companies have been compromised by targeted attacks. Furthermore, political dissidents and Non-Governmental Organizations (NGOs) are also being targeted [10, 11, 16].

In this paper, we analyze 1,493 suspicious emails collected over a four-year period by two members of the World Uyghur Congress (WUC), an NGO representing an ethnic group of over ten million individuals mainly living in China. WUC volunteers who suspected that they were being specifically targeted by malware shared the suspicious emails that they received with us for analysis. We find that these emails contain 1,176 malicious attachments and target 724 unique email addresses belonging to individuals affiliated with 108 different organizations. This result indicates that, despite their targeted content, these attacks were sent to several related victims (e.g., via Cc). Although the majority of these targeted organizations were NGOs, they also comprised a few high-profile targets such as the New York Times and US embassies.

We leverage this dataset to perform an empirical analysis of targeted attacks in the wild. First, we analyze the social engineering techniques and find that the language and topic of the malicious emails were tailored to the mother tongue and level of specialization of the victims. We also find that sender impersonation was common and that some attacks in our dataset originated from compromised email accounts belonging to high-profile activists. Second, whereas recent studies report that malicious archives and executables represented the majority of the targeted-attack threat [15, 22], we find that malicious documents were the most common attack vector in our dataset. Although we do not find evidence of zero-day vulnerabilities, we observe that most attacks used recent vulnerabilities, that exploits were quickly replaced to adapt to new defense mechanisms, and that they often bypassed common defenses. Third, we perform an analysis of the first-stage malware delivered over these malicious emails and find that WUC has been targeted with different families of malware over the last year. We find that over a quarter of these malware samples exhibited similarities with those used by entities reported to have carried out targeted attacks.

Our work complements existing reports on targeted attacks such as GhostNet, Mandiant, and the Symantec Internet Security Threat Report (ISTR) 2013 [11, 15, 22]. Whereas the GhostNet and Mandiant reports focus on the attack lifecycle after the initial compromise, this study provides an in-depth analysis of the reconnaissance performed before the compromise. We note that both approaches have pros and cons and are complementary: While it is hard for the authors of these reports to know how a system became compromised in retrospect, it is equally hard for us to know if the observed attacks will compromise the targeted system(s). Finally, whereas ISTR provides some numbers about reconnaissance analysis for industrial espionage attacks [22], we present a thorough and rigorous analysis of the attacks in our dataset.

Finally, to foster research in this area, we release our dataset of targeted malware to the community [4].

Scope. Measuring real-world targeted attacks is challenging and this paper has a number of important biases. First, our dataset contains mainly attacks against the Uyghur and human-rights communities.
While the specifics of the social engineering techniques (e.g., use of Uyghur language) will vary from one targeted community to another, we argue that identifying commonly used techniques (e.g., topic, language, senders’ impersonation) and their purpose is a necessary step towards designing effective defenses. Another limitation of our dataset is that it captures only targeted attacks carried out over email channels and that were detected by our volunteers. Although malicious emails seem to constitute the majority of targeted attacks, different attack vectors such as targeted drive-by downloads are equally important. Finally, we reiterate that the goal of this study is to understand the reconnaissance phase occurring before a compromise. Analyzing second-stage malware, monitoring compromised systems, and determining the purpose of targeted attacks are all outside of the scope of this paper and are the topic of recent related work [10, 16]. We discuss open research challenges in Section 6.

Figure 1: Screenshot of a malicious email with an impersonated sender, and a malicious document exploiting Common Vulnerabilities and Exposures (CVE) number 2012-0158 and containing malware. The email replays an actual announcement about a conference in Geneva and was edited by the attacker to add that all fees would be covered.

2 Overview

Context. WUC, the NGO from which we have received our dataset, represents the Uyghurs, an ethnic minority concentrated in the Xinjiang region in China. Xinjiang is the largest Chinese administrative division, has abundant natural resources such as oil, and is China’s largest natural gas-producing region. WUC frequently engages in advocacy and meeting with politicians and diplomats at the EU and UN, as well as collaborating with a variety of NGOs. Rebiya Kadeer, WUC’s current president, was the fifth richest person in China before her imprisonment for dissent in 1996, and is now in exile in the US.
Finally, WUC is partly funded by the National Endowment for Democracy (NED), a US NGO itself funded by the US Congress to promote democracy. (We will see below that NED has been targeted with the same malware as WUC.)

WUC has been a regular target of Distributed Denial of Service (DDoS) attacks and telephone disruptions, as well as targeted attacks. For example, the WUC’s website became inaccessible from June 28 to July 10, 2011 due to such a DDoS attack. Concurrently to this attack, the professional and private phone lines of WUC employees were flooded with incoming calls, and the WUC’s contact email address received 15,000 spam emails in one week.

Data acquisition. In addition to these intermittent threats, WUC employees constantly receive suspicious emails impersonating their colleagues and containing
malicious links and attachments. These emails consistently evade spam and malware defenses deployed by webmail providers and are often relevant to WUC’s activities. In fact, our volunteers claim that the emails are often so targeted that they need to confirm their legitimacy with the impersonated sender in person. For example, Figure 1 shows the screenshot of such an email that replays the actual announcement for a conference in Geneva organized by WUC. As a result, WUC members are wary of any emails containing links or attachments, and some of them save these emails for future inspection. We came in contact with two WUC employees who shared the suspicious emails that they had received (with consent from WUC). The authors of this work were not involved in the data collection.

Characteristics of the dataset. The two volunteers shared with us the headers and content of 1,493 suspicious emails that they received over a four-year period. 1,178 (79%) of these emails were sent to the private email addresses of the two NGO employees from whom we obtained the data, 16 via the public email address of the WUC, and the remaining 299 emails were forwarded to them (126 of these by colleagues at WUC). Overall, 89% of these emails were received directly by our volunteers or their colleagues at WUC. As we will see below, they also contain numerous email addresses in the To and Cc fields belonging to individuals that are not affiliated with WUC.

The emails contained 209 links and 1,649 attachments, including 1,176 with malware (247 RAR, 49 ZIP, 144 PDF, and 655 Microsoft Office files, and 81 files in other formats). Our analysis revealed 1,116 malicious emails containing malware attachments. (We were not able to verify the maliciousness of the links as most of them were invalid by the time we obtained the data.) In the following, we analyze malicious emails exclusively, and we refer to malicious archives when the attachments are RAR or ZIP files and to malicious documents when they are PDF or Microsoft Office files. Finally, the volunteers labeled the data wherever necessary, enabling us, for example, to establish that the sender of the emails was impersonated for 84% of the emails. Table 1 summarizes the main characteristics of these malicious emails.

Scope of the dataset. Analyzing the headers of the malicious emails revealed a surprisingly large number of recipients in the To or Cc fields. In particular, we observed that malicious emails had been sent to 1,250 unique email addresses and 157 organizations. A potential explanation for this behavior could be that the attacker tampered with the email headers (e.g., via a compromised SMTP server) as part of social engineering so these emails were only delivered to our volunteers, despite the additional indicated recipients. To test this hypothesis, we considered only those emails received directly by our volunteers, originating from well-known webmail domains (i.e., aol.com, gmx.de, gmx.com, gmail.com, googlemail.com, hotmail.com, outlook.com, and yahoo.com), and verified via Sender Policy Framework (SPF) and DomainKeys Identified Mail (DKIM). SPF and DKIM are methods commonly used to authenticate the sending server of an email message. By verifying that these malicious emails originated from well-known webmail servers, we obtain 568 malicious emails whose headers are very unlikely to have been tampered with by the attacker. By repeating our above analysis on these emails only, we obtain 724 unique email addresses and 108 organizations. Other organizations besides WUC include NED (WUC’s main source of funding and itself funded by the US Congress), the New York Times, and US embassies. In summary, while we obtained our dataset from two volunteers working for a single organization, it offers substantial coverage not only of one NGO, but also of those attacks against multiple NGOs in which attackers target more than one organization with the same email. We show the full list of organizations targeted in our dataset in Appendix A.

What are targeted attacks? There is no precise definition of targeted attacks. In this paper, we loosely define these attacks as low-volume, socially engineered communication which entices specific victims into installing malware. In the dataset we analyze here, the communication is by email, and the mechanism of exploitation is primarily using malicious archives or documents. A targeted victim, in this work, refers to specific individuals, or an organization as a whole. When necessary, we also use the term volunteer(s) to distinguish between our two collaborators and other victims.

The terms targeted attacks and Advanced Persistent Threats (or APTs) are often used interchangeably. As this paper focuses on the reconnaissance phase of targeted attacks (occurring before a compromise), we cannot measure how long attackers would have remained in control of the targeted systems (i.e., their persistency). As a result, we simply refer to these attacks as targeted attacks, and not APTs, throughout the rest of this paper. We discuss specific social engineering characteristics that make targeted attacks difficult to detect by unsuspecting average users in Section 3, the attack vectors used in our dataset in Section 4, and the malware families they install in Section 5. Finally, we will discuss open research challenges in Section 6.

Ethics. The dataset was collected prior to our contacting WUC and for the purpose of future security analysis. Furthermore, WUC approved the disclosure of all the information contained in this paper and requested that the organization’s name not be anonymized.
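The header-based filtering described under "Scope of the dataset" can be illustrated with a minimal Python sketch. It is not the authors' tooling: it assumes that each stored message carries an Authentication-Results header written by the receiving server, and it uses recipient domains as a crude stand-in for organizations (the paper's mapping to organizations is not described as automatic). The function names are hypothetical.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Webmail domains listed in the text.
WEBMAIL_DOMAINS = {
    "aol.com", "gmx.de", "gmx.com", "gmail.com", "googlemail.com",
    "hotmail.com", "outlook.com", "yahoo.com",
}


def domain_of(address):
    """Return the lowercased domain part of an address ('' if there is none)."""
    return address.rpartition("@")[2].lower() if "@" in address else ""


def passes_spf_and_dkim(msg):
    """Assumption: the receiving server recorded its checks in Authentication-Results."""
    results = " ".join(msg.get_all("Authentication-Results", [])).lower()
    spf_ok = re.search(r"\bspf=pass\b", results) is not None
    dkim_ok = re.search(r"\bdkim=pass\b", results) is not None
    return spf_ok and dkim_ok


def is_trustworthy(msg):
    """Keep only mail from a listed webmail domain that passed both checks."""
    sender = parseaddr(msg.get("From", ""))[1]
    return domain_of(sender) in WEBMAIL_DOMAINS and passes_spf_and_dkim(msg)


def recipients_and_domains(messages):
    """Collect unique To/Cc addresses (and their domains) from trustworthy mail."""
    addresses = set()
    for msg in messages:
        if not is_trustworthy(msg):
            continue
        fields = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, addr in getaddresses(fields):
            if "@" in addr:
                addresses.add(addr.lower())
    return addresses, {domain_of(a) for a in addresses}
```

Called on a list of parsed messages, `recipients_and_domains` returns the two sets whose sizes correspond to the "# recipients" and (approximately) "# orgs" counts; many recipients use webmail addresses, so a real analysis would still need manual mapping from addresses to organizations.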

Table 1: Summary of our dataset originating from two volunteers. Malicious indicates the fraction of emails containing malware, Impersonated the fraction of emails with an impersonated sender, # recipients and # orgs the number of unique email addresses that were listed in the To and Cc fields of the malicious emails and the corresponding number of organizations, respectively.

                  1st volunteer           2nd volunteer          Total
Beginning - end   Sept 2012 - Sept 2013   Sept 2009 - Jul 2013   Sept 2009 - Sept 2013
Size              98 MB                   818 MB                 916 MB
Malicious         154/241 (64%)           962/1,252 (77%)        1,116/1,493 (75%)
Impersonated      141/154 (92%)           802/962 (83%)          943/1,116 (84%)
# recipients      124                     666                    724
# orgs            25                      102                    108

3 Analysis of social engineering

The GhostNet, Mandiant, ISTR, and other reports [11, 15, 22] mention the use of socially-engineered emails to lure their victims into installing malware, clicking on malicious links, or opening malicious documents. For example, the GhostNet report refers to one spoofed email containing a malicious DOC attachment, and the Mandiant report to one email sent from a webmail account bearing the name of the company’s CEO enticing several employees to open malware contained in a ZIP archive. Concurrent work reports the use of careful social engineering against civilians and NGOs in the Middle East [16] and also Tibetan and human-rights NGOs [10]. Despite this anecdotal evidence, we are not aware of any rigorous and thorough analysis of the social engineering techniques employed in targeted attacks. In this section, we seek to answer the following questions in the context of our dataset:

What social traits of victims are generally exploited? Do attackers generally impersonate a sender known to the victim and if so who do they choose to impersonate?

Who are the victims? Are malicious emails sent only to specific individuals, to entire organizations, or communities of users?

When are users being targeted? When do users start being targeted? Are the same users frequently being targeted and for how long? Are several users from the same organization being targeted simultaneously?

3.1 Methodology

The analysis below focuses on 1,116 malicious emails received between 2009 and 2013.

Topics and language. To attempt to understand how well the attacker knows his victims, we manually categorized the emails (coded) by topic and language. (Unless indicated otherwise, the analysis below was performed on emails that were coded by one of the authors.) The topic was determined by reading the emails’ titles and bodies and, in cases where emails were not written in English, we also used an online translation service. Emails whose topic was still unclear after using the translator were labeled as Unknown.

Targeted victims. To determine the targeted victims of these attacks, we searched the email addresses and full names of the senders and receivers for the malicious emails originating from trustworthy SMTP servers. When available, we used their public profiles available on social media websites such as Google, Facebook, and Skype to determine their professional positions and organizations. We assume we have found the social profile of a victim if one of the three following rules applies (in that order): First, if the social profile refers directly to the email address seen in the malicious email; second, if the social profile refers to an organization whose domain matches the victim’s email address; or third, if we find contextual evidence that the social profile is linked to WUC, Uyghurs, or the topic of the malicious email. Out of 724 victims’ email addresses, we found the profile of 32% (237), 4% (30), and 23% (167) using the first, second, and last rule, respectively.

Organizations and industries. In the following, WUC refers to victims directly affiliated with the organization (including our volunteers). Other Uyghur NGOs include Australia, Belgium, Canada, Finland, France, Japan, Netherlands, Norway, Sweden, and UK associations. Other NGOs include non-profit organizations such as Amnesty International, Reporters Without Borders, and Tibetan NGOs. Academia, Politics, and Business contain victims working in these industries. Finally, Unknown corresponds to victims for which we were not able to determine an affiliation.

Ranks. We also translated the professional positions of the victims into one of three categories: High, Medium, and Low profile. We consider professional leadership positions such as chairpersons, presidents, and executives as high-profile, job positions such as assistants and IT personnel as medium-profile, and unknown and shared email addresses (e.g., an NGO’s contact information) as low-profile.
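The ordered profile-matching rules from the "Targeted victims" paragraph of the methodology can be read as a first-match procedure. The sketch below is only an illustration under stated assumptions: the `Profile` record, the keyword list, and the function name are hypothetical, and the third rule (contextual evidence) was a human judgment in the study that a keyword test can only approximate.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical record for a public profile found by searching a victim's name."""
    emails: set = field(default_factory=set)  # addresses shown on the profile
    org_domain: str = ""                      # domain of the listed organization
    text: str = ""                            # free-text description on the profile


# Assumed keywords standing in for "contextual evidence" (rule 3).
CONTEXT_KEYWORDS = ("wuc", "world uyghur congress", "uyghur")


def match_rule(victim_email, profile, email_topic=""):
    """Return 1, 2, or 3 for the first rule that applies, or None if none does."""
    victim_email = victim_email.lower()
    domain = victim_email.rpartition("@")[2]
    # Rule 1: the profile refers directly to the victim's email address.
    if victim_email in {e.lower() for e in profile.emails}:
        return 1
    # Rule 2: the profile's organization domain matches the address domain.
    if profile.org_domain and profile.org_domain.lower() == domain:
        return 2
    # Rule 3: contextual link to WUC, Uyghurs, or the topic of the email.
    text = profile.text.lower()
    keywords = CONTEXT_KEYWORDS + ((email_topic.lower(),) if email_topic else ())
    if any(keyword in text for keyword in keywords):
        return 3
    return None
```

Because the rules are checked in order, a profile that satisfies both the first and the third rule is counted under the first, which matches the "in that order" wording of the text.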

Impersonation. Finally, to understand the social context of the attack, each of our volunteers coded (based on her experience within the organization) all the email addresses of the senders into one of five categories: Spoofed, Typo, Name, Suspicious, or Unknown. (Coding was done based exclusively on the personal knowledge of the volunteers.) An email is marked as Spoofed if it bears the exact sender email address of a person known to our volunteers, as Typo if it resembles a sender email address known to the receiver but is not identical, and as Name if the attacker used the full name of a volunteer’s contact (with a different email address). Finally, email addresses that look as if they had been generated by a computer program (e.g., uiow839djs93j@yahoo.com) are labeled as Suspicious and all remaining emails as Unknown. Our assumption is that, because our volunteers received most of the malicious emails directly, they were likely to recognize cases where their contacts were being impersonated. We note that labeling is conservative: Our volunteers may sometimes label Spoofed or Typo addresses as Unknown because they do not know the person impersonated in the attack. This may happen, for example, in cases where they were not the primary target of the attack (e.g., they appeared in Cc).

Limitations. Our dataset originates from WUC and is limited to those victims that were targeted together with that organization. We will see that these victims were often NGOs. As a result, the social engineering techniques observed here may differ from attacks against different entities such as companies, political institutions, or even other NGOs. Despite these limitations, we argue that this analysis is an important first step towards understanding the human factors exploited by targeted attacks.

[Figure 2 plot: "Topics of malicious emails"; y-axis: fraction of malicious emails; x-axis: years; legend: Unknown, Other, Human rights, Uyghur, WUC]

Figure 2: Distribution of the topics of the malicious emails for each year of the dataset shared by our two volunteers. The left bar corresponds to the data shared by both volunteers, and the next two bar groups to each year of the data shared by our first and second volunteer, respectively. The content of malicious emails is targeted to the victims.

[Figure 3 plot: "Languages of malicious emails"; y-axis: fraction of malicious emails; x-axis: years]

Figure 3: Distribution of languages for each year of our dataset. Malicious emails employ the language of their victims.

3.2 Results

In this subsection, we discuss the results of our analysis of the social engineering techniques used in the malicious emails.

Topics and language. The topic of malicious emails in our dataset can generally be classified into one of three categories: WUC, Uyghur, and human-rights. In particular, we observed 51% (575) of malicious emails pertaining to WUC, 29% (326) to Uyghurs, 12% (139) to human-rights, and 3% (28) to other topics. In addition, the native language of the victim is often used in the malicious emails. In fact, 69% (664) of the emails sent to the second volunteer were written in the Uyghur language, and 62% (96) for the first one. These results indicate that attackers invested significant effort to tailor the content of the malicious emails to their victims, as we see in Figure 2 and Figure 3.

Specialized events. In addition to being on topic, we also observed that emails often referred to specific events that would only be of interest to the targeted victims. Throughout our dataset, we found 46% of events (491) related to organizational events (e.g., conferences). We note that these references are generally much more specialized than those used in typical phishing and other profit-motivated attacks. For example, Figure 1 shows a screenshot of an attack that replayed the announcement of a conference on a very specialized topic. The malicious email was edited by the attacker to add that all fees would be covered (probably to raise the target’s interest).

Impersonation. We find that attackers used carefully crafted email addresses to impersonate high-profile identities that the victims may directly know. That is, attackers used one of the following four techniques to add legitimacy to a malicious email: First, 41% (465) of
email addresses have Typos (i.e., the email address resembles known sender addresses, but with minor, subtle differences). These email addresses are identical to legitimate ones with the exception of a few characters being swapped, replaced, or added in the username. Second, 12% (134) of the senders’ full names corresponded to existing contacts of the volunteers. Third, we find that most email addresses belonged to well-known email providers, Google being the most prominent with 58% of all emails using the Gmail or GoogleMail domains, followed by Yahoo with 16%.

[Figure 4 plot: "Impersonation"; y-axis: fraction of malicious emails; x-axis: years]

Figure 4: Distribution of senders’ impersonation techniques for each year of our dataset. Malicious emails spoof the email address of a contact of the volunteers, use a very similar address controlled by the attacker, or a contact’s full name.

[Figure 5 plot: "Ranks of impersonated senders"; y-axis: fraction of malicious emails; x-axis: years; legend: Unknown, Low, Medium, High]

Figure 5: Distribution of impersonated senders’ ranks for each year of our dataset. Malicious emails often impersonate high-profile individuals.

Fourth, we find that 30% (337) of the sender emails were spoofed (i.e., the email was sent from the address of a person that the volunteer knows). This observation suggests that the attacker had knowledge of the victim’s social context, and had either spoofed the email header, or compromised the corresponding email account. To identify a subset of compromised email accounts, we consider spoofed emails authenticated by the senders’ domains using both SPF and DKIM. To reduce the chances of capturing compromised servers instead of compromised accounts, we also consider only well-known, trustworthy domains such as GMail.
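The sender categories used in this section (Spoofed, Typo, Name, Suspicious, Unknown) were assigned by hand by the volunteers. As a rough illustration of what each category means, the sketch below approximates them automatically. Everything in it is an assumption rather than the study's method: the edit-distance threshold, the same-domain requirement for typos, and the `looks_generated` heuristic are hypothetical choices.

```python
from email.utils import parseaddr


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def looks_generated(username):
    """Hypothetical heuristic: a long username that mixes letters with several digits."""
    digits = sum(ch.isdigit() for ch in username)
    return len(username) >= 10 and 3 <= digits < len(username)


def classify_sender(from_header, contacts, max_typo_distance=2):
    """Classify a From header; contacts maps known addresses to full names."""
    name, address = parseaddr(from_header)
    address = address.lower()
    known = {addr.lower(): full.lower() for addr, full in contacts.items()}
    # Spoofed: the exact address of a known contact.
    if address in known:
        return "Spoofed"
    # Typo: same domain, username within a few edits of a known username.
    user, _, domain = address.partition("@")
    for known_addr in known:
        k_user, _, k_domain = known_addr.partition("@")
        if domain == k_domain and 0 < edit_distance(user, k_user) <= max_typo_distance:
            return "Typo"
    # Name: a contact's full name attached to a different address.
    if name and name.lower() in known.values():
        return "Name"
    # Suspicious: the address looks machine-generated.
    if looks_generated(user):
        return "Suspicious"
    return "Unknown"
```

For instance, with a contact list containing only `jane.doe@gmail.com` (a made-up address), a message from `jane.d0e@gmail.com` would fall into the Typo category because the usernames differ by a single substitution. A swapped pair of characters counts as two edits here, which is why the default threshold is 2.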
This procedure yields malicious emails that were likely sent from the legitimate account of the victims’ contacts. We found that three email accounts belonging to prominent activists, including two out of 10 of the WUC leaders, were compromised and being used to send malicious emails. We have alerted these users and are currently working with them to deploy defenses and more comprehensive monitoring techniques, as we will discuss in Section 6.

We show the distributions of malicious emails sent with spoofed, typo, suspicious, or unknown email addresses in Figure 4, and the ranks of the senders in Figure 5. (We do not show the corresponding ranks for receivers because NGOs generally function with a handful of employees, all playing a key role in the organization.)

Figure 6: Distribution of languages employed to write about the main topics of malicious emails. There is a strong correlation between malicious emails’ topics and the language in which they are written.

Targeted victims. For the analysis below, which leverages other recipients besides our two volunteers, we further filter emails to keep only those originating from well-known domains (as described in Section 2). Doing this leaves us with 568 malicious emails that are likely to have indeed been sent to all the email addresses in the header. We find that the attacks target more organizations than WUC, including 38 Uyghur NGOs, 28 Other NGOs, as well as 41 Journalistic, Academic, and Political organizations. (See Appendix A for the complete list of targeted organizations.) Interestingly, we find a strong correlation between the topic of an email and the language in which the email was written, as we show in Figure 6. Our results show that English was more and

Summary of Findings. We now revisit the initial questions posed at the beginning of this section. First, we saw that most emails in our dataset pertained to WUC, Uyghurs, or human-rights, were written in the recipient’s mother tongue, and often referred to very specialized events. We also found that sender impersonation was common and that some email accounts belonging to WUC’s leadership were compromised and used to spread targeted attacks. (We note that many more accounts may be compromised but remain dormant or do not appear as compromised in our dataset.) Second, we showed that numerous NGOs were being targeted simultaneously with WUC and that the specialization of emails varied depending on the recipient(s). Finally, we observed that the most targeted victims received several malicious emails every month and that attacks were sprayed over several organizations’ employees.

4 Analysis of attack vectors

We now analyze the techniques used to execute arbitrary code on the victim’s computer. The related work reports the use of malicious links, email attachments, and IP tracking services [10, 16]. Whereas ISTR 2013 reports that EXE files are largely used in targeted attacks, and the Mandiant report states that ZIP is the predominant format that they have observed in the last several years, we find that these formats represent 0% and 4% (49) of malicious attachments in our dataset, respectively. Instead, we find RAR archives and malicious documents to be the most common attack vectors. Hypotheses that may explain these discrepancies with the Mandiant report include the tuning of attack vectors to adapt to the defense mechanisms used by different populatio
