Google Dorks: Use Cases And Adaption Study - Utupub.fi


Google dorks: Use cases and Adaption study

UNIVERSITY OF TURKU
Department of Future Technologies
Master of Science in Technology Thesis
Networked Systems Security
October 2020
Reza Abasi

Supervisors:
Dr. Ali Farooq
Dr. Antti Hakkala

The originality of this thesis has been checked in accordance with the University of Turku quality assurance system using the Turnitin OriginalityCheck service.

UNIVERSITY OF TURKU
Department of Future Technologies
Reza Abasi: Google dorks: Use cases and adaption study
Master of Science in Technology Thesis, 93 pages.
Networked Systems Security
October 2020

The information age has brought about radical changes in our lives. More and more assets are getting connected to the Internet. On the one hand, connectivity to this ever-growing network of devices and assets brings more convenience and access to various resources. On the other hand, the Internet can be a hotbed for malicious actors such as hackers, attackers, and cybercriminal communities.

Continuous penetration testing and monitoring of the sites and forums that provide illicit digital products and services is a must-do task nowadays. Advanced searching techniques can be employed to discover such forums and sites. Google dorks, which utilize Google’s advanced searching techniques, can be applied for this purpose. Google dorks can also be used in other areas that this thesis explains in more detail, such as information gathering and vulnerability detection.

The purpose of this thesis is to propose advanced searching techniques that will help cybersecurity professionals in information gathering, reconnaissance, vulnerability detection, and cybercriminal investigative tasks. Further, a usability study has been conducted to examine the acceptance of these techniques among a group of cybersecurity professionals. In this usability study, we measure the significance of five variables of the Innovation Diffusion Theory (IDT), namely complexity, compatibility, relative advantage, trialability, and observability, in the adoption of Google dorks for search-related tasks by cybersecurity professionals.

Keywords: Google dorks, Cybercriminal forums, Information gathering, Dark web, Defaced sites, Innovation diffusion theory

Table of Contents

List of figures
1 Introduction
2 Problem statement and literature review
  2.1 Practical Problems
  2.2 Literature Review
3 Search engine hacking
  3.1 Description of the process
  3.2 Most commonly used search queries
  3.3 Basic search queries: intitle, intext, inurl, site, ext, intitle:"index of"
  3.4 Creating more advanced queries
    3.4.1 Use Case: Finding mail subdomain of governmental sites with Google dorks
  3.5 Duckduckgo and Bing search queries
  3.6 Stop words for search engines
4 Information gathering
  4.1 Introduction to information gathering
  4.2 Sensitive information disclosure, utilizing exploit-db
    4.2.1 Files containing interesting information
    4.2.2 Files containing usernames
    4.2.3 Files containing passwords
    4.2.4 Enumeration using robots.txt and sitemap.xml files
    4.2.5 Find emails and passwords from file sharing sites, paste sites
    4.2.6 Find files containing emails and passwords in sites designed by WordPress
    4.2.7 Find emails and passwords from random sites
  4.3 Vulnerability detection and enumeration
    4.3.1 Finding sites probably vulnerable to SQL injection, XSS
    4.3.2 Web server detection
    4.3.3 Popular CMS, forum software sensitive file and folder enumeration
    4.3.4 Error Messages, log files
    4.3.5 Vulnerable servers
    4.3.6 Vulnerable files
    4.3.7 Web asset and online device discovery
  4.4 Automation tools for using dorks
5 Cyber investigation Use Cases
  5.1 Threat intelligence hunting
  5.2 Defaced Sites
    5.2.1 Finding defaced sites by the same defacer group
    5.2.2 Finding defacers’ Telegram, Facebook, Twitter, Skype, and other social media accounts
  5.3 Cybercriminal activities
    5.3.1 Cybercriminal forums
    5.3.2 Finding cybercrime channels in Telegram
    5.3.3 Finding cybercrimes in paste sites
    5.3.4 Finding autobuy shops using Google dorks: selly.gg, shoppy.gg
    5.3.5 Finding ICQ and vk.com channels, and Facebook groups offering cybercriminal products
    5.3.6 Find carding, hacking sites in darknet using dorks and pastebin sites
    5.3.7 Finding darknet criminal sites using Google dork and web2tor proxies
    5.3.8 Use Case: Find Clearnet of an onion site with Google dorks
6 Usability Study
  6.1 Experimental design
  6.2 Measures
  6.3 Research Method
  6.4 Result and Discussion
    6.4.1 Scales reliability and validity testing
    6.4.2 Descriptive statistics
    6.4.3 Correlation and Multiple Regression Analysis
  6.5 Discussion, Limitation and Future Research
7 Conclusion
References
Appendix: Measurement items

List of tables

Table 1: Duckduckgo.com search syntax
Table 2: Bing.com search syntax
Table 3: Demographic statistics
Table 4: Standard deviation, mean, and Cronbach’s alpha reliability
Table 5: Comparison between the mean and standard deviation of pretest and posttest surveys (Paired Sample Test)
Table 6: Inter-correlation of constructs
Table 7: Inter-correlation of constructs
Table 8: Multiple Regression (Pretest)
Table 9: Hypothesis test for pretest
Table 10: Multiple Regression (Post-test)
Table 11: Hypothesis test for posttest

List of figures

Figure 1: Google dorks usage
Figure 2: Google dorks example
Figure 3: Detecting site's web technology
Figure 4: Finding contact us page
Figure 5: Google Hacking Database statistics
Figure 6: Finding databases.yml file
Figure 7: Finding a user's profile
Figure 8: Carding forum's member page
Figure 9: Finding db.conf file
Figure 10: Finding robots.txt
Figure 11: Finding the sitemap.xml file
Figure 12: Finding pasting sites
Figure 13: Credentials in pasting sites
Figure 14: Credentials in file sharing sites
Figure 15: SQLi vulnerable sites
Figure 16: Web server detection
Figure 17: Sites protected by WAF
Figure 18: Finding the phpinfo.php file
Figure 19: WordPress admin login page
Figure 20: Backup files in WordPress
Figure 21: WordPress dorks in Google Hacking Database
Figure 22: Finding error pages
Figure 23: Error pages revealing emails
Figure 24: VBulletin dorks in Google Hacking Database
Figure 25: Mikrotik hotspot login page
Figure 26: Marshal video login portal
Figure 27: Defaced sites
Figure 28: Defaced page in /upload/ directory
Figure 29: Defaced page found in sites built with WordPress
Figure 30: Indonesian error system
Figure 31: The Facebook account of defacer
Figure 32: Telegram account of defacer
Figure 33: Carding forum
Figure 34: Hacking forum
Figure 35: Credentials in Telegram channels
Figure 36: Credentials in vk.com
Figure 37: Carding groups on Facebook
Figure 38: Finding onion sites
Figure 39: HQER counterfeits
Figure 40: Darknet site mirror in surface web
Figure 41: Server info of onion site
Figure 42: Conceptual model
Figure 43: Research stages
Figure 44: Mean, std. deviation: pretest, posttest comparison

1 Introduction

The information age has transformed our lives through the tremendous growth of around-the-clock access to resources. At the time of writing this thesis, roughly 4.5 billion users are connected to the Internet worldwide, and the number is constantly increasing [1]. Using social media sites and applications such as Facebook, Twitter, WhatsApp, and Telegram has become more and more common. Simultaneously, the number of websites and other resources connected to the Internet is growing rapidly. However, locating the most useful resources for our day-to-day search-related activities among this huge amount of resources is not always easy. Search engines such as Google, Yahoo, Bing, Yandex, and Duckduckgo try their best to index more and more of the resources connected to the Internet so that users’ searches are more efficient. By far the most widely used search engine is Google [2]. At the time of writing this thesis, Google handles more than 5.8 billion searches daily [1]. To facilitate searching, Google provides syntaxes that limit the search results and thereby increase the effectiveness of the search [3].

It can be highly beneficial for actors in the cybersecurity community, both white hat and black hat hackers, to adopt such syntaxes and even combine them to produce customized advanced Google searching techniques, also known as Google dorks, that facilitate their search-related tasks. The Google Hacking Database (GHDB) [4], which is part of exploit-db.com, offers a good set of such Google dorks. Advanced Google searching techniques, or Google dorks, can be beneficial for various activities of cybersecurity actors, including information gathering, vulnerability detection, discovering files containing sensitive information, finding credentials through paste services and file sharing sites, finding criminal forums, locating defaced sites, and even searching in the dark web.

This thesis is organized as follows. Chapter two covers the background and the existing literature that examines Google dorks in various areas, as well as practical issues and areas where we believe that applying Google dorks will be beneficial.

In chapter three we cover the searching syntaxes of not only Google but also other well-known search engines such as Yahoo, Bing, and Duckduckgo.

In chapter four we discuss how Google dorks can be applied to gather information about targets, both by white hat hackers and black hat hackers. We briefly explain some of the categories of the GHDB, such as files containing interesting information, usernames or passwords, login portals, and online device discovery. Additionally, we review Google dorks that are useful for finding credentials. Another area covered in this chapter where Google dorks can be beneficial is finding vulnerabilities or potential starting points for attacks such as SQL injection. In the chapter’s last section, we briefly examine some of the well-known tools for automating Google dorks, such as Bingoo, xgdork, Zeus, and pagodo. The pagodo automation tool is discussed in more detail.

In chapter five we shed light on some of the areas where Google dorks can be highly beneficial but are less studied. This chapter covers the detection of cybercriminal forums such as carding sites and forums, the detection of defaced websites, the social media channels and groups in which cybercriminals and defacers actively participate, Google dorks for locating cybercrime sites in the dark web, discovering the Clearnet site of a dark web site (if one exists), and simple yet useful dorks for the deanonymization of dark web cybercriminal sites.

In chapter six we analyze the results of two surveys designed on the basis of the Innovation Diffusion Theory (IDT), conducted together with a workshop about Google dorks in a quasi-experimental time-series design, to measure how the cybersecurity actors in our sample group adopt Google dorks for their search-related tasks.

Finally, the thesis ends with concluding remarks in chapter seven, which include a brief overview of the overall process of this thesis.

2 Problem statement and literature review

This chapter first covers some practical issues that, from the point of view of the author of this thesis, Google dorks can help to solve. The second part of the chapter reviews the literature on the use of Google dorks among cybersecurity actors.

2.1 Practical Problems

In both hacking and penetration testing, reconnaissance and information gathering is a lengthy process. Google dorks shorten it. For instance, a company's addresses, emails, and staff are not always easy to find, but with dorks they can be located more easily and, above all, more quickly.

Another practical problem where Google dorks can be utilized is the verification of breached databases by security experts, which can be done by different methods. One typical approach is to discover registration links, login pages, or password reset links and to check whether registration is possible with the credentials that exist in the breach. Finding the login, password reset, and registration pages of sites is not always straightforward. The category “pages containing login portal” facilitates these tasks. This category already exists in exploit-db/GHDB, and some of the existing dorks were registered by the author of this thesis. Besides, from the side of hackers and penetration testers, login pages can also be a pivot point for brute force attacks. Similarly, the login pages of online devices could be targeted for brute force attacks.

The next practical area where Google dorks can be applied is finding the communication channels of cybercriminals, such as the Telegram channels that are somewhat commonly used by them, which is otherwise not convenient. Telegram channels sharing hacked accounts, carding channels, and hacking channels, as well as Facebook groups, vk.com, and other social media applications, are a practical problem for the organizations monitoring these channels and groups. Thanks to Google dorks, this monitoring can be accomplished more conveniently.

Besides, there are companies that inform their clients about their cyber exposure, and they need to gather publicly shared accounts from any possible resource, such as paste sites, Telegram channels, and so on. They also need to be informed about cybercriminal actors’ malicious activities that affect their business and to monitor cybercriminal sites and forums. Finding such activities is a real challenge, especially without Google dorks.

Moreover, from the cyber defense perspective as well as the law enforcement side, it is essential to be informed about defacer groups and about already defaced sites that could potentially be compromised. Google dorks, as we will examine in this thesis, are quite useful here. We will provide examples of sites that have been defaced without being noticed. Furthermore, with these dorks it is possible to find all the sites defaced by a single defacer or defacer group. So if security researchers try to identify as much information as possible about the activities of hacking and defacing groups, these dorks can facilitate their tasks.

Another practical challenge, especially for law enforcement authorities, is to find dark web sites and forums and deanonymize them, to find out whether the same sites and forums are also active on the surface web, and generally to gather as much information as possible about them. Using Google dorks and web2tor sites [5], it is possible to search for dark web content on the surface web.

Last but not least, passive reconnaissance has priority over active reconnaissance in the sense that hackers and penetration testers do not need to clear their tracks afterwards. So although there are multiple tools for these phases of penetration testing and hacking, the priority is to perform them passively. Google dorks facilitate this phase as well as some further steps such as vulnerability detection, WAF detection, and web server detection.

Generally speaking, many of the above-mentioned tasks may be possible without Google dorks. The point is that, especially in industry, where many tasks need to be done within a short time frame, disregarding Google dorks makes the tasks much lengthier, less precise, and more prone to false positives. Almost all of the topics covered in this thesis are based on real tasks that needed to be done in a limited time frame.
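To make the login-page discovery task in this section more concrete, the following is a minimal sketch of how search operators might be combined into a dork for auditing one's own domain. The operators site, inurl, and intitle are the ones named in chapter three of this thesis; the helper names build_dork and search_url, the placeholder domain example.com, and the chosen keyword values are assumptions made only for illustration and do not come from the thesis or the GHDB. The sketch only builds strings and performs no network requests.

```python
from urllib.parse import quote_plus


def build_dork(keywords="", **operators):
    """Join operator:value pairs and optional free-text keywords into one query.

    Hypothetical helper for illustration only.
    """
    parts = []
    for name, value in operators.items():
        # Quote values containing spaces so they are searched as one phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def search_url(query):
    """URL-encode a query for manual review in a browser (nothing is fetched)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# example.com is a placeholder; use only domains you are authorized to assess.
login_dork = build_dork(site="example.com", inurl="login")
reset_dork = build_dork(site="example.com", intitle="reset password")

print(login_dork)
print(reset_dork)
print(search_url(login_dork))
```

Keeping query construction separate from any querying step reflects the passive character of this kind of reconnaissance: the resulting strings can be reviewed and pasted into a search engine by hand.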

2.2 Literature Review

In this section, we shed light on the existing literature regarding the areas where Google dorks can be beneficial for search-related tasks in the cybersecurity community. At the same time, there are other areas where Google dorks can be applied to facilitate the tasks of cybersecurity actors but where less work has been done. In the existing literature, phases such as vulnerability detection, finding files containing sensitive information, and information gathering are scrutinized, while other areas, such as discovering cybercriminal forums and their communication channels or searching for darknet sites with Google dorks, are not covered in the existing literature but are covered in other sections of this thesis. Figure 1 is a schematic of these two groups of areas.

[Figure 1: Google dorks usage. Areas shown around the central “Google dorks” node: finding cybercriminal forums for investigative tasks and monitoring; finding credentials from pasting services and file sharing sites; information gathering about targets; vulnerability detection of targets; searching in darknet; finding defaced sites; finding communication channels for investigative tasks and monitoring. Blue indicates areas covered in the existing literature; orange indicates areas not covered in the literature but covered in this thesis.]

The existing literature covers the following four areas:

1. Definition of Google dorks from various resources
2. How the attackers used Google dorks
3. How the security researchers used Google dorks
4. Tools that utilized Google dorks as their core for activities

Some sources define Google dorks, while others give no specific definition. For instance, Google dorking has been described as a concept that lacks a definition; put simply, dorks are queries that use the advanced operators offered by the Google search engine to retrieve sensitive information or locate vulnerable systems [5]. Others describe Google dorking as a hacking technique that uses Google’s capabilities to locate specific files and vulnerabilities in web applications (that is, using Google dorks for vulnerability detection) [6]. As an example of the usage of Google dorks for vulnerability detection, Google dorks were used to locate the monitoring and control system of a sluice gate [7]. Some studies considered the capability of Google dorks to locate login portals and passwords, which is part of information gathering about targets; according to them, what is known as Google dorking originates from the “Google hacking” community and was used for locating login information, passwords, or vulnerable systems [8]. Google dorks make the detection of vulnerable sites fairly straightforward, so they can be regarded as one of the simplest methods to locate vulnerable sites. Put simply, a dork is a specific search request that uncovers websites matching the parameters in the request [9].

The term “Google dorks” was first coined by Johnny Long on his site johnny.ihackstuff.com, where he explained search techniques for finding information left on websites that reveals details about their assets and could be exploited by hackers [10].

Gathering information about the target is one of the reasons attackers and penetration testers use Google dorks. By employing the indexing power of Google as a search engine, Google dorks are capable of pinpointing sensitive information about targets, while in some cases the site owners do not even know that the files containing the sensitive information exist (another example of Google dorks for information gathering about targets and vulnerability detection) [11]. Additionally, locating information on a site that is impossible or barely possible to find with typical searching is another use of Google dorks by security practitioners [12]. Locating error messages, documents and files, network devices, open directories, and numerous other interesting sources of information is made possible by Google dorks [13]. Malicious attackers, as well as cybersecurity actors such as penetration testers, security researchers, and cyber investigators, utilize Google dorks, also known in some contexts as Google hacking techniques, to acquire interesting information about their targets. In the majority of cases, this valuable information has been left unintentionally and mistakenly by organizations or companies on publicly available servers, so it can be accessed with Google dorks, which are advanced Google searching techniques.

In a 2014 study, 1000 different Google dorks were run against numerous targets, and around 300,000 potentially vulnerable sites were found [14]. This was part of a study aiming to prove that vulnerability detection of websites is more convenient with well-crafted Google dorks. However, the questions of whether attackers use Google dorks for spotting vulnerabilities, which Google dorks they check, and how they apply Google dorks against their targets remained unanswered in that academic work [15].

Prominent hacking communities and groups benefit from Google dorks for their objectives. For instance, two famous hacking groups, Lulzsec and Anonymous, used Google dorks as their main method of locating the vulnerabilities of their targets [16]. Especially in the last decade, attackers have employed Google dorks to locate vulnerable computers all over the US [17].

Attackers benefit from Google dorks to locate sites potentially vulnerable to a web attack known as SQLi (Structured Query Language injection). Additionally, attackers are able to search for terms like DOT SQL or DOTPWD and obtain a list of sites that contain such information about the site [18]. As an example of how attackers can utilize Google dorks for their objectives, they can find website credentials that were mistakenly left in site directories or files with a well-crafted Google dork. One of the benefits for the attacker in such cases is that the victim is unaware that the attacker obtained the credentials illegitimately [19]. Attackers also employ Google dorks to figure out where to launch an attack against their targets, though this does not always reveal the information they want. For instance, in 2004 Google dorks were used against an ASP-based e-commerce package named Comersus, which had flaws in one of its files, namely cemersus message.asp. One of the categories where Google dorks can be beneficial is discovering default pages, especially login pages or pages containing con
