Open Source Software and Global Entrepreneurship: A Virtuous Cycle

Nataliya Langburd Wright (Harvard Business School)
Frank Nagle (Harvard Business School)
Shane Greenstein (Harvard Business School)

Working Paper 20-139

Copyright 2020, 2021 by Nataliya Langburd Wright, Frank Nagle, and Shane Greenstein.

Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.

Funding for this research was provided in part by Harvard Business School.

Open Source Software and Global Entrepreneurship: A Virtuous Cycle¹

August 2021

Nataliya Langburd Wright (Harvard Business School)
Frank Nagle (Harvard Business School)
Shane Greenstein (Harvard Business School)

ABSTRACT

We consider the relationship between open source software (OSS) and entrepreneurship around the globe. This study measures how country-level participation on the GitHub OSS platform affects the founding of innovative ventures globally, and subsequently, how new venture formation affects OSS contributions. We estimate these effects using cross-country variation in new venture founding and OSS participation. The study finds that an increase in GitHub “commits” from people residing in a given country generates an increase in the number of new technology ventures within that country. This holds particularly for contributions of code from those not affiliated with an organizational account, and for globally-oriented, mission-oriented, and high-quality ventures. The reverse relationship also holds more weakly: an increase in the founding of new IT ventures generates an increase in OSS commits for only globally-oriented and high-quality entrepreneurship. Together, the results suggest a virtuous cycle between OSS and entrepreneurship in some but not all settings. Policymakers may utilize OSS as a lever to stimulate innovative entrepreneurial ecosystems, and investors may use OSS as an early indicator of high-quality entrepreneurial activity across geographies.

¹ nlangburdwright@hbs.edu, fnagle@hbs.edu, sgreenstein@hbs.edu. We thank Rem Koning for helpful comments. We also would like to thank Yanuo Zhou for excellent research assistance and Maggie Kelleher for excellent copyediting. All errors remain our own.

I. Introduction

Open source software (OSS) became mainstream without much fanfare (DiBona and Ockman, 1999; DiBona, Stone, and Cooper, 2005). Mainstream software development and applications widely employ open source software today. Two decades of experience have routinized resource sharing (Lakhani and Wolf, 2003; Lakhani and von Hippel, 2003) and communications between programmers with different backgrounds (Aksulu and Wade, 2010; Krogh et al., 2012). Open source reduces time to develop innovative software modules, eliminates hassles from negotiating intellectual property, and reduces friction associated with raising capital for software development (Nagle, 2019b; Wen, Ceccagnoli, and Forman, 2016). Today open source is an essential component of artificial intelligence, of web-enabled commerce, and of most software for big data.

While the benefits to participating in open source communities have been documented within high-income countries (Lerner and Schankerman, 2010; Nagle, 2018), the focus on developed economies limits what can be observed and ignores the state of global labor markets. Programmer workforces have grown in the middle-income countries of Central Europe and Asia, and account for tens of billions of dollars of services a year (Agrawal, Lacetera, and Lyons, 2016; Stanton and Thomas, 2015; Barach, Golden and Horton, 2020). Just like their counterparts in developed economies, programmers around the globe employ open source tools, speak the vocabulary of open source, and interact with open source libraries (Nagle et al., 2020). Further, the dynamism and accessibility of open source could represent an opportunity for low- and middle-income countries to reach the technological frontier more quickly than if they needed to develop such software from scratch or obtain it from costly sources, lowering the challenges of “catching up” in areas where knowledge about software and related business processes fosters capabilities in new geographies (Lee and Lim, 2001).

According to the conventional view, open source enhances employment opportunities for participants, facilitates tasks within existing employment, and encourages innovative entrepreneurial initiatives. What part of this conventional view can be extended to a global view of open source and entrepreneurship? In this study we consider whether open source equally shapes entrepreneurship, or whether some channels have more effect, and vice versa.

Figure 1 can motivate the question. It covers 207 countries and illustrates the last year of data from our study, 2016. The figure displays the correlation between broad measures of open source participation and entrepreneurship in each country, displayed in log-log scale. We ask the reader to momentarily defer questions about definitions (which we address below) and focus on the forest and not the trees in raw data. The figure illustrates the benefit of analyzing more than just high-income countries. Income clearly plays a role, but substantial variance remains. What does this variance across countries show about the prevalence of open source, and whether it strengthens or diminishes entrepreneurship? What evidence suggests a causal link, if any?

While there are a variety of ways to measure open source participation and entrepreneurship within the US, none of them provides a viable approach to measuring activity outside US borders or over time. This study pioneers a global approach for 2000-2016 with new data. We utilize data from GitHub, the largest repository of OSS in the world, which is widely adopted across countries. We match it to a measure of worldwide entrepreneurship, sourced from Crunchbase. No other source provides a better standardized proxy over time and across the globe.

Identification arises in steps. We start with OLS estimates of the association of OSS with new venture founding and then the reverse association. We compare this with the evidence for a causal interpretation using 2SLS/IV approaches.
We next consider disaggregation, examining OSS contributions from organization-affiliated or individual user accounts.² Then we disaggregate the effect on different types of innovative ventures: whether the venture aspires to a global or mission orientation, and whether it is high quality, as evidenced by entrepreneurial financing and acquisitions. We interpret these differences as proxies for different channels of influence. Consistent with our approach, we estimate the relationship in both directions. The focus on the two-way relationship enables us to address two related questions: Is there evidence of a causal link between these activities, and evidence of a virtuous cycle, where each reinforces the other? Does every type of entrepreneurship and OSS play an equal role in this relationship?

We find an association between GitHub participation and entrepreneurship, and this holds for a variety of definitions of entrepreneurship. A one percent increase in GitHub commits (code contributions) in a given country in a year is associated with a 0.2-0.6 percent increase in information technology (IT) ventures and a 0.02-0.1 percent increase in OSS ventures in that country, or roughly 5-15 new IT ventures and 0.004-0.02 OSS ventures per year per country on average.³ The results for narrower outcomes are perhaps more surprising. A one percent increase in GitHub commits leads to a 0.02-0.6 percent increase in the number of globally- and mission-oriented ventures, indicating that increases in OSS activity shape the direction these new firms take. A one percent increase in GitHub commits is associated with a 0.3-0.8 percent increase in the value of new venture financing deals, a 0.2-0.5 percent increase in the number of new financing deals, and a 0.1-0.4 percent increase in the number of technology acquisitions, suggesting 382-1,019 million in new venture financing, 5-10 new financing deals, and 0.3-1.5 acquisitions per country per year.

² Organization-affiliated accounts are those for which an individual user joined an organization prior to the date of a given commit. Individual user accounts are those not associated with an organization at the time of a given commit.

³ The baseline number of new OSS ventures in most countries is quite small when compared to all IT ventures, hence the large difference in impact.
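To make the log-log reading of these estimates concrete, the following is a minimal sketch on synthetic data, not the paper's estimation. The instrument `z`, the unobserved confounder `u`, the "true" elasticity of 0.4, the baseline of 1,000 ventures, and all helper names are assumptions introduced only for illustration. It contrasts an OLS slope with a manually computed two-stage least squares (2SLS) slope, then converts an elasticity into an approximate change in levels.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(y, x):
    """Slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)

def two_sls_slope(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    gamma = ols_slope(x, z)                      # first stage: x on z
    mx, mz = mean(x), mean(z)
    x_hat = [mx + gamma * (zi - mz) for zi in z]  # fitted values of x
    return ols_slope(y, x_hat)                   # second stage: y on fitted x

def level_change(elasticity, pct_change_x, baseline_y):
    """Approximate change in y implied by a log-log elasticity."""
    return baseline_y * elasticity * pct_change_x / 100.0

# Synthetic data (all values hypothetical): u drives both variables,
# so OLS is biased upward, while z shifts commits only.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
log_commits = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
log_ventures = [0.4 * x + ui + rng.gauss(0, 1) for x, ui in zip(log_commits, u)]

beta_ols = ols_slope(log_ventures, log_commits)
beta_iv = two_sls_slope(log_ventures, log_commits, z)
print(f"OLS elasticity:  {beta_ols:.3f}")
print(f"2SLS elasticity: {beta_iv:.3f}")

# With a hypothetical baseline of 1,000 ventures, a 1 percent rise in commits
# and an elasticity of 0.4 imply about 4 additional ventures.
extra_ventures = level_change(0.4, 1.0, 1000.0)
print(f"Implied extra ventures: {extra_ventures:.1f}")
```

In this sketch the OLS slope absorbs the confounder and overstates the elasticity, while the 2SLS slope should land near the assumed value. Whether any real instrument satisfies the exclusion restriction is a separate question that the simulation does not address.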

We also find some evidence that new venture formation predicts OSS contributions, and we explore variance in the robustness of that evidence. A one percent increase in new IT ventures in the prior year is associated with a 0.7-0.9 percent increase in GitHub commits in the next year, equating to roughly 53,000-69,000 lines of code. A one percent increase in new globally-oriented IT ventures is associated with a 0.7-0.9 percent increase in OSS commits. A one percent increase in high-quality ventures as proxied by financing and acquisitions is associated with a 0.3-0.7 percent increase in GitHub commits, equal to roughly 23,000-53,000 lines of code. The evidence suggests that there is a virtuous cycle between OSS and entrepreneurship, but we use the phrase with caution. The statistical evidence that increases in OSS cause increases in entrepreneurship is more robust than the reverse. The two-way relationship appears most convincing for globally-oriented, as well as high-quality, entrepreneurship.

These results contribute to several research agendas. To our knowledge, this is the first study to benchmark variance in OSS across the globe. Accordingly, our study implies that policy for OSS has larger global consequence than has previously been recognized. This contrasts with prior research investigating OSS in developed economies (Nagle, 2019a; Lerner and Schankerman, 2010; Kogut and Metiu, 2001). No research we are aware of has extended these insights into global activity. We also contribute statistical evidence for how OSS contributes to innovative entrepreneurial development in some countries, and, relatedly, why entrepreneurship occurs in some countries more than others. To this we also add insight into the variance in entrepreneurship channels that operate across countries, as well as a plausible path for how some low- and middle-income countries may encourage entrepreneurial newcomers in software-intensive activities, allowing them to catch up and eventually overtake established players (Lee and Malerba, 2017).

We also add to investigations of “digital dark matter,” namely, the intangible inputs and unpriced digital goods such as OSS (Greenstein and Nagle, 2014; Keller et al., 2018; Robbins et al., 2018).

We contribute to the methodology of studying OSS. This study demonstrates how to match large-scale GitHub platform data with commonly-used and publicly available firm-level and country-level data to measure the impact of OSS. The implementation of our instrumental variable strategy is also novel. As a statistical matter, we demonstrate that OSS participation can serve as a valuable predictor variable for quality ventures and entrepreneurial ecosystems around the world.

The paper develops hypotheses in Section II. Section III presents the empirical framework and data. Section IV discusses the results. Section V discusses policy and managerial implications.

II. Framework and Hypotheses

Does open source encourage innovative entrepreneurship? Does the reverse occur? We build upon existing frameworks and extend them to craft hypotheses.

We summarize our hypotheses in Figure 2 and provide details below. We initially ask whether the level of OSS activity influences the level of venture activity and then the reverse. In each direction we seek to infer (1) the presence and sign of the relationship and (2) the underlying mechanism. We divide each question into a specific hypothesis, as represented by Figure 2. These questions focus on (H1a) whether OSS has a positive effect on venture founding and (H1b) whether a coordination mechanism underlies this result. The questions then focus on (H2a) whether the effect leads to more globally-oriented, (H2b) mission-oriented, and (H2c) high-quality ventures. Next, we assess whether new venture founding affects OSS participation. We measure (H3a) whether new venture founding has a negative effect via (H3b) an idea exposure mechanism or (H4a) a positive effect through a (H4b) coordination mechanism on OSS participation. We then
assess whether different types of ventures, namely (H5a) globally-oriented, (H5b) mission-oriented, and (H5c) high-quality, affect OSS participation.

[Insert Figure 2]

Adapting the hypotheses to GitHub requires some definitions, shown in Figure 3. Participants are contributors on an open source software (OSS) platform, who may or may not be employed by a firm to contribute to a particular project. Projects are aggregations of software code around a common goal. Each participant contributes to at least one project, and some individuals contribute more to a particular project than others. Organizations are groups of projects that share a common goal, and may be affiliated with a firm or a shared interest. Participants may be members of an organization or not.

Figure 3 illustrates. Participant 1 contributes more to project 1 than does participant 2, and participant 4 contributes to more projects than participant 3. Participants who contribute to OSS as part of their employment are likely to be members of their employers’ organization (e.g., participant 1). Other participants may share interests (e.g., participant 2), or be unaffiliated with employers (e.g., participant 3).

II.A Does OSS increase entrepreneurship?

Consider Figure 2. There are three possible signs for the impact of OSS on entrepreneurship: there is no effect, a positive effect, or a negative effect. No literature points in the negative direction, so we focus on the possibility of no effect or a positive effect.

Does participation in OSS increase the level of entrepreneurial activity? First, OSS might reduce costs to search for human capital. Talented coders may self-select into participation, and experience on the platform may improve their talent (Nagle, 2019b). Second, OSS might increase access to complementary assets, such as community infrastructure and a feedback and recognition
system. Such assets are also valuable for the production of commercialized products within a venture (Chatterji, 2009; Elfenbein et al., 2010). Third, OSS could reduce costs to the communication of knowledge, just as in peer networks within company settings (Nanda and Sørensen, 2010; Gompers, Lerner, and Scharfstein, 2005), within entrepreneurial clusters (Arzaghi and Henderson, 2008), and inside diaspora/ethnic communities (Kerr, 2008; Nanda and Khanna, 2010). It standardizes coding practices and sharing of programming solutions (Haefliger, Krogh, and Spaeth, 2008), and establishes “best practices” (Varian and Shapiro, 2003).

The null is plausible. There are two arguments. In the first, OSS platforms attract companies like Microsoft and IBM, creating incentives for participants to use the platform to advertise their skills and potentially gain employment. The extrinsic career motivations also may incentivize them to remain employees of incumbent companies (Lerner and Tirole, 2002; Blatter and Niedermayer, 2009; Hann, Roberts, and Slaughter, 2013). A second argument focuses on the lack of competitive advantages from participating in OSS. OSS enables access to coordination activities and new ideas, but with few barriers to entry. OSS itself is not a source of a rare and non-imitable resource. H1a and H1b summarize the hypotheses against the null.

H1a: An increase in OSS participation in a country leads to an increase in venture founding in that country.

H1b: The mechanism through which H1a occurs is via a coordination channel.

II.B What aspects of the entrepreneurship ecosystem does OSS impact?

The next hypotheses consider how OSS impacts globally- and mission-oriented ventures, and how it influences venture quality, as proxied by financing and acquisitions.

The global composition of the OSS community may lead OSS contributors to a broader awareness of global demand for specific new products and services. International exposure of
entrepreneurs may stimulate the international orientation of their ventures and shape their ability to detect international opportunities. It also may shape their ability to execute on them by understanding risks and leveraging international support/customer networks (Crick and Jones, 2000; Bruneel et al., 2010). This supports the next hypothesis:

H2a: An increase in OSS participation in a country leads to an increase in globally-oriented venture founding in that country.

Next, consider the mission of ventures that arise from OSS participation. A “mission-oriented” startup is one that engages in socially-impactful activities, such as promoting gender equality, economic opportunity, environmental sustainability, improved health, education, and broadening access to finance. It is possible that programmers who select into contributing to OSS are already more community oriented than the average programmer. OSS places importance on the community (Shah, 2006; Krogh et al., 2012) and attracts contributors with pro-social motives (Krogh et al., 2012; Nagle et al., 2020). This supports the next hypothesis:

H2b: An increase in OSS participation in a country leads to an increase in mission-oriented entrepreneurship in that country.

Why might OSS contributors start higher quality ventures? Both a selection and treatment effect could matter. As for selection, OSS contributors may have higher technical talent than the general population, and that can translate into better products (Nagle, 2019b). As for treatment, the OSS platform aggregates resources (talent, co-founders, and a collaborative coding environment), and that enables coordination. It also enables contributors to observe problems and solutions that may have a global market, such that the solutions can benefit from both a big market on the revenue side and economies of scale on the cost side. Two common proxies of venture quality include (a) the extent of financing (Catalini, Guzman, and Stern, 2019) and (b) whether
they are acquired (Guzman and Stern, 2020). Thus, ventures formed by OSS contributors may receive more financing, as well as have a higher probability of being acquired. Summarizing:

H2c: An increase in OSS participation in a country leads to an increase in the quality of newly founded ventures in that country, as proxied by venture financing and acquisition.

II.C The impact of venture founding on OSS

The reverse relationship also deserves attention. Once again, we consider hypotheses and the underlying mechanisms, but unlike the prior consideration, the literature points in two distinct directions, leading to competing hypotheses.

It is possible that more venture activity will diminish OSS participation. Formal IP mechanisms, such as patents, can allow inventors to appropriate rents from their work, and can increase the propensity of young ventures to disclose information about their products and enter licensing arrangements with other companies (Gans, Hsu, and Stern, 2008). By design, however, the licenses for OSS efforts do not encourage appropriation by a single entity. This could make it difficult for the creator to appropriate rents from the software (Wen, Ceccagnoli, and Forman, 2015; Pisano, 2006; Nagle, 2018). Thus, fear of intellectual property appropriation may deter ventures from participating in OSS. Summarizing:

H3a: An increase in IT venture founding in a country leads to a decrease in OSS participation.

H3b: The mechanism through which H3a occurs is via an idea exposure channel.

The relationship also may be positive. New ventures may develop their code on OSS platforms to gain the coordination benefits of aggregated talent and infrastructure. Benefits include exposure to talented developers who can either contribute or be hired (Lerner and Tirole, 2002; Blatter and Niedermayer, 2009; Mehra, Dewan and Freimer, 2011; Bitzer and Geishecker,
2010; Hann, Roberts, and Slaughter, 2013; Nagle, 2019a). Participants also can gain an interoperable way for company members to update and develop projects, as well as integrate with supplier and customer IT systems (von Hippel and von Krogh, 2003). Summarizing:

H4a: An increase in IT venture founding in a country leads to an increase in OSS participation.

H4b: The mechanism through which H4a occurs is via a coordination channel.

II.D What kind of ventures would impact OSS?

Does the composition of ventures impact OSS contribution in a measurable way? We consider the impact of globally- and mission-oriented ventures as well as their quality.

Ventures that are global in their customer orientation may benefit from OSS infrastructure. First, the OSS platform enables ventures to realize the productivity benefits of accessing global talent (Kerr et al., 2016), while minimizing the transaction costs involved in typical cross-border operations (Teece, 1986). Such teams benefit from OSS infrastructure that enables co-development of code and updating of code (Lee and Cole, 2003). Globally-oriented ventures are likely to have internationally-distributed teams (Rugman and Verbeke, 2001). Second, the OSS infrastructure may increase interoperability of products with business customer systems around the world. Corporations often contribute to large and growing projects on OSS (Lerner, Pathak, and Tirole, 2006). Entrepreneurial ventures may more easily sell their products to global corporate clients. Summarizing:

H5a: An increase in globally-oriented venture founding in a country leads to an increase in OSS participation.

More mission-oriented ventures may value development on OSS. First, because OSS fosters mission-oriented contributors (von Hippel and von Krogh, 2003), mission-oriented ventures
may use the platform to find talent with similar community-oriented values. Such ventures, because they are community-oriented in the nature of problems they solve, also may be community-driven. Further, these ventures’ pro-social orientation may make them value transparency and inclusive development, which is enabled by OSS platforms. Summarizing:

H5b: An increase in mission-oriented venture founding in a country leads to an increase in OSS participation.

Higher quality ventures also may be drawn to OSS. Their pre-existing IP and complementary services retain their differentiation and value on the revenue side, while coordination through OSS reduces their costs. OSS also may increase their access to talent and infrastructure. They also are less likely to face the risks of losing differentiation to competitors or IP theft (Guzman and Stern, 2020). Such high-quality ventures also benefit from complementary services, monitoring, and advice (Bernstein et al., 2016). Summarizing:

H5c: An increase in high-quality venture founding (as proxied by financing and acquisitions) in a country leads to an increase in OSS.

A Virtuous Cycle

If hypotheses 1a and 4a both hold, then we conclude there is a “virtuous” cycle between entrepreneurship and OSS. Virtuous cycles have a long history in entrepreneurship studies.⁴ Not all virtuous cycles are alike, however. Different channels and mechanisms may contribute different degrees of influence to the reinforcement of the cycle. Coordinating or idea exposure channels may contribute different degrees of reinforcement. We also focus on whether mission-oriented, globally-oriented, and high-quality ventures play a larger or more attenuated role in fostering more OSS activities.

⁴ They are exhibited between pay and productivity (Banerjee and Mullainathan, 2008), between IT and productivity (Aral, Brynjolfsson and Wu, 2006), and between capital and innovation (Hausman, Fehder, and Hochberg, 2020).

III. MEASUREMENT

The data sample consists of 3,519 observations, encompassing a panel of 207 countries over 17 years (2000 to 2016). The panel becomes unbalanced due to missing observations among some of the exogenous variables.

The sample draws from different levels of development. We consider this a good feature, as it retains variance among a novel sample for studies of open source. We begin with virtually all of the 75 high-income countries, which is 36 percent of the sample. We also have good representation from 55 European and Central Asian countries, and this is 27 percent of the sample. We will sometimes reduce the sample size to accommodate the availability of data, principally when using the Human Capital Index and cost of starting a business. When these are included, 58 high-income countries remain and are 32 percent of the sample. The 49 remaining European and Central Asian countries are 27 percent of the sample.

In constructing this data, we face a trade-off between the number of control variables and sample size. For example, GDP per capita data cover 197 countries, the internet users data cover 204 countries, and the Human Capital Index data cover 186 countries. These three variables share coverage in 182 countries. We choose to maximize the number of control variables in order to control for potentially unobserved variables, and we found that doing this does not reduce the spread of observations across disparate settings, preserving an important feature of the dataset.

We also lose observations due to missing variables in some years, especially among Sub-Saharan and upper middle-income countries. That reduces the number of observations in the final sample for specifications with all control variables included, which consists of 1,288 country-year observations. Once again, it continues to sample from a disparate set of circumstances, and therefore maintains the generalizability of the results.
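The sample arithmetic above can be checked in a few lines. This is only a sketch of the bookkeeping: the counts are taken from the text, while the assumption that the reduced-sample shares are computed over the 182 countries with shared coverage is an inference from the reported percentages, not something the text states directly.

```python
# Counts quoted in the text.
N_COUNTRIES = 207
YEARS = range(2000, 2017)          # 2000 to 2016 inclusive, 17 years

panel_size = N_COUNTRIES * len(YEARS)
print(panel_size)                  # full country-year panel

def share(part, whole):
    """Percentage share, rounded to the nearest whole percent."""
    return round(100 * part / whole)

# Full sample: shares of all 207 countries.
full_high_income = share(75, N_COUNTRIES)
full_europe_central_asia = share(55, N_COUNTRIES)

# Reduced sample: ASSUMED denominator of 182 countries (shared coverage
# of GDP per capita, internet users, and the Human Capital Index).
REDUCED_COUNTRIES = 182
reduced_high_income = share(58, REDUCED_COUNTRIES)
reduced_europe_central_asia = share(49, REDUCED_COUNTRIES)

# The shared coverage cannot exceed the coverage of any single variable.
coverage = {"gdp_per_capita": 197, "internet_users": 204, "human_capital_index": 186}
shared_is_feasible = REDUCED_COUNTRIES <= min(coverage.values())

print(full_high_income, full_europe_central_asia)
print(reduced_high_income, reduced_europe_central_asia)
print(shared_is_feasible)
```

Under that assumed denominator the rounded shares reproduce the 36, 27, 32, and 27 percent figures reported above.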

Measuring entrepreneurship

Country-level data on entrepreneurship comes from Crunchbase, a source that has been used in many studies of entrepreneurship (e.g., Yu, 2019; Scott, Shu, & Lubynsky, 2019). The Crunchbase database has grown to become a primary data source for investors as well as in scholarly research. It has been used in over 90 scientific articles (e.g., Dalle et al., 2017; Yu, 2019; Scott, Shu, & Lubynsky, 2019; Koning, Hasan, and Chatterji, 2019). The variable of interest is the number of new technology ventures founded per year in a given country.

While the VC funding statistics from Crunchbase are similar to alternative sources (Dalle et al., 2017; Kalemli-Ozcan et al., 2015; Kaplan and Lerner, 2016), the dataset comes with a number of challenges. The Crunchbase dataset launched in May 2007, and contributors have backfilled data on companies founded prior to that date, such as Block and Sandner (2009). Crunchbase focuses on younger firms and updates on a daily basis because of the partially crowdsourced nature of the dataset. That necessitates controls for time and motivates a range of robustness tests.

Crunchbase classifies new companies into categories.⁵ We identify companies focused on Open Source Software. We also consider whether they are global or mission-oriented. We construct the global and mission variables through two approaches: word searches of the company descriptions⁶ and a supervised logistic regression algorithm.⁷ Because of space limitations, we only include the machine learning-created measures in our results.⁸

We also assess their quality, as proxied by entrepreneurial financing and acquisitions. We use the Preqin database to measure financing using 1) the total value of all venture investments in information technology companies that occurred in a given country in a given year; and 2) the total number of venture investments in information technology companies in a given country in a given year.⁹ The data have been used by other studies such as Axelson et al., 2013 and Chakraborty and Ewens, 2017. For acquisitions, we use data from Crunchbase on the number of acquisitions of IT companies. Crunchbase logs transaction-level data on events in which any of the companies it covers are acquired; we aggregate these events to the country of interest in a given year.

We take log(1 + VARIABLE) to account for skewness and the value of zero.

Measuring OSS

Our data on open source activity in a country comes from GitHub, the most widely used repository for hosting OSS projects. Created in 2008, GitHub became the central repository for most major open source projects (GitHub, 2019). It also hosts open source projects founded before 2008, which moved to the platform to take advantage of its useful tools. Based in

⁵ Examples of sub-categories in our sample include: business information systems, cloud data services, and video chat (information technology); natural language processing, task management, and open source (software); and cloud infrastructure, data center automation, and network hardware (hardware). We explored broad/narrow categories. The broad definition includes information technology, such as cloud data services, network security, and data integration, hardware, and software. The narrow definition is only open source software companies. As it turns out, this narrow definition is highly correlated (0.6) with the broad definition and, therefore, statistically points in a similar direction.

⁶ Global orientation is measured through the use of the words “international” and “worldwide” in the company descriptions. Mission orientation is measured through the use of the words “empower,” “gender,” “women,” and “climate” in the company descriptions.

⁷ We manually train the logistic regression algorithm on 1,001 startup descriptions (2% of the venture data) by classifying each firm as mission-oriented (1/0) and/
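The word-search measures and the log(1 + VARIABLE) transformation described in this section can be sketched as follows. The keyword lists are the ones given in the footnote on word searches; everything else is an assumption for illustration, including the function names, the example descriptions, and the choice of case-insensitive substring matching (the text does not say how matches were defined). The sketch does not cover the supervised logistic regression measure.

```python
import math

# Keywords listed in the text for the word-search measures.
GLOBAL_WORDS = ("international", "worldwide")
MISSION_WORDS = ("empower", "gender", "women", "climate")

def has_any(description, words):
    """Return 1 if any keyword occurs in the description, else 0.

    ASSUMPTION: case-insensitive substring matching, so "empower" also
    matches "empowering" (and "gender" would match "engender").
    """
    text = description.lower()
    return int(any(word in text for word in words))

def log1p_count(count):
    """log(1 + count): defined at zero and compresses the skewed right tail."""
    return math.log1p(count)

# Hypothetical company descriptions, invented for this example.
descriptions = [
    "A worldwide payments platform for small merchants",
    "Tools that empower women farmers with climate data",
    "Scheduling software for local bakeries",
]

global_flags = [has_any(d, GLOBAL_WORDS) for d in descriptions]
mission_flags = [has_any(d, MISSION_WORDS) for d in descriptions]
print(global_flags)
print(mission_flags)

# Hypothetical yearly venture counts for one country, including a zero.
counts = [0, 9, 99]
print([round(log1p_count(c), 3) for c in counts])
```

A plain log would be undefined for country-years with zero ventures, which is why the shifted form is convenient for sparse count data of this kind.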
