The Underground Economy of Open Source Reuse

Eran Strod
I used to work for a company where the General Manager joked that the term “NIH” (not invented here) was invented here. Not so among open source developers.

Black Duck recently took a look at open source projects to gain insight into the amount of sharing and reuse that occurs from project to project. I think of this as the “underground economy of open source reuse” because it is happening everywhere and no one is tracking it. We analyzed a collection of popular open source projects to see the extent to which open source developers reuse code. We did this using a process we call “authoritation,” which establishes the origin of a file that matches many different open source projects in our KnowledgeBase. For example, 564 open source projects reused “Google Web Toolkit SDK” files, but only one project is the original source of those files. At Black Duck we have automated systems and code forensic experts who authoritate the origin of files.

Our team spiders the internet, collecting open source projects in a repository we call the Black Duck KnowledgeBase. Ultimately, our mission is to help our enterprise customers reuse more open source while taking control of management, security and compliance issues. Our KnowledgeBase has about 200,000 open source projects in it, collected from over 4,100 unique internet sites. Since the beginning of 2009, we have added over 3,000 projects per week on average. The Black Duck KnowledgeBase is the industry’s largest and fastest-growing resource for information about the open source industry. In addition to using the KnowledgeBase to help companies reuse open source code, we mine the data from time to time for information on what is happening in the open source community at large.

We looked at 1,311 open source projects, mostly written in Java, and documented the number of times that at least one binary file from any given project was included in the downloadable release of another open source project. There were over 365,000 instances of reuse overall. For the most reused projects in our study, see the press release. We calculated that the sharing and reuse of these open source projects amounted to an incredible 316,000 staff years of leverage.

  Here’s how it broke down:
  • 82 projects were reused more than 1,000 times
  • 461 projects were reused 100 to 1,000 times
  • 441 projects were reused 10 to 99 times
  • 327 projects were reused 1 to 9 times
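
For readers who want to picture the counting, here is a rough sketch in Python. It is not Black Duck's actual tooling. It assumes each binary file has already been assigned a single originating project (the result of authoritation), and it treats one "instance of reuse" as one other project whose release contains at least one of those files. The names `count_reuse`, `bucket_counts`, `origin_of` and `releases`, and all of the sample data, are hypothetical. The bucket boundaries read the thresholds above as non-overlapping ranges, which is an assumption, though the four counts do add up to the 1,311 projects studied.

```python
from collections import defaultdict


def count_reuse(origin_of, releases):
    """Count, per originating project, how many OTHER projects ship its files.

    origin_of: dict mapping a file digest to the project it originated in
               (assumed to be known already).
    releases:  dict mapping a project name to the set of file digests found
               in its downloadable release.
    """
    reusers = defaultdict(set)
    for project, digests in releases.items():
        for digest in digests:
            origin = origin_of.get(digest)
            # A project shipping its own files is not reuse.
            if origin is not None and origin != project:
                reusers[origin].add(project)
    return {origin: len(users) for origin, users in reusers.items()}


def bucket_counts(reuse_counts):
    """Group projects by reuse count, using assumed non-overlapping ranges."""
    buckets = {"> 1000": 0, "100 to 1000": 0, "10 to 99": 0, "1 to 9": 0}
    for n in reuse_counts.values():
        if n > 1000:
            buckets["> 1000"] += 1
        elif n >= 100:
            buckets["100 to 1000"] += 1
        elif n >= 10:
            buckets["10 to 99"] += 1
        elif n >= 1:
            buckets["1 to 9"] += 1
    return buckets


# Made-up sample data, for illustration only.
origin_of = {"h1": "logging-lib", "h2": "logging-lib", "h3": "toolkit", "h4": "app-a"}
releases = {
    "logging-lib": {"h1", "h2"},
    "toolkit": {"h3", "h1"},
    "app-a": {"h4", "h1", "h3"},
    "app-b": {"h2", "h3"},
}

reuse = count_reuse(origin_of, releases)
print(reuse)
print(bucket_counts(reuse))
```

In this toy data the logging library is picked up by three other projects and the toolkit by two, so both land in the lowest bucket. A real study would also have to decide whether several releases of the same project count once or many times, which this sketch does not address.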

Certainly, some projects like Log4j are adopted monolithically, but many others are divisible into subsystems and components, with adoption taking place at the level of bits and pieces. The Eclipse project is a perfect example. It contains 5 million lines of code, but there are many useful plug-ins that can be adopted and adapted to other purposes. In our study, we found a tremendous amount of reuse in the latter category. Open source is truly driving the componentization of software.

Cheers