After the success of our Kubernetes codebase analysis during KubeCon, we’re thrilled to release an analysis of the Cloud Foundry codebase. The analysis uses source{d} Engine to retrieve all of the Cloud Foundry Foundation’s Git repositories and analyze them through SQL queries, yielding insights into the history of the project codebase as well as emerging trends. The results show an architecture that is mature and complex, yet extraordinarily active and agile.

“The analysis provided by source{d} gives us a fascinating view into one of the world’s largest and most active open source initiatives,” said Chip Childers, CTO, Cloud Foundry Foundation. “To actually pull all this data together in easy-to-understand charts gives us a unique view into both the history and the forward trajectory of our community.”

Public disclaimer: For this analysis, we decided to use the Community Edition so that anyone can repeat it. However, running the Community Edition over 800+ repositories can cause some issues. For large-scale analysis, we recommend our Enterprise Edition, which includes a Spark connector that distributes the analysis across several machines and speeds up the process from a couple of hours to a couple of minutes. Please contact us at devrel@sourced.tech if you’d like to learn more.

Number of repositories

The large number of repositories reveals the breadth and depth of the platform distributed across myriad small projects and components.

Release cycles

With an average of 98 project releases per month, release velocity is quite high and reflects the distributed architecture mentioned above.

Looking back at the history of releases and top repositories over time, we can clearly see three major phases: Cloud Foundry core in 2013, the Diego container runtime in 2015, and Cloud Foundry deployment and bosh-agent in 2018.

The weekly cadence can also be shown by counting how often releases happen on each weekday. It’s good to see that Friday is not the most common day to release, with some activity during the weekends.
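The weekday count described above can be sketched in a few lines of Python. This is only an illustration, not the query used for the analysis: the function name `releases_per_weekday` is hypothetical, and the sample dates are made up rather than taken from the Cloud Foundry data.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def releases_per_weekday(release_dates):
    """Count how many releases fall on each weekday, Monday through Sunday."""
    # date.weekday() returns 0 for Monday and 6 for Sunday.
    counts = Counter(WEEKDAYS[d.weekday()] for d in release_dates)
    # Include weekdays with no releases so the result always has seven entries.
    return {day: counts.get(day, 0) for day in WEEKDAYS}


# Made-up release dates, for illustration only.
sample_dates = [
    date(2019, 1, 7),   # a Monday
    date(2019, 1, 8),   # a Tuesday
    date(2019, 1, 8),   # a second release on the same Tuesday
    date(2019, 1, 12),  # a Saturday
]

weekday_counts = releases_per_weekday(sample_dates)
for day, n in weekday_counts.items():
    print(f"{day}: {n}")
```

In practice the dates would come from the release tags returned by the queries, with one entry per release.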

Number of files

The number of files over time (just under 400,000 at the beginning of 2019) shows exponential growth from 2011 to 2017 that seems to have slowed down in 2018, a sign of maturity and stability. The number of files per repository shows the importance of buildpacks, and in particular of the dotnet-core-buildpack repository, which highlights the enterprise nature of the application workloads running on the Cloud Foundry platform.

The dominant programming language by number of files is by far Ruby with almost 50% of the total, followed by YAML and Go.
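A language share like the one above comes down to counting files per language and dividing by the total. The sketch below assumes each file has already been labeled with a detected language. The function name `language_shares` and the sample labels are hypothetical and do not reproduce the actual Cloud Foundry figures.

```python
from collections import Counter


def language_shares(file_languages):
    """Return (language, share of files) pairs, largest share first."""
    counts = Counter(file_languages)
    total = sum(counts.values())
    # An empty input yields an empty list, so no division by zero occurs.
    return [(lang, n / total) for lang, n in counts.most_common()]


# Made-up labels, one per file, for illustration only.
sample_labels = ["Ruby"] * 5 + ["YAML"] * 3 + ["Go"] * 2

shares = language_shares(sample_labels)
for lang, share in shares:
    print(f"{lang}: {share:.0%}")
```

Counting by number of files favors languages spread across many small files. Counting by lines of code could give a different ranking.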

The top repositories by number of commits are public-buildpacks-ci-robots and relint-ci-pools, two continuous integration repositories, which reveals a strong focus on stability. Also included in the top five are BOSH and Stratos, two projects that respectively manage the virtual infrastructure layer and provide a user interface to the Cloud Foundry Application Runtime, another sign of maturity.

We can also easily track the growth of these repositories over time.

Let’s focus on what’s happened in the last 3 years:

Check out this Jupyter notebook to access the raw data and the source{d} Engine queries used to perform this analysis.

source{d} Engine provides engineering observability for organizations focused on IT modernization, talent management, and the adoption of DevOps best practices at scale. Companies interested in getting their own codebase analyzed can request an analysis here.

Learn More about source{d} Engine: