Earlier this year, we released an analysis of the Cloud Foundry codebase. It used source{d} Community Edition (previously known as source{d} Engine) to analyze all of the Cloud Foundry Foundation’s Git repositories through SQL queries. Back then, we decided to use source{d} CE so that everyone could easily reproduce the analysis.

Today, we’re excited to share with you an updated version of this analysis done with source{d} Enterprise Edition. source{d} EE not only saves us time through higher query performance but also allows us to showcase advanced metrics that are not available out of the box in source{d} CE. In this analysis, the advanced metrics are related to dependencies, the health index, containers, and compliance metrics. Note that source{d} Enterprise includes additional advanced metrics that are not included in this analysis.

Follow this link to view a read-only dashboard of the entire Cloud Foundry Project analysis. Keep on reading for a summary of our key findings.

Commits activity

With 879 repositories in the Cloud Foundry GitHub organization, we can clearly see the breadth and depth of the platform which is distributed across many sub-projects and components.

The number of commits remains quite high at approximately 50,000 per month, a healthy sign of a very active open source project.


Excluding the repositories whose commits are mostly done by CI bots, we can see in the chart below that the highest levels of activity are centered around Stratos, Cloud Controller, and Istio. The nature of these commits over time reveals the different stages of the project and strategic features developed.

First, the number and evolution of commits tied to Stratos, a modern, web-based management application for Cloud Foundry, highlight the focus on user experience and the effort to reconcile the needs of both developers and administrators.

Starting as early as 2015, the number of commits to the Cloud Foundry Cloud Controller repository shows investments in the development of new developer workflows and platform features to manage apps, services, user roles, and more!

Last but not least, we can see that the Cloud Foundry community was quick to embrace Istio, a popular open platform to connect and manage microservices, with the first commits dating back to 2017. As of July 2019, if we look at the top 10 most active repositories by number of commits, we can see that almost 20% of them are related to Istio, which shows a strong focus on improving the overall developer experience around microservices-based application architectures.

Pull Requests and Issues activity

Just like commits, insights on pull requests can be very valuable to project maintainers and tech leads. The chart below on the most active repositories over the past 5 years confirms the importance of Stratos to the Cloud Foundry community with 2,500 pull requests.

With 664 and 517 pull requests merged, the cf-abacus (usage metering and aggregation) and multiapps-controller (controller for Multi-Target Application) repositories are respectively the second and third most active repositories.

The table below highlights the top pending pull requests as of July 2019 including information about their age in days, the number of comments, as well as the number of lines of code modified. Now, 6 weeks later, it’s worth noting that the pull requests against the stratos and multiapps-controller repositories have been merged, which confirms the focus on these two projects versus the others listed in the table below.

Top pending Pull Requests 

In addition, project velocity looks very good this year, with the average time to merge dropping from 8 days to just 3 days most recently.

Average time to Merge over time
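For readers who want to reproduce a metric like this on their own data, here is a minimal Python sketch of an average time to merge. It is not the query behind the dashboard: the `created_at` and `merged_at` field names and the sample records are assumptions made purely for illustration.

```python
from datetime import datetime
from statistics import mean

SECONDS_PER_DAY = 86400


def average_days_to_merge(pull_requests):
    """Mean number of days between opening and merging, over merged PRs only.

    Each pull request is a dict with hypothetical keys "created_at" and
    "merged_at" (datetime, or None when the PR has not been merged).
    """
    durations = [
        (pr["merged_at"] - pr["created_at"]).total_seconds() / SECONDS_PER_DAY
        for pr in pull_requests
        if pr.get("merged_at") is not None
    ]
    return mean(durations) if durations else None


# Hypothetical sample data, not taken from the Cloud Foundry dashboard.
sample_prs = [
    {"created_at": datetime(2019, 6, 1), "merged_at": datetime(2019, 6, 3)},
    {"created_at": datetime(2019, 6, 10), "merged_at": datetime(2019, 6, 14)},
    {"created_at": datetime(2019, 6, 20), "merged_at": None},  # still open
]
print(average_days_to_merge(sample_prs))
```

Open pull requests are excluded on purpose, since they have no merge date yet; grouping the same computation by month would give a trend over time.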

Another interesting metric to look at is the Cloud Foundry throughput. Throughput can be defined as the number of work items (features, tasks or chores, and bugs) completed within a period and ready to ship or ready to test. In this case, we’re measuring the number of closed GitHub issues. This chart reveals that the total number of closed issues peaked in April 2018, with 345 issues closed by just under 200 unique developers. The number has declined significantly over the past year, which we read as a sign of project maturity and stability.

Throughput: Closed Issues
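Measured this way, throughput boils down to counting closed issues per period. The sketch below shows one way to do that in Python, assuming you already have the close date of each issue; the function name and the sample dates are hypothetical and are not drawn from the dashboard.

```python
from collections import Counter
from datetime import date


def closed_issues_per_month(closed_dates):
    """Count closed issues per calendar month, keyed by "YYYY-MM"."""
    counts = Counter(d.strftime("%Y-%m") for d in closed_dates)
    # Sort by month so the result reads as a time series.
    return dict(sorted(counts.items()))


# Hypothetical close dates, for illustration only.
sample_closed = [date(2018, 5, 1), date(2018, 4, 2), date(2018, 4, 17)]
print(closed_issues_per_month(sample_closed))
```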

Once we have the data about GitHub issues and pull requests, we can calculate an Activity index for each repository as well as a global, organization-wide Activity index.

The Activity index is a value in [0, 100] that depends on the following factors:

  • Number of open issues
  • Number of open pull requests
  • The average age of open issues
  • The average age of open pull requests

Activity Index: math formula weighting the size of the repositories

Global Activity Index

A score of 55 is actually pretty good for an organization with just under 900 repositories. It’s normal for large, mature projects and repositories to have more pending issues and Pull Requests.
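The exact formula is not reproduced here, so the following Python sketch only illustrates the general idea of an organization-wide index that weights each repository by its size. Everything in it is an assumption: the size-weighted mean, the notion of "size" as an arbitrary positive number, and the sample values. It should not be read as source{d}'s actual computation.

```python
def global_activity_index(repos):
    """Size-weighted mean of per-repository indices.

    `repos` is a list of (activity_index, size) pairs, where activity_index
    is in [0, 100] and size is any non-negative weight (for example a line
    or commit count). Both the weighting scheme and the inputs are
    illustrative assumptions, not the formula used in the dashboard.
    """
    total_size = sum(size for _, size in repos)
    if total_size == 0:
        return None
    return sum(index * size for index, size in repos) / total_size


# Hypothetical repositories: (activity index, size).
sample_repos = [(80, 100), (40, 300)]
print(global_activity_index(sample_repos))
```

With a weighting like this, a large repository with many aging issues pulls the global score down more than a small one would.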

Dependency Analysis

With source{d} EE, it’s also possible to visualize how the largest dependencies evolve over time, as well as trends in the fastest growing and declining ones. In this analysis, we’ve decided to look mostly at the Go dependencies because Go is by far the most used language in Cloud Foundry. That said, analyzing dependencies from other programming languages would not be a problem with source{d}.

As you can see in the chart below, the largest Go dependencies are Ginkgo, a BDD-style Go testing framework paired with the Gomega matcher library, as well as Docker.

Largest Go dependencies in Cloud Foundry

Go dependencies are also simple: they are well-defined per file and use URLs, which allows us to distinguish external from in-house dependencies. In-house dependencies are imports from one of the following GitHub organizations: cloudfoundry, cloudfoundry-attic, or cloudfoundry-incubator, or imports from Cloud Foundry's own registry, code.cloudfoundry.org. External dependencies are open source ones that don’t belong to any of these.
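The in-house versus external rule described above can be sketched as a simple prefix check on the import path. The Python snippet below is an illustration rather than the query used in the analysis; the function name is hypothetical, and the extra "stdlib" bucket relies on the common heuristic that Go standard library paths have no dot in their first element, which is our assumption and not part of the original rule.

```python
# Prefixes taken from the organizations and registry listed above.
IN_HOUSE_PREFIXES = (
    "github.com/cloudfoundry/",
    "github.com/cloudfoundry-attic/",
    "github.com/cloudfoundry-incubator/",
    "code.cloudfoundry.org/",
)


def classify_import(import_path):
    """Classify a Go import path as "stdlib", "in-house" or "external"."""
    first_element = import_path.split("/", 1)[0]
    # Heuristic (assumption): standard library paths such as "net/http"
    # have no dot in their first path element.
    if "." not in first_element:
        return "stdlib"
    if import_path.startswith(IN_HOUSE_PREFIXES):
        return "in-house"
    return "external"


for path in (
    "code.cloudfoundry.org/lager",
    "github.com/cloudfoundry/libbuildpack",
    "github.com/onsi/ginkgo",
    "net/http",
):
    print(path, "->", classify_import(path))
```

The trailing slash in each prefix matters: it keeps an organization whose name merely starts with "cloudfoundry" from being counted as in-house.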

On the one hand, if we look at the in-house dependencies charts below, we can see that libbuildpack, loggregator-agent, and cf-operator have recently been growing the fastest, while dockerdriver, loggregator-tools, and volumedriver have been declining rapidly.

Top 10 growing in-house dependencies

On the other hand, if we look at the external dependencies, we can see that the most significant growth is in Spec, a simple BDD-style test organizer for Go; Prometheus, a systems monitoring and alerting toolkit; and libbuildpack, a Go language binding of the Cloud Native Buildpack V3 API. Over the same period, there has been a significant decline in nats-io/go-nats, a Go client for NATS that has been deprecated; httpmock, an HTTP mocking library for Go; and stembuild, a binary used to build BOSH stemcells for Windows on vSphere.

Top 10 growing 3rd party dependencies

Bonuses

As bonuses, the dashboard includes two extra tabs. The first one leverages our Dockerfile parsing capabilities to give a breakdown of the Docker base images used in Cloud Foundry. With 49% of the total, Ubuntu is by far the most popular base image, followed by the Golang and Debian ones.
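To give an idea of what extracting base images involves, here is a minimal Python sketch that reads the FROM instructions of a Dockerfile. It is a simplified stand-in, not source{d}'s parser: the function name and the sample Dockerfile are hypothetical, and it ignores details such as line continuations and build arguments.

```python
def base_images(dockerfile_text):
    """Return base image names (without tag or digest) from FROM lines."""
    images = []
    stage_aliases = set()
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM image AS name" declares a build stage alias.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_aliases.add(args[2])
        # A later "FROM name" refers to an earlier stage, not a base image.
        if image in stage_aliases:
            continue
        name = image.split("@", 1)[0]  # strip a digest, if any
        last_segment = name.rsplit("/", 1)[-1]
        if ":" in last_segment:  # strip a tag, but keep a registry port
            name = name[: name.rfind(":")]
        images.append(name)
    return images


# Hypothetical multi-stage Dockerfile, for illustration only.
sample_dockerfile = """\
FROM golang:1.12 AS build
RUN go build -o /app .
FROM ubuntu:18.04
COPY --from=build /app /app
"""
print(base_images(sample_dockerfile))
```

Counting the returned names across all repositories would give a breakdown comparable to the one on the dashboard tab.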

The second tab, called Compliance, checks the Cloud Foundry repositories against a set of “compliance” rules. As a public disclaimer, these are not actual rules enforced by Cloud Foundry but examples of rules that open source project maintainers might like to get visibility on. In this case, we check whether or not a given repository has a readme, a changelog, a license, etc. These rules are not set in stone: they can be both modified and extended to fit specific enterprise rules and guidelines.
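As a rough illustration of this kind of check, the Python sketch below tests a list of top-level file names against three example rules. The rule set, the accepted file names, and the function name are all assumptions made for the example; they are not the rules used in the dashboard.

```python
# Example rules (assumptions): rule name -> accepted lowercase file names.
COMPLIANCE_RULES = {
    "readme": ("readme", "readme.md", "readme.rst", "readme.txt"),
    "changelog": ("changelog", "changelog.md", "changes.md", "history.md"),
    "license": ("license", "license.md", "license.txt", "copying"),
}


def check_compliance(file_names):
    """Return {rule: bool} for a repository's top-level file names."""
    lowered = {name.lower() for name in file_names}
    return {
        rule: any(candidate in lowered for candidate in candidates)
        for rule, candidates in COMPLIANCE_RULES.items()
    }


# Hypothetical repository listing.
print(check_compliance(["README.md", "LICENSE", "main.go"]))
```

Because the rules live in a plain dictionary, adding a new one (for example a contributing guide) is a one-line change.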

Follow this link to view a read-only dashboard of the entire Cloud Foundry Project analysis including additional charts and queries about CF pull requests and code reviews activities as well as dependencies.
