At the recent Machine Learning for Software Engineering (ML4SE) workshop in Montreal, we had a team of engineers present their research on how to better understand the collaboration that goes on in the software development process.

Their study analyzed 117 of the GitLab organization’s open-source projects, focusing on contributors’ commit activity and their usage of programming languages as complementary ways to characterize contributors to open-source projects.

The research demonstrated how to identify existing collaborations inside an organization’s software development projects by exploring the topological structure of three feature spaces of a codebase: commit activity, usage of programming languages, and topics of source code identifiers.

Future research related to collaboration in software development includes plans to analyze the contributions graph, which reveals the team structure of the company. The nodes in that graph are committers and repositories, and the weight of each edge is proportional to the number of corresponding commits.
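
To make the idea of a contributions graph concrete, here is a minimal sketch in Python. It is not the implementation from the paper: the function name `build_contribution_graph`, the `(committer, repository)` input format, and the choice to normalize weights by the total number of commits are all assumptions made for illustration. It only shows one way to get a bipartite graph whose edge weights are proportional to commit counts.

```python
from collections import Counter


def build_contribution_graph(commits):
    """Build a weighted bipartite graph from commit records.

    `commits` is an iterable of (committer, repository) pairs, one pair
    per commit. This input format is an assumption for this sketch.
    """
    # Count how many commits each committer made to each repository.
    edge_counts = Counter(commits)
    total = sum(edge_counts.values())

    # The two node sets of the bipartite graph.
    committers = {committer for committer, _ in edge_counts}
    repositories = {repository for _, repository in edge_counts}

    # Edge weight is proportional to the commit count; dividing by the
    # total (an assumed normalization) makes all weights sum to 1.
    edges = {edge: count / total for edge, count in edge_counts.items()}
    return committers, repositories, edges


# Hypothetical commit log, invented purely for illustration.
sample_commits = [
    ("alice", "repo-a"),
    ("alice", "repo-a"),
    ("alice", "repo-b"),
    ("bob", "repo-a"),
]

committers, repositories, edges = build_contribution_graph(sample_commits)
for (committer, repository), weight in sorted(edges.items()):
    print(f"{committer} -> {repository}: {weight:.2f}")
```

In a graph like this, committers who share heavily weighted edges to the same repositories are natural candidates for belonging to the same team, which is the kind of structure the planned analysis would look for.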

If you want to have a look, here are the paper, “Identifying Collaborators in Large Codebases”, and the poster presented at the ML4SE workshop.

More about source{d} and MLonCode