Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for the source{d} bi-weekly newsletter.

A new analysis of the Cloud Foundry project with source{d} EE

Today, we’re excited to share with you an updated version of the Cloud Foundry analysis we did earlier this year. This time the analysis was done with source{d} Enterprise Edition. source{d} EE not only saves us time through higher query performance but also allows us to showcase advanced metrics that are not available out of the box in source{d} CE. Follow this link to view a read-only dashboard of the entire Cloud Foundry Project analysis. Read this blog for a summary of our key findings.

source{d} News

MSR Paper Review: The Software Heritage Graph Dataset [Blog]
by Vadim Markovtsev

This year, in addition to attending, speaking at and sponsoring MSR 2019, source{d} employees decided to write blog posts about our favorite research papers presented at the conference. As the first in the series, we reviewed the Software Heritage Graph Dataset (SHGD), presented by Antoine Pietri in the Large-Scale Mining track. Learn more.

A closer look at source{d} CE 0.15 [Blog]
by Victor Coisne

Two months ago, we announced the beta release of source{d} Community Edition (CE), formerly known as the source{d} Engine. source{d} CE is the only free & open source product that provides individual developers, small to mid-size companies and open source project maintainers with software development analytics. Check out the list of metrics that are included out of the box in source{d} CE.

IDC confirms the increasing influence of developers in enterprise IT [Blog]
by Victor Coisne

The IDC report is based on a global survey of 2,500 developers. It found that 67% of organizations have adopted DevOps practices in some way. That was no surprise to us, since DevOps is among the most prevalent use cases of source{d}, which enterprises use to evaluate their engineering effectiveness in terms of velocity, quality and ability to report against business objectives.

Community News

Modeling Vocabulary for Big Code Machine Learning [Research Paper]
by Hlib Babii, Andrea Janes, Romain Robbes

When building machine learning models that operate on source code, several decisions have to be made to model source-code vocabulary. These decisions can have a large impact: some can lead to not being able to train models at all, others significantly affect performance, particularly for Neural Language Models. Yet, these decisions are not often fully described. This paper lists important modeling choices for source code vocabulary, and explores their impact on the resulting vocabulary on a large-scale corpus of 14,436 projects.

Refactoring made easy with IntelliCode [Blog]
by Mark Wilson-Thomas

With Visual Studio 2019 version 16.3 Preview 3, Microsoft announced that refactorings can now be enhanced by IntelliCode. IntelliCode spots repetition quickly and suggests other places in your code where you might want to apply that same change, right in your IDE.

Being Glue [Video]
by Tanya Reilly

Every senior person in an organisation should be aware of the less glamorous - and often less-promotable - work that needs to happen to make a team successful. Managed deliberately, glue work demonstrates and builds strong technical leadership skills. Left unconscious, it can be extremely career limiting. It pushes people into less technical roles and even out of the industry.

Code Review Developer Guide [Documentation]
by Adam Bender

A code review is a process where someone other than the author(s) of a piece of code examines that code. At Google we use code review to maintain the quality of our code and products. This documentation is the canonical description of Google’s code review processes and policies.


Events

September 16th: UseDataConf (Moscow, Russia)

September 19-20th: Open Core Summit (San Francisco, US)

October 4th: source{d} paper reading club (Online)

October 9-11th: DevFest (Nantes, France)

Featured Community Member

Andrea Janes is a researcher at the Free University of Bolzano-Bozen (Italy). He received a master's degree in computer science from the Technical University of Vienna (Austria) and a doctorate in computer science (with distinction) from the University of Klagenfurt (Austria). Make sure to follow Andrea on LinkedIn or visit this page to stay up to date with his latest publications.