Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for the source{d} bi-weekly newsletter.

The new source{d} Lookout analyzer, which has learned to model style based on experience from many code repositories, applies the model to the codebase being analyzed. In addition, source{d} Lookout understands that each codebase has its own nuances and learns to model the codebase style as precisely as possible, as opposed to relying on global style practices.

When new code is sent for review, source{d} Lookout analyzes it and detects any problems with style and automatically suggests fixes. By leveraging GitHub Suggested Changes, those suggestions can instantaneously be accepted (committed). Learn More.

source{d} News

FOSDEM 2019: ML on Code Devroom recap [Blog]
by Victor Coisne

This blog post recaps the talks from the ML on Code devroom at FOSDEM, which the source{d} team helped organize to bring together professors, researchers, and practitioners interested in Machine Learning on Code.

Machine Learning on Source Code [Video]
by Francesc Campoy

Francesc showed the audience how source{d} Engine simplifies the process of retrieving code from various Git repositories and turning it into language-agnostic Abstract Syntax Trees, called UASTs, that can be analyzed through a flexible and friendly SQL API. Francesc also showed how to run machine learning on source code with a series of live demos.

Suggesting Fixes during Code Review with ML [Video]
by Vadim Markovtsev

Many developers hate doing code reviews. Reading foreign code is hard, and suggesting improvements is even harder. Yet a significant portion of code review time goes to figuring out the boring details: formatting, naming, micro-optimizations and best practices. We believe that all of those can be automated with ML on Code, learning either from a particular project or from all the relevant open source code in the world. This talk will be about open source "analyzers": ML-driven code review agents which deal with the boring but important details.

Deduplication on large amounts of code [Video]
by Romain Keramitas

In this talk, Romain will discuss how to deduplicate large amounts of source code using the source{d} stack, and more specifically the Apollo project. He will detail the three steps of the process used in Apollo: the feature extraction step, the hashing step, and the connected component and community detection step. He'll then go on to describe some of the results found from applying Apollo to Public Git Archive, as well as the issues he faced and how these issues could have been somewhat avoided.
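For readers curious what such a pipeline looks like, here is a minimal Python sketch of the three steps named in the talk description. It is not Apollo's implementation: the identifier-based features, the plain MinHash signatures, the all-pairs comparison, the 0.8 threshold and every function name below are assumptions chosen for illustration, and the community detection part is left out.

```python
import hashlib
import re
from itertools import combinations


def extract_features(source: str) -> set[str]:
    """Step 1 (assumed): use identifier-like tokens as a crude feature set."""
    return set(re.findall(r"[A-Za-z_]\w*", source))


def _hash(seed: int, feature: str) -> int:
    """Deterministic 64-bit hash of a feature under a given seed."""
    digest = hashlib.sha1(f"{seed}:{feature}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(features: set[str], num_hashes: int = 64) -> list[int]:
    """Step 2 (assumed): plain MinHash signature of a feature set."""
    if not features:
        return [0] * num_hashes  # files without features share one signature
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimated_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def connected_components(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """Step 3: group nodes linked by similarity edges, using union-find."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def find_duplicates(files: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Return groups of file names whose estimated similarity links them together."""
    names = sorted(files)
    signatures = [minhash(extract_features(files[name])) for name in names]
    edges = [
        (i, j)
        for i, j in combinations(range(len(names)), 2)
        if estimated_similarity(signatures[i], signatures[j]) >= threshold
    ]
    return [[names[i] for i in group] for group in connected_components(len(names), edges)]
```

Calling find_duplicates on a dictionary that maps file names to file contents returns lists of names, one list per group of near-duplicates. Comparing every pair of signatures is only practical for small inputs; at the scale the talk describes, some form of bucketing of signatures would be needed in place of the all-pairs loop.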

Git database with bitmap index [Video]
by Kuba Podgórski

The data retrieval team at source{d} processes a lot of data from Git repositories. Most of the key components in our workflow, like engine, gitbase and mysql-server, are implemented in Go. This talk will go through the story of how we embedded "pilosa" (a distributed bitmap index) in our SQL frontend for Git repositories.

FOSDEM 2019: Go Devroom recap [Blog]
by Victor Coisne

At source{d}, we’re big fans and users of Go, the language most of the components of our platform are written in. This blog post is a recap of the talks that took place in the Go FOSDEM devroom.

MSR Interview #2: Georgios Gousios [Blog]
by Victor Coisne

This article is the second episode of our MSR Interview blog series. In case you missed it, check out the interview with Abram Hindle. This week, we’re publishing the interview with Georgios Gousios, an assistant professor of software engineering in the Software Engineering Research Group at TU Delft.

Community News

Facebook’s Tool for Automated Testing at 2 Billion Users Scale [Article]
by Jennifer Riggins

Testing is often the most arduous part of development — and something programmers are more and more responsible for in the world of DevOps and individual code ownership. As pieces of code become increasingly smaller and more distributed, there’s also a greater need to invest in automation, particularly testing automation.

The Future Is Now: “Now AI can help us to write code” [Blog]
by Fotis Georgiadis

With the Go team I discovered a passion for developer tooling done right. You could ask many developers and they will tell you their favorite aspect of Go is not the language itself, but rather the tooling. There’s a bit of an obsession in the community for intuitive tools that compose well with others.

This machine learning analyzer will review your code for consistency [Article]
by Richard Harris

Lack of consistency in the development of source code has made maintaining code over time and making updates more time-consuming and costly. Source{d}, the company enabling machine learning for large-scale code analysis, solves this long-standing problem with Machine Learning assisted code review.

Learning How to Mutate Source Code from Bug-Fixes [Research Paper]
by Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk

While some recent papers have tried to devise domain-specific or general purpose mutator operators by manually analyzing real faults, such an activity is effort- (and error-) prone and does not deal with an important practical question as to how to really mutate a given source code element. In this paper, the authors propose a novel approach to automatically learn mutants from faults in real programs.


February 22nd: source{d} Paper reading club (Online)

March 8th: source{d} Paper reading club (Online)

Featured Community Member

Georgios Gousios is an assistant professor of software engineering in the Software Engineering Research Group at TU Delft, where he leads the group's Software Analytics research direction. He does research in the broad area of software engineering. Check out his website to see his impressive list of papers and projects. Make sure to follow Georgios on Twitter @gousiosg to stay up to date with his latest publications and projects.