Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for source{d} bi-weekly newsletter.

Introducing source{d} Engine 0.12

We’re pleased to announce source{d} Engine 0.12, a release with C# language support, new SQL querying, improved performance, and much more! Learn More.

source{d} News

source{d} Engine analysis of the Cloud Foundry codebase [Blog]
by DevRel Team

After the success of our Kubernetes codebase analysis during KubeCon, we’re thrilled to release an analysis of the Cloud Foundry codebase. The analysis uses source{d} Engine to retrieve all of the Cloud Foundry Foundation’s git repositories and analyze them through SQL queries, yielding insights into the project’s codebase history as well as emerging trends. The results show a mature, complex architecture and a project that remains extraordinarily active and agile.

Code as Data workshop: Using source{d} Engine to extract insights from git repositories [Slides]
by Francesc Campoy

This workshop will teach you basic git concepts (such as references, commits, and blobs) and how they can be mapped into a series of relational tables. Once we understand the basic concepts, we will discuss Universal Abstract Syntax Trees and how some advanced checks can be done on top of this language-agnostic structure. Running these checks at scale requires some extra knowledge, and we’ll discuss the challenges and possible solutions.
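To make the mapping from git concepts to relational tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, and sample data are simplified assumptions made for illustration; they are not the actual source{d} Engine schema.

```python
import sqlite3


def build_demo_db():
    """Build an in-memory database with a toy, hypothetical git schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commits (commit_hash TEXT PRIMARY KEY, author TEXT, message TEXT);
        CREATE TABLE refs (ref_name TEXT PRIMARY KEY, commit_hash TEXT);
        CREATE TABLE blobs (blob_hash TEXT PRIMARY KEY, content TEXT);
        -- Which blob each commit stores at each file path.
        CREATE TABLE commit_blobs (commit_hash TEXT, file_path TEXT, blob_hash TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [("c1", "alice", "initial commit"), ("c2", "bob", "add main.go")],
    )
    conn.executemany(
        "INSERT INTO refs VALUES (?, ?)",
        [("refs/heads/main", "c2"), ("refs/tags/v0.1", "c1")],
    )
    conn.executemany(
        "INSERT INTO blobs VALUES (?, ?)",
        [("b1", "hello"), ("b2", "package main")],
    )
    conn.executemany(
        "INSERT INTO commit_blobs VALUES (?, ?, ?)",
        [("c1", "README.md", "b1"), ("c2", "README.md", "b1"), ("c2", "main.go", "b2")],
    )
    return conn


def files_at_ref(conn, ref_name):
    """Return (file_path, content) pairs for the commit a reference points to."""
    query = """
        SELECT cb.file_path, b.content
        FROM refs r
        JOIN commit_blobs cb ON cb.commit_hash = r.commit_hash
        JOIN blobs b ON b.blob_hash = cb.blob_hash
        WHERE r.ref_name = ?
        ORDER BY cb.file_path
    """
    return conn.execute(query, (ref_name,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for path, content in files_at_ref(conn, "refs/heads/main"):
        print(path, "->", content)
```

Once references, commits, and blobs live in tables like these, a question such as "which files exist at this branch?" becomes a plain SQL join.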

Empower developers with ML-assisted code review [Slides]
by Waren Long

An exciting new domain of research has recently emerged, combining Models of Code with Natural Language Processing. At source{d}, we are developing ML-assisted coding solutions to relieve developers of one of their most time-consuming tasks: code reviews. In this talk, we will first introduce source{d} Lookout, a framework for assisted code review, and then show an example of the pull request analysis it enables, aimed at solving code formatting problems.

Community News

How to Automate Tasks on GitHub With Machine Learning for Fun and Profit [Article]
by Hamel Husain

In order to show you how to create your own apps, this article will walk you through the process of creating a GitHub app that can automatically label issues.

Open-sourcing SPARTA to make abstract interpretation easy [Article]
by Jez Ng

Using abstract interpretation to build a scalable tool from scratch is a daunting engineering task that generally requires a protracted development effort led by an expert. To streamline that process, we built SPARTA, a C++ library of software components for building high-performance static analyzers that can run in a production environment. SPARTA provides the building blocks (a set of components that have a simple API, are highly performant, and can be easily assembled) so an engineer can focus solely on the logic that extracts the desired information from the program.

Aroma: Using machine learning for code recommendation [Article]
by Celeste Barnaby, Satish Chandra, Frank Luan

Thousands of engineers write the code to create our apps, which serve billions of people worldwide. This is no trivial task—our services have grown so diverse and complex that the codebase contains millions of lines of code that intersect with a wide variety of different systems, from messaging to image rendering. To simplify and speed the process of writing code that will make an impact on so many systems, engineers often want a way to find how someone else has handled a similar task. We created Aroma, a code-to-code search and recommendation tool that uses machine learning (ML) to make the process of gaining insights from big codebases much easier.

How Powerful are Graph Neural Networks? [Research Paper]
by Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka

Here, the researchers present a theoretical framework for analyzing the expressive power of GNNs to capture different graph structures. Their results characterize the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, and show that they cannot learn to distinguish certain simple graph structures. They then develop a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test.
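For readers unfamiliar with the Weisfeiler-Lehman test mentioned above, here is a minimal sketch of its 1-dimensional color refinement in plain Python. It is an illustration written for this newsletter, not code from the paper, and the function name and example graphs are assumptions chosen for the demo.

```python
from collections import Counter


def wl_histogram(adjacency, rounds=3):
    """Run 1-dimensional Weisfeiler-Lehman color refinement.

    adjacency maps each node to a list of its neighbors. Every node starts
    with the same color; in each round a node's new color is the pair of its
    own color and the sorted colors of its neighbors. Returns a histogram of
    the final colors, which can be compared across graphs.
    """
    colors = {node: 0 for node in adjacency}
    for _ in range(rounds):
        colors = {
            node: (colors[node], tuple(sorted(colors[n] for n in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())


# A 3-node path and a triangle: node degrees differ, so the test separates them.
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Two disjoint triangles and a 6-cycle: every node has degree 2 in both graphs,
# so color refinement never tells them apart even though they are not isomorphic.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

if __name__ == "__main__":
    print(wl_histogram(path3) == wl_histogram(triangle))
    print(wl_histogram(two_triangles) == wl_histogram(hexagon))
```

Differing histograms prove two graphs are not isomorphic, while equal histograms are inconclusive, which is why the test is an upper bound on what this family of models can distinguish.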

Events

April 19th: source{d} paper reading club (Online)

April 29th: DockerCon 2019 (San Francisco, USA)

May 3rd: source{d} paper reading club (Online)

May 9th: source{d} Online meetup (Online)

May 17th: source{d} paper reading club (Online)

May 26-27th: Mining Software Repositories 2019 (Montreal, Canada)

June 17-19th: ML conference (Munich, Germany)

Featured Community Member

Gerald Schermann is currently a Ph.D. student at the University of Zurich. Check out his website to see his impressive list of papers and projects. Make sure to follow Gerald on Twitter @sh3llcat to stay up to date with his latest publications and projects.