Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for the source{d} bi-weekly newsletter.

Celebrating Open Source with Hacktoberfest 2018

As great believers in Open Source and its philosophy, we get excited about every open source initiative and community event. As such, we’re thrilled to be participating in this year’s Hacktoberfest.

Any contributor who gets at least one pull request merged in any source{d} repo between October 1 and October 31 will receive a sticker, and those who get three pull requests submitted and merged will also receive a limited edition t-shirt. We really appreciate your interest in our open source projects and will make sure to answer your questions or comments as soon as possible, so that your contributions to source{d} projects also count towards DigitalOcean’s Hacktoberfest and its t-shirt prize. Learn More

source{d} News

Introduction to source{d} Engine and Lookout [Slides & Video] by Francesc Campoy

This talk is an introduction to both source{d} Engine and source{d} Lookout. Combining code retrieval, language-agnostic parsing, and git management tools with familiar APIs, source{d} Engine simplifies code analysis. source{d} Lookout is a service for assisted code review that enables running custom code analyzers on GitHub pull requests.

Paper Review: “Lessons from Building Static Analysis Tools at Google” by Alexander Bezzubov

“Lessons from Building Static Analysis Tools at Google” by Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon and Ciera Jaspan presents two stories: the history of failed attempts at integrating FindBugs, a static analysis tool for Java, at Google, and the lessons learned from the success of incorporating Tricorder, an extensible analysis framework, into the development workflow at Google.

Machine Learning on Git: Hercules and his Labours [slides] by Vadim Markovtsev

In this talk, Vadim gives an introduction to Hercules, an open source library and CLI in Go to analyze the development history and help managers get an overview of their projects. It ships with a few algorithms to estimate the architecture quality, the logical parts of the codebase, relationships between developers and their ownership of the project.

Scalable Language-Agnostic Analysis of Source Code and VCS History [slides] by Vadim Markovtsev

This talk is an introduction to the source{d} open source tech stack, in particular source{d} Engine and Hercules. source{d} Engine provides a SQL interface over an arbitrary number of Git repositories, with the ability to extract Abstract Syntax Trees from files in various programming languages. Those trees are expressed in a common format named “Universal AST”. Hercules gathers deep insights from the history of a single Git project, such as line burn-down and ownership through time, or temporally coupled units.
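To make the idea of coupled units concrete, here is a minimal Python sketch that counts how often pairs of files change in the same commit. It is only an illustration under stated assumptions: the commit list and the `co_change_counts` helper are hypothetical, and this is not how Hercules itself is implemented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy history: each commit is the set of files it touched.
commits = [
    {"parser.go", "parser_test.go"},
    {"parser.go", "parser_test.go", "README.md"},
    {"server.go"},
    {"parser.go", "README.md"},
]


def co_change_counts(history):
    """Count how many commits touched each unordered pair of files."""
    counts = Counter()
    for files in history:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return counts


coupling = co_change_counts(commits)

# Pairs that change together most often come first.
for (a, b), n in coupling.most_common():
    print(f"{a} <-> {b}: {n} shared commits")
```

Files that keep showing up in the same commits are candidates for being logically related, even when nothing in the code links them directly.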

Docker powered CLI [Video] by Francesc Campoy

Docker has been used for quite a while as a way to go from "works on my laptop" to "works in prod" as easily as possible. This talk describes a completely different use of Docker: rather than installing things in production, it provides an easy way to power CLI tools and their extensions through Docker containers. It explains how source{d} Engine (github.com/src-d/engine) is built, getting into the details of networking, communication, and versioning.

Community News

Can AI Generate Programs to Help Automate Busy Work? [blog post] by Joseph Sirosh and Sumit Gulwani

In this post, Joseph and Sumit take a look at Microsoft PROSE, an AI technology that can automatically produce software code snippets at just the right time and in just the right situations to help knowledge workers automate routine tasks that involve data manipulation. These are generally tasks that most users would otherwise find exceedingly tedious or too time consuming to even contemplate.

Finding and fixing software bugs automatically with SapFix and Sapienz [blog post] by Yue Jia, Ke Mao and Mark Harman

This blog post introduces SapFix, a new AI hybrid tool created by Facebook engineers that can significantly reduce the amount of time engineers spend on debugging, while also speeding up the process of rolling out new software. SapFix can automatically generate fixes for specific bugs, and then propose them to engineers for approval and deployment to production.

An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation [Research Paper] by Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White and Denys Poshyvanyk

In this paper, the authors perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. They mined millions of bug fixes from the change histories of GitHub repositories to extract meaningful examples of such fixes. Then, they abstracted the buggy and corresponding fixed code, and used them to train an Encoder-Decoder model able to translate buggy code into its fixed version.

Events

October 5th: source{d} Paper Reading Club (Online)

October 12th: source{d} Paper Reading Club (Online)

October 12th: source{d} Talk at Gopherpalooza (San Francisco, CA)

October 16th: Hacktoberfest with source{d} and Holberton School (San Francisco, CA)

October 20th: Women Who Go workshop (South SF Bay, CA)

Featured Community Member

Miltos is a researcher at Microsoft Research in Cambridge, UK and part of the Deep Program Understanding project. His research focuses on machine learning models and methods that "understand" and generate source code. Make sure to follow Miltos on Twitter @miltos1 to stay up to date with his latest Machine Learning on Code publications and projects.