Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for source{d} bi-weekly

Announcing source{d} Engine beta for code retrieval and analysis and Lookout Alpha for assisted code review

After several months of hard work, we've officially announced the public beta of source{d} Engine and the public alpha of source{d} Lookout. Combining code retrieval, language-agnostic parsing, and git history tools with familiar APIs, source{d} Engine simplifies code analysis. source{d} Lookout is a service for assisted code review that enables running custom code analyzers on GitHub pull requests. Learn More.

source{d} News

source{d} User survey by Ricardo Baeta

With the recent beta release of source{d} products, we're eager to get feedback from our users. This user survey aims to identify new use cases, pain points and feature requests. We'll use your answers to build our roadmap and continue to ship useful products that our users love.

Machine Learning on Go code by Francesc Campoy

We've all wondered how to use Machine Learning with Go, but what about turning the tables for once? What can Machine Learning do *for* Go? During this presentation, we will discover how different Machine Learning models can help us write better Go by predicting everything from our next character to our next bug!

Machine Learning on git: introducing Hercules v4 by Vadim Markovtsev

In case you missed it, last year we introduced Hercules, an open source Go command line application and framework to mine and analyze Git repositories. This week, we're excited to introduce Hercules v4, which includes new features such as fork and merge awareness, external plugins for custom analysis, merging results from multiple repositories and more.

Community News

Towards Natural Language Semantic Code Search [Blog Post] by Hamel Husain and Ho-Hsiang Wu

In this post, Hamel and Ho-Hsiang explain how they are leveraging deep learning to make progress towards augmenting keyword search with semantic search, including an example that you can use to reproduce their results!

code2seq: Generating Sequences from Structured Representations of Code [Research Paper] by Uri Alon, Omer Levy, Eran Yahav

This paper presents CODE2SEQ, an alternative approach that leverages the syntactic structure of programming languages to better encode source code. The model represents a code snippet as the set of paths in its abstract syntax tree (AST) and uses attention to select the relevant paths during decoding, much like contemporary NMT models.
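To make the "set of paths in its AST" idea concrete, here is a minimal sketch that lists the leaf-to-leaf paths of a small snippet using Python's standard `ast` module. It is not the authors' implementation: the helper names `leaf_paths` and `ast_paths` are hypothetical, the paths are reduced to node type names, and the attention-based decoder described in the paper is not shown.

```python
import ast
from itertools import combinations


def leaf_paths(tree):
    """Return every root-to-leaf path in the tree as a list of AST nodes."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        # Skip Load/Store/Del context markers so that names count as leaves.
        children = [
            child for child in ast.iter_child_nodes(node)
            if not isinstance(child, ast.expr_context)
        ]
        if not children:
            paths.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(tree, [])
    return paths


def ast_paths(source):
    """Return the leaf-to-leaf paths of `source` as lists of node type names."""
    result = []
    for a, b in combinations(leaf_paths(ast.parse(source)), 2):
        # Find where the two root-to-leaf paths diverge.
        i = 0
        while a[i] is b[i]:
            i += 1
        up = [type(n).__name__ for n in reversed(a[i:])]   # leaf up to the ancestor
        top = type(a[i - 1]).__name__                      # lowest common ancestor
        down = [type(n).__name__ for n in b[i:]]           # ancestor down to the leaf
        result.append(up + [top] + down)
    return result


if __name__ == "__main__":
    for path in ast_paths("x = y + 1"):
        print(" -> ".join(path))
```

For a snippet with four leaves, such as `x = y + 1`, this yields six paths, one per pair of leaves. A model in the spirit of the paper would encode each path and learn which ones to attend to.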

Bringing macros to Python by abusing type annotations [Blog Post] by Zach Mitchell

This is a really good blog post for learning what a macro is, why Rust macros are special, and how to parse code with Abstract Syntax Trees (ASTs). Zach also explains how his experience with Rust's procedural macros gave him the idea of bringing macros to Python.

Learning to Represent Programs with Graphs [Research Paper] by Miltiadis Allamanis, Marc Brockschmidt and Mahmoud Khademi

In this paper, the authors "present how to construct graphs from source code and how to scale Gated Graph Neural Networks training to such large graphs. We evaluate our method on two tasks: VARNAMING, in which a network attempts to predict the name of a variable given its usage, and VARMISUSE, in which the network learns to reason about selecting the correct variable that should be used at a given program location."

Intelligent Code Reviews Using Deep Learning [Research Paper] by Anshul Gupta and Neel Sundaresan

In this paper, the authors "present an automatic, flexible, and adaptive code analysis system called DeepCodeReviewer (DCR). DCR learns how to recommend code reviews related to common issues using historical peer reviews and deep learning. DCR uses deep learning to learn review relevance to a code snippet and recommend the right review from a repository of common reviews".

Events

September 19th: source{d} Talk at the Go SF meetup (San Francisco, CA)

September 25-28th: International Conference on Software Maintenance and Evolution (Madrid, Spain)

September 26th: source{d} Online meetup (Online)

September 27th: source{d} Talk at the Docker Seattle meetup (Seattle, WA)

October 1st: source{d} Talk at Nantes ML Meetup (Nantes, France)

October 1-3rd: source{d} Talk at Velocity (New York City, NY)

Featured Community Member

Hamel is a Senior Machine Learning Scientist @Github, previously @Airbnb and @DataRobot. Hamel has recently published really interesting blog posts on Natural Language Semantic Search with Machine Learning and on using sequence to sequence models to summarize text found in GitHub issues. Make sure to follow Hamel on Twitter @HamelHusain to stay up to date with his latest Machine Learning on Code blog posts and projects.