Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for source{d} bi-weekly

gitbase: exploring git repos with SQL

Git has become the de facto standard for code versioning, but its popularity didn’t remove the complexity of performing deep analyses of the history and contents of source code repositories. SQL, on the other hand, is a battle-tested language for querying large codebases, as its adoption by projects like Spark and BigQuery shows.

So it is only logical that at source{d} we chose these two technologies to create gitbase: the Code as Data solution for large-scale analysis of git repositories with SQL. Gitbase is one of the key open source components of source{d} Engine. Learn More.

source{d} News

Open Source vets join source{d} as advisors [Blog] by Victor Coisne

Last week, we announced that Chris Aniszczyk, Vice President at the Linux Foundation, Jessie Frazelle, Open Source Engineer at Microsoft, Joseph Jacks, Founder at OSS Capital, Julien Barbier, CEO at Holberton School and Patrick Chanezon, member of Technical Staff at Docker, are all becoming advisors of source{d}.

Go Devroom CFP -- FOSDEM 2019 [Blog] by Maartje Eyskens

The Go community has been represented at FOSDEM every year since 2014, and 2019 will not be an exception! This year again, the Go devroom organized by source{d} got a full room on Saturday, February 2nd, and we are now looking for speakers.

ML on Code: Machine Learning will change programming [video] by Francesc Campoy

In this video recording from the Velocity NYC keynote, Francesc Campoy explores ways machine learning can help developers be more efficient.

Machine Learning for Gophers [video] by Francesc Campoy

Is Go a good programming language for Machine Learning? This talk answers the question by reviewing available frameworks like gonum, gorgonia, and tensorflow and implementing some basic ML algorithms with each one of them.

Community News

Syntax and Sensibility: Using language models to detect and correct syntax errors [Research paper] by Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, and José Nelson Amaral

Syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving syntax errors and their precise location poorly. In this paper, the authors propose a methodology that locates where syntax errors occur, and suggests possible changes to the token stream that can fix the error identified.

code2vec: Learning Distributed Representations of Code [Research paper] by Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

In this paper, the authors present a neural model for representing snippets of code as continuous distributed vectors. The main idea is to represent code as a collection of paths in its abstract syntax tree, and aggregate these paths, in a smart and scalable way, into a single fixed-length code vector, which can be used to predict semantic properties of the snippet.

Are Deep Neural Networks the Best Choice for Modeling Source Code? [Research paper] by Vincent J. Hellendoorn and Premkumar Devanbu

In this paper, Vincent and Premkumar enhance established language modeling approaches to handle the special challenges of modeling source code, such as frequent changes, larger, changing vocabularies, and deeply nested scopes.

Microsoft Software Engineer, Jessie Frazelle Joins source{d} as Advisor [Article] by Swapnil Bhartiya

Jessie Frazelle, a Microsoft Open Source engineer, has joined source{d} as an advisor. “source{d} is not only a great open source citizen with projects like source{d} Engine and their collection of research papers on Machine Learning for code, they also have the expertise and user experience skills to make something that will be truly revolutionary to the way developers interact with code,” said Frazelle.

Featured Community Member

Paige Bailey is Sr. Cloud Developer Advocate at Microsoft, specializing in data visualization, machine learning, and artificial intelligence. Check out her website to see her impressive list of talks and projects. Make sure to follow Paige on Twitter @DynamicWebpaige to stay up to date with her latest publications and projects.