Welcome to source{d} bi-weekly, a newsletter with the latest news, resources, and events related to Code as Data and Machine Learning on Code. Sign up for the source{d} bi-weekly newsletter.

Top 10 questions to source{d} CEO: Eiso Kant

With the recent release of source{d} Engine and source{d} Lookout, we’ve been getting a lot of questions from users, partners, prospects, investors, journalists, and analysts. In this blog post, our CEO Eiso Kant answers the top 10 frequently asked questions to clarify many aspects of source{d}’s business and technology stack. Don’t hesitate to ask additional questions in the comments if this blog post does not include the answers you were looking for. Learn More.

source{d} News

CHAOSScon and Git Merge Europe 2019: a recap [Blog]
by Victor Coisne

As the source{d} team was planning its annual trip to FOSDEM, we realized that two interesting conferences were taking place the day before: CHAOSScon and Git Merge Europe. Without hesitation, we arranged our trips to arrive early and attended both events.

Deduplication on large amounts of code [Slides]
by Romain Keramitas

In this talk, Romain discusses how to deduplicate large amounts of source code using the source{d} stack, and more specifically the Apollo project.

MSR Interview #3: Vasiliki Efstathiou [Blog]
by Alexander Dahlin

This article is the third episode of our MSR Interview blog series. This interview is with Vasiliki Efstathiou, a researcher at Athens University of Economics and Business. She has authored two papers submitted at MSR’18 and ICSE’18.

MSR Interview #4: Sarah Nadi [Blog]
by Victor Coisne

This article is the fourth episode of our MSR Interview blog series. After Abram Hindle, Georgios Gousios, and Vasiliki Efstathiou, this week’s episode is an interview with Sarah Nadi, who is an assistant professor at the University of Alberta.

Community News

AI-assisted coding comes to Java with Visual Studio IntelliCode [Blog]
by Xiaokai He

IntelliCode saves you time by putting the most relevant suggestions at the top of your completion list. IntelliCode recommendations are based on thousands of open source projects on GitHub, each with over 100 stars, so it’s trained on the most popular usage patterns and practices.

Large-Scale Refactoring @ Google [Video]
by Hyrum Wright

In this talk, Hyrum Wright discusses some of the reasons for doing migrations that impact hundreds of thousands of files, and how they are done at Google using tools such as ClangMR. He gives examples, such as the recent migration to the standardized std::unique_ptr and std::shared_ptr types, and shares lessons learned from these experiences.

Structured Neural Summarization [Research Paper]
by Patrick Fernandes, Miltiadis Allamanis & Marc Brockschmidt

Summarization of long sequences into a concise statement is a core problem in natural language processing, requiring non-trivial understanding of the input. Based on the promising results of graph neural networks on highly structured data, the authors develop a framework to extend existing sequence encoders with a graph component that can reason about long-distance relationships in weakly structured data such as text.

Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities [Research Paper]
by Martin White, Michele Tufano, Matías Martínez, Martin Monperrus, and Denys Poshyvanyk

In the field of automated program repair, the redundancy assumption claims large programs contain the seeds of their own repair. However, most redundancy-based program repair techniques do not reason about the repair ingredients—the code that is reused to craft a patch.

Open Vocabulary Learning on Source Code with a Graph-Structured Cache [Research Paper]
by Milan Cvitkovic, Badal Singh, Anima Anandkumar

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names.


Events

March 4th: AI & Society (Paris, France)

March 6th: Predicting the future of Kubernetes from its code (Online)

March 12-14th: Open Source Leadership Summit (Half Moon Bay, California)

April 12th: GothamGo Conference 2019 (New York City)

Featured Community Member

Sarah Nadi is an assistant professor at the University of Alberta. She did her Ph.D. and Master's at the University of Waterloo and then a postdoc at TU Darmstadt in Germany. She generally works on finding or developing techniques for supporting developers in their software maintenance and reuse activities. Check out her website to see her impressive list of papers and projects. Make sure to follow Sarah on Twitter @sarahnadi to stay up to date with her latest publications and projects.