Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for the source{d} bi-weekly newsletter.

Introducing source{d} Community Edition beta with built-in UI, dashboards and GitHub support

We are pleased to announce the beta release of source{d} Community Edition (CE) 0.14, formerly known as the source{d} Engine. source{d} Community Edition is the only free & open source product that provides individual developers, small to mid-size companies and open source project maintainers visibility into their codebase foundation, compliance with development guidelines, insights on their talent and overall productivity metrics. Read more in this blog post.

source{d} News

Announcing source{d} EE: A scalable, secure and extensible data platform [Blog]
by Victor Coisne and Marcelo Novaes

As part of our mission to enable large scale Engineering Observability over the entire Software Development Life Cycle (SDLC), we’re very excited to announce the release of source{d} Enterprise Edition (EE).

source{d} EE Product demo [Video]
by Eiso Kant

source{d} aims to provide this end-to-end platform so that people at different levels of IT organizations can easily extract, load and transform data into metrics that can be quickly analyzed and visually presented for informed, data-driven decision making.

Software Development Analytics Platform, source{d} launches an enterprise edition [Article]
by Frederic Lardinois

source{d} provides developers and IT departments with deeper analytics into their software development life cycle. It analyzes codebases, offers data about which APIs are being used and provides general information about developer productivity and other metrics.

Mining Software Development History: Approaches and Challenges [Slides]
by Vadim Markovtsev

Vadim Markovtsev gave the audience at the ML Conference in Munich fun examples of history mining and presented some of the available tooling. Topics included graph embeddings, manifold learning, dynamic time warping, seriation, and modern clustering algorithms.

Community News

A Novel Neural Source Code Representation based on AST [Research Paper]
by Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang and Xudong Liu

This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation. Unlike existing models that work on entire ASTs, ASTNN splits each large AST into a sequence of small statement trees and encodes each statement tree as a vector that captures the lexical and syntactical knowledge of the statement. A bidirectional RNN model then runs over the sequence of statement vectors to leverage the naturalness of statements and produce the final vector representation of a code fragment.
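To make the splitting step concrete, here is a minimal sketch using Python's standard `ast` module. It is not the authors' implementation: the function name `statement_trees` is hypothetical, and the rule used here (every statement node roots its own small tree, with nested statements cut out) is a simplification of the idea described above. The statement encoder and the bidirectional RNN are not shown.

```python
import ast


def statement_trees(source):
    """Split a program's AST into a sequence of small statement-level trees.

    Simplified illustration (an assumption, not the paper's exact procedure):
    each ast.stmt node becomes the root of one tree, and nested statements
    are cut out so that they form their own trees later in the sequence.
    """

    def tokens(node):
        # Pre-order list of node type names, stopping at nested statements.
        out = [type(node).__name__]
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.stmt):
                out.extend(tokens(child))
        return out

    def visit(node):
        # Pre-order walk that emits one small tree per statement node.
        if isinstance(node, ast.stmt):
            yield tokens(node)
        for child in ast.iter_child_nodes(node):
            yield from visit(child)

    return list(visit(ast.parse(source)))


if __name__ == "__main__":
    code = "def f(x):\n    if x > 0:\n        return x\n    return -x\n"
    for tree in statement_trees(code):
        print(tree)
```

Each printed list is one statement tree, flattened to its node type names. In a model along the lines of the one described, each such tree would be encoded as a vector before the sequence is passed to the recurrent layer.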

C# or Java? TypeScript or JavaScript? ML based classification of languages [Blog]
by Kavita Ganesan and Romano Foti

GitHub hosts over 300 programming languages, from commonly used languages such as Python, Java, and JavaScript to esoteric languages such as Befunge, known only to very small communities. One of the necessary challenges that GitHub faces is recognizing these different languages. To make language detection more robust and maintainable in the long run, the authors of this blog post developed a machine learning classifier named OctoLingua. Click the link to read more.

Import2vec - Learning Embeddings for Software Libraries [Research Paper]
by Bart Theeten, Frederik Vandeputte and Tom Van Cutsem

In this paper, the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries is considered. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) or applications such as contextual search and analogical reasoning.


July 11th: source{d} Enterprise Edition: Unlocking Engineering Observability with advanced IT analytics (Online)

July 12th: source{d} paper reading club (Online)

Sept 19th-20th: Open Core Summit (San Francisco, US)

Featured Community Member

Sarah Nadi

Sarah Nadi is currently an Assistant Professor at the University of Alberta. Check out her website to see her impressive list of papers and projects. Make sure to follow Sarah on Twitter @sarahnadi to stay up to date with her latest publications and projects.