Welcome to source{d} bi-weekly, a newsletter with the latest news, resources and events related to Code as Data and Machine Learning on Code. Sign up for source{d} bi-weekly newsletter.

Top 4 metrics to measure your Software Delivery Performance

For decades, static code analysis tools such as SonarQube and Coverity have helped engineering teams ship higher-quality software faster than ever before. In recent years, however, the shift to DevOps practices and the proliferation of developer tools have created a big challenge for engineering leaders in charge of software delivery performance: a lack of end-to-end visibility into their DevOps pipeline, with critical data spread across silos. Learn More.

source{d} News

Multi-GPU deep learning at source{d} [Blog]
by Vadim Markovtsev

In this post, Vadim explains how we solved several problems in order to train neural networks with TensorFlow 2.0 on several local GPUs in our ML cluster.

Overton, Apple flavored Machine Learning [Slides]
by Hugo Mougard

In early September, Apple released a paper describing Overton, the framework they built to create, monitor, and improve production-based ML systems. In these slides, Hugo introduces the framework and takes a closer look at the heart of Overton: slice-based learning.

Community News

Neural Code Search Evaluation Dataset [Research Paper]
by Hongyu Li, Seohyun Kim, Satish Chandra

There has been an increase of interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, the authors present an evaluation dataset consisting of natural language query and code snippet pairs, with the hope that future work in this area can use this dataset as a common benchmark.

Clean code, why bother? part 1 and part 2 [Blog]
by Paula Santamaría

Writing clean code is nice and all, but it also takes hard work, and sometimes you have to spend extra time to get it right, so why bother? Why is it so important? In this article, the author gives you many reasons to believe clean code is important and worth your time!

Processing 40 TB of code from ~10 million projects [Blog]
by Ben Boyter

The command line tool I created, Sloc Cloc and Code (scc), counts lines of code and comments, and makes a complexity estimate for files inside a directory. The latter is something you need a good sample size to make good use of. The way it works is that it counts branch statements in code. However, what does that actually mean? For example, “This file has a complexity of 10” is not very useful without some context. To solve this issue, I thought I would try to run scc on all the source code I could get my hands on. This would also allow me to see if there are any edge cases I didn’t consider in the tool itself. A brute force Q/A trial by fire.
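
To make "counts branch statements" concrete, here is a minimal Python sketch of that kind of estimate. It is not scc's actual implementation: the keyword list, the function name estimate_complexity, and the rule for skipping comments are all assumptions made for illustration, and the sketch does not try to ignore keywords that appear inside string literals.

```python
import re

# Hypothetical keyword list for illustration only; scc's real
# per-language rules may differ.
BRANCH_TOKENS = ("if", "elif", "for", "while", "except", "and", "or")
_BRANCH_RE = re.compile(r"\b(?:" + "|".join(BRANCH_TOKENS) + r")\b")


def estimate_complexity(source: str) -> int:
    """Count branch keywords, skipping blank lines and full-line comments."""
    total = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        total += len(_BRANCH_RE.findall(stripped))
    return total


SAMPLE = '''\
def classify(n):
    # if this comment were counted, the total would be off
    if n < 0 and n % 2:
        return "negative odd"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

print(estimate_complexity(SAMPLE))  # prints 4
```

A score like this only means something relative to other files, which is the reason given above for running the tool over a very large sample.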

Getting everything wrong without doing anything right [Video]
by Jan Vitek

GitHub has a wealth of data, and trying to mine it for insights about the software development process is irresistible. This talk is a cautionary tale of what can go wrong if care and healthy skepticism are not applied to the results obtained from data torture.


Events

October 18th: source{d} paper reading club (Online)

October 28-31: TensorFlow World (Santa Clara, USA)

November 9th: DevFest (Moscow, Russia)

Featured Community Member

Jan Vitek is a Professor of Computer Science at Northeastern University. He holds degrees from the University of Geneva (PhD’99, BS’89) and University of Victoria (MS’95). Professor Vitek works on topics related to the design and implementation of programming languages. Make sure to follow @j_v_66 on Twitter or visit his website to stay up to date with his latest publications.