Machine Learning on Git: introducing Hercules v4

Hercules is an open source project started in late 2016 with the goal of speeding up the collection of line burndown statistics from Git repositories. It has since grown into a general purpose Git repository mining framework with several cool use cases: ownership through time, file and people embeddings, structural hotness and even comment sentiment estimation. This post presents the latest ‘v4’ release of Hercules and offers some insight into how Git works.

Announcing Public Git Archive

Public Git Archive is the largest dataset of Git repositories in the world.

Detecting licenses in code with Go and ML

Detecting the license of an open source project is harder than it seems. We have created go-license-detector, a Go library and command line application that solves this task.

Calling C functions from BigQuery with WebAssembly

As part of our experiments at source{d}, we decided to try running a C library on BigQuery. Read this blog post to see how WebAssembly came to the rescue, and what other improvements we had to make to achieve decent performance.

Measuring code sentiment in a Git repository

This is the transcript of our MLonCode talk at GopherCon Russia. The idea is to combine the technologies we’ve developed to solve a toy problem: finding funny comments.

Why did I join source{d}? - Francesc Campoy

The first post in a series on why various employees joined source{d}. This one is by Francesc Campoy.

source{d} does FOSDEM 2018

Almost every source{d} employee just came back from FOSDEM 2018 and we have so much to tell you!

Announcing the latest go-git!

After a year of intense work, we’re happy to announce the latest and best release of go-git ever. go-git v4 includes many new features, making it the most widely used and most feature-complete Git library written in Go. It is used in production at companies like source{d} and Keybase.

Source Code Identifier Embeddings

‘Embed and conquer’, they say. Anything that has a context can be embedded: word2vec, node2vec, product2vec… id2vec! We take source code identifiers, define their context as their scope in the Abstract Syntax Tree, and find that ‘send’ is to ‘receive’ as ‘push’ is to ‘pop’.

enry: detecting languages

Announcing enry, a faster Go implementation of github/linguist for programming language detection.