What Are Your 2016 GitHub Commit Statistics?

Phil Sturgeon
4 min read · Dec 29, 2016

It was a cold December afternoon, and we were wondering: how many commits were made, and how many lines were added and removed this year?

Somebody had some fancy infographic to make, but who cares about that? This sounded like a code and API challenge. After searching around to make sure there was no existing solution (the best solution is usually the one you don’t have to build), we jumped into the GitHub API documentation. The end result was a gem: we-github-stats.

The first issue is that we have a lot of repositories. The GitHub Repositories API has a List organization repos endpoint, which gets that list. Then there is a Statistics section with a few endpoints to help with the numbers.

Get the last year of commit data looked promising: it returns a whole pile of paginated data with a total number of commits for each week. Map and reduce on each total and you’re set.

[
  {
    "days": [
      0,
      3,
      26,
      20,
      39,
      1,
      0
    ],
    "total": 89,
    "week": 1336280400
  }
]
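The map-and-reduce step can be sketched in a few lines. This is a minimal illustration and not the gem’s actual code: `total_commits` is a made-up helper name, and the string keys assume the payload was parsed from JSON into plain hashes (an SDK may hand back objects with different accessors).

```ruby
# Sum the weekly "total" fields from a commit activity payload.
# `weeks` is an array of hashes shaped like the JSON above.
# (Hypothetical helper, named here for illustration only.)
def total_commits(weeks)
  weeks.map { |week| week["total"] }.reduce(0, :+)
end

sample = [
  { "days" => [0, 3, 26, 20, 39, 1, 0], "total" => 89, "week" => 1336280400 }
]

puts total_commits(sample)
```

Passing `0` as the initial value to `reduce` means a repository with no weeks at all still yields a number and not `nil`.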

Then there is Get the number of additions and deletions per week, which seems to give data for every week ever. Not a problem, just map that and reject any data with a week timestamp we don’t like.

[
  [
    1302998400,
    1124,
    -435
  ]
]
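Rejecting the weeks we don’t like might look something like this. It is a sketch under a couple of assumptions: `lines_changed_in_year` is a hypothetical helper, and a week is counted by the year its starting timestamp falls in (UTC), so a week that straddles New Year lands entirely in the earlier year.

```ruby
# Keep only the weeks that start in `year`, then total additions and deletions.
# `code_frequency` is an array of [week_timestamp, additions, deletions] triples.
# (Hypothetical helper, named here for illustration only.)
def lines_changed_in_year(code_frequency, year)
  in_year = code_frequency.select { |week, _, _| Time.at(week).utc.year == year }
  {
    added: in_year.sum { |_, additions, _| additions },
    removed: in_year.sum { |_, _, deletions| deletions }
  }
end

sample = [[1302998400, 1124, -435]]

p lines_changed_in_year(sample, 2011)
p lines_changed_in_year(sample, 2016)
```

The sample week starts on 17 April 2011, so it counts toward 2011 and is ignored when asking for 2016.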

Easy enough, should take about 10 minutes…

Setbacks

Things that look simple rarely are, and a few twigs were shoved in our spokes.

Firstly, the loops to handle the pagination looked boring and full of boilerplate. To avoid having to write that, Octokit was thrown into the mix. It is a GitHub Ruby SDK which has an auto pagination feature. Flipping that on handles the boilerplate for you, and simplifies the HTTP interactions.

require "octokit"
# Follow pagination links automatically on every list request
Octokit.auto_paginate = true
client = Octokit::Client.new(access_token: access_token)

Secondly, GitHub repo statistics are not preemptively generated. They’re an unwarmed cache, which simply returns a sad pair of empty curly braces, with a 202 Accepted status code. Once it’s finished generating you can try again.

We settled for “just run the script a few times” as the code was not the product, the numbers were.
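If rerunning by hand gets old, a small retry loop is one alternative. The sketch below is not what the gem does; `fetch_when_ready` is a hypothetical helper, and it assumes the block returns `nil` while GitHub is still generating the stats (check how your HTTP client or SDK reports a 202 before relying on that).

```ruby
# Call the block until it returns something other than nil, sleeping between tries.
# Returns nil if nothing showed up within `attempts` tries.
# (Hypothetical helper; assumes "not ready yet" is signalled by nil.)
def fetch_when_ready(attempts: 5, wait: 2)
  attempts.times do |attempt|
    result = yield
    return result unless result.nil?
    sleep(wait) if attempt < attempts - 1
  end
  nil
end

# Simulate a stats call that is only ready on the third try.
calls = 0
stats = fetch_when_ready(attempts: 3, wait: 0) do
  calls += 1
  calls < 3 ? nil : [{ "total" => 89 }]
end

p stats
```

In real use the block would wrap the stats request itself. Keep the attempt count low, because every retry also eats into the rate limit.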

I threw in a little code to warn you about unfinished repos when running the script:

unless incomplete.empty?
  puts "Warning: The following stats are not ready on the GitHub API:"
  incomplete.each { |repo_name| puts "\t- #{repo_name}" }
  puts "Please wait a few minutes and try again. In the meantime, the stats for other repos are..."
end
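For context, here is roughly how an `incomplete` list like that could be built. Treat it as a sketch, not the gem’s source: `collect_stats` is a hypothetical helper, `client` is assumed to be an Octokit client (using its `org_repos`, `commit_activity_stats` and `code_frequency_stats` methods), and the stats calls are assumed to return `nil` while GitHub is still crunching the numbers.

```ruby
# Gather per-repo stats and track which repos GitHub has not finished computing.
# `client` is anything that responds to the three calls used below.
# (Hypothetical helper; assumes unfinished stats come back as nil.)
def collect_stats(client, org)
  ready = {}
  incomplete = []

  client.org_repos(org).each do |repo|
    commits = client.commit_activity_stats(repo.full_name)
    frequency = client.code_frequency_stats(repo.full_name)

    if commits.nil? || frequency.nil?
      incomplete << repo.name
    else
      ready[repo.name] = { commits: commits, frequency: frequency }
    end
  end

  [ready, incomplete]
end
```

Calling `ready, incomplete = collect_stats(client, "wework")` would then give the warning code its list, and leave the finished repos ready for totalling.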

Finally, we hit another snag: rate limiting. During my lazy “it’ll only take a second” development I’d chosen to forgo test-driven development, and foolishly tried to knock out the code quickly. Running it over and over again cranked through our rate limit in no time.

If we were to do it again, using VCR would be a better approach, recording the HTTP responses, and working off of those in tests.

Usage

This gem does everything we need to get the job done, but it’s a little primitive.

$ gem install we-github-stats
$ github_stats -o wework -t super-secret-token

That will output a table for the repos that are done, count up the totals, and let you know if any are still being calculated by GitHub:

Warning: The following stats are not ready on the GitHub API:
- wework.github.io
- we-learn-react
- we-interview
- we-js-logger
- careday-api
- dotenv-rails-safe
- careday-app
- eslint-config-wework
- we-github-stats
Please wait a few minutes and try again. In the meantime, the stats for other repos are...
==== Repositories ====
+---------------+---------+-------------+---------------+
| Name          | Commits | Lines Added | Lines Removed |
+---------------+---------+-------------+---------------+
| env-universal | 54      | 2358        | -692          |
+---------------+---------+-------------+---------------+
==== Total ====
Commits: 54
Lines Added: 2358
Lines Removed: -692

Run a few minutes later, you’ll see:

==== Repositories ====
+-----------------------+---------+-------------+---------------+
| Name                  | Commits | Lines Added | Lines Removed |
+-----------------------+---------+-------------+---------------+
| wework.github.io      | 27      | 357         | -208          |
| we-learn-react        | 0       | 0           | 0             |
| we-interview          | 0       | 0           | 0             |
| we-js-logger          | 64      | 2914        | -1037         |
| env-universal         | 54      | 2358        | -692          |
| careday-api           | 7       | 3223        | -628          |
| dotenv-rails-safe     | 21      | 737         | -281          |
| careday-app           | 15      | 1686        | -434          |
| eslint-config-wework  | 4       | 347         | -3            |
| we-github-stats       | 2       | 384         | -23           |
+-----------------------+---------+-------------+---------------+
==== Total ====
Commits: 233
Lines Added: 13166
Lines Removed: -3375

Want it in CSV? Pass the format parameter:

$ github_stats -o wework -t super-secret-token -f csv

It’ll give you a header row and a line with these stats for each completed repository.

Name, Commits, Lines Added, Lines Removed
wework.github.io,27,357,-208
we-learn-react,0,0,0
we-interview,0,0,0
we-js-logger,64,2914,-1037
env-universal,54,2358,-692
careday-api,7,3223,-628
dotenv-rails-safe,21,737,-281
careday-app,15,1686,-434
eslint-config-wework,4,347,-3
we-github-stats,2,384,-23
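If you’d rather stay in Ruby than open a spreadsheet, sorting that output takes only a few lines. This is a rough sketch: `sort_by_commits` is a hypothetical helper, and the naive split on commas assumes no field ever contains a quoted comma, which holds for repo names and integers.

```ruby
# Sort the CSV output by commit count, busiest repository first.
# (Hypothetical helper; naive comma split, no quoted fields expected.)
def sort_by_commits(csv_text)
  header, *rows = csv_text.lines.map(&:chomp).reject(&:empty?)
  sorted = rows.sort_by { |row| -row.split(",")[1].to_i }
  [header, *sorted].join("\n")
end

csv = <<~CSV
  Name, Commits, Lines Added, Lines Removed
  wework.github.io,27,357,-208
  we-js-logger,64,2914,-1037
  env-universal,54,2358,-692
CSV

puts sort_by_commits(csv)
```

The header row is split off first so it stays on top whatever the sort does to the rest.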

Looking at the internal WeWork organization for those stats, we clearly crushed it:

  • Roughly 30k commits
  • 16 million lines of code added
  • 9 million deleted

Glad to see we’re refactoring and not just flinging code into our repos! Either that or we can’t quite decide on tabs or spaces yet. Who knows.

There are plenty of todo items to make it perfect, but this gets the job done.

Want sorting? Use CSV and let Excel help you out.

Want graphics? Use CSV and let Excel help you out.

In the meantime, I wish you a happy ending to 2016, and hope that 2017 is drastically better.

