I love pulling statistics out of git

After the release of Liblime ILS, I thought I should do a bit of a statistics run using gitdm on the fork versus the master branch of Koha. (UPDATE: I've been told I ran it from a point prior to the fork, so I have rerun the stats in a new blog post.) These statistics span the period from September 16, 2009 to now.
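For anyone curious how numbers like these can be derived, here is a minimal sketch that counts changesets and changed lines per author from `git log --numstat` output. It is not gitdm and does not claim to match its definitions: the `commit`/`author` line prefixes, the `summarize` function, and the sample data are all assumptions made for illustration, and "changed lines" is simply added plus removed here. The optional `skip_suffixes` argument shows one way translation (.po) files could be left out of the line counts.

```python
from collections import Counter

# Assumed input: the text printed by something like
#   git log --numstat --format='commit %H%nauthor %an' <base>..<tip>
# The "commit "/"author " prefixes are this sketch's own convention.
def summarize(log_text, skip_suffixes=()):
    changesets = Counter()  # author -> number of commits
    lines = Counter()       # author -> lines added + removed
    author = None
    for raw in log_text.splitlines():
        if raw.startswith("author "):
            author = raw[len("author "):]
            changesets[author] += 1
            continue
        # numstat rows look like "<added>\t<removed>\t<path>";
        # binary files show "-" instead of numbers and are skipped.
        parts = raw.split("\t")
        if author is None or len(parts) != 3:
            continue
        added, removed, path = parts
        if not (added.isdigit() and removed.isdigit()):
            continue
        if path.endswith(tuple(skip_suffixes)):
            continue
        lines[author] += int(added) + int(removed)
    return changesets, lines

# Made-up sample data, not taken from either repository.
sample = (
    "commit a1\nauthor Alice\n\n"
    "10\t2\tC4/Foo.pm\n"
    "900\t100\tmisc/translator/po/fr-FR.po\n"
    "commit b2\nauthor Bob\n\n"
    "3\t3\tREADME\n"
    "commit c3\nauthor Alice\n\n"
    "-\t-\tlogo.png\n"
)

changesets, lines = summarize(sample, skip_suffixes=(".po",))
print(changesets.most_common())
print(lines.most_common())
```

Calling `summarize(sample)` without `skip_suffixes` counts the .po file as well, which is the situation that makes a single translation commit dwarf everything else.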

Summary:

  • In the period since the fork, Koha has had 3455 changesets, Liblime ILS has had 826
  • Koha had 95 different authors, Liblime ILS had 38
  • Koha: A total of 11,191,972 lines added, 5,424,709 removed (delta 5,767,263)
  • Liblime ILS: A total of 1,199,456 lines added, 736,049 removed (delta 463,407)
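The deltas in the summary are just lines added minus lines removed. A small check of that arithmetic, using only the figures quoted above (the dictionary layout and function name are mine):

```python
# Figures copied from the summary above.
totals = {
    "Koha": {"changesets": 3455, "added": 11_191_972, "removed": 5_424_709},
    "Liblime ILS": {"changesets": 826, "added": 1_199_456, "removed": 736_049},
}

def delta(t):
    """Net line growth: lines added minus lines removed."""
    return t["added"] - t["removed"]

for name, t in totals.items():
    print(f"{name}: delta {delta(t):,}")

# How many Koha changesets there are for each Liblime ILS changeset.
ratio = totals["Koha"]["changesets"] / totals["Liblime ILS"]["changesets"]
print(f"changeset ratio: {ratio:.2f}")
```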

 

 

Here are the statistics for Koha

Developers with the most changesets
Chris Cormack 485 14.0%
Owen Leonard 360 10.4%
Galen Charlton 308 8.9%
Henri-Damien LAURENT 272 7.9%
Paul Poulain 246 7.1%
Colin Campbell 165 4.8%
Lars Wirzenius 125 3.6%
Chris Nighswonger 117 3.4%
Matthias Meusburger 115 3.3%
Katrin Fischer 114 3.3%
Nahuel ANGELINETTI 97 2.8%
Frédéric Demians 93 2.7%
Jean-André Santoni 89 2.6%
Nicole Engard 84 2.4%
Garry Collum 76 2.2%
Nicole C. Engard 71 2.1%
Jared Camins-Esakov 50 1.4%
Alex Arnaud 46 1.3%
Ian Walls 45 1.3%
Christopher Hall 38 1.1%
Robin Sheat 34 1.0%
Marcel de Rooy 34 1.0%
Jesse Weaver 28 0.8%
Jane Wagner 26 0.8%
Donovan Jones 25 0.7%
Stéphane Delaune 21 0.6%
Magnus Enger 20 0.6%
Srdjan Jankovic 16 0.5%
Liz Rea 16 0.5%
Bernardo Gonzalez Kriegel 14 0.4%
Kyle M Hall 13 0.4%
Tomas Cohen Arazi 12 0.3%
J. David Bavousett 12 0.3%
Frédérick Capovilla 9 0.3%
Michael Hafen 9 0.3%
Julian Maurice 8 0.2%
Zach Sim 8 0.2%
Andrew Elwell 8 0.2%
Koustubha Kale 7 0.2%
ruth@bywatersolutions.com 7 0.2%
Jared CAMINS-ESAKOV 6 0.2%
johnboy 6 0.2%
Christophe Croullebois 5 0.1%
John Soros 5 0.1%
Ricardo Dias Marques 4 0.1%
Sophie Meynieux 4 0.1%
Salvador Zaragoza Rubio 4 0.1%
Nicolas Morin 4 0.1%
MJ Ray 4 0.1%
Mason James 4 0.1%
Amit Gupta 4 0.1%
ByWater Solutions 4 0.1%
Will Stokes 4 0.1%
Cindy Murdock Ames 4 0.1%
David Birmingham 4 0.1%
Piotr Wejman 4 0.1%
fdurand 3 0.1%
Janusz Kaczmarek 3 0.1%
koha-preprod 3 0.1%
root 3 0.1%
Reed Wade 3 0.1%
Eric Olsen 3 0.1%
Sébastien Hinderer 3 0.1%
Schuster 2 0.1%
Brice Sanchez 2 0.1%
claudia 2 0.1%
Zeno0 Tajoli 2 0.1%
Brian Engard 2 0.1%
Matthew Hunt 2 0.1%
Wolfgang Heymans 2 0.1%
brendan 2 0.1%
Marc Chantreux 2 0.1%
conan (aka Fernando Canizo) 1 0.0%
Jonathan Druart 1 0.0%
Mark Gavillet 1 0.0%
Doug Dearden 1 0.0%
Savitra Sirohi 1 0.0%
Frère Sébastien Marie 1 0.0%
marcel@libdevelop.rijksmuseum.nl 1 0.0%
Jerome Charaoui 1 0.0%
spartaness 1 0.0%
Dobrica Pavlinusic 1 0.0%
Joe Atzberger 1 0.0%
Edward Allen 1 0.0%
Serhij Dubyk {?????? ?????} 1 0.0%
koha 1 0.0%
f.demians at tamil.fr 1 0.0%
dev2 1 0.0%
Nate Curulla 1 0.0%
Daniel Grobani 1 0.0%
Andrew Chilton 1 0.0%
Koha 1 0.0%
Koha User 1 0.0%
Zeno Tajoli 1 0.0%
NYUHSL 1 0.0%

Developers with the most changed lines
Frédéric Demians 5689198 46.9%
Chris Cormack 4870443 40.2%
Nahuel ANGELINETTI 282692 2.3%
Magnus Enger 57925 0.5%
Galen Charlton 57686 0.5%
Katrin Fischer 49202 0.4%
Piotr Wejman 48396 0.4%
Paul Poulain 41848 0.3%
Jared Camins-Esakov 21386 0.2%
Owen Leonard 17507 0.1%
Henri-Damien LAURENT 16803 0.1%
Chris Nighswonger 11286 0.1%
Matthias Meusburger 7601 0.1%
Lars Wirzenius 6403 0.1%
Salvador Zaragoza Rubio 4307 0.0%
Colin Campbell 3571 0.0%
Jesse Weaver 3399 0.0%
Nicole C. Engard 3106 0.0%
Andrew Elwell 3104 0.0%
Jean-André Santoni 3050 0.0%
Nicole Engard 1615 0.0%
Christopher Hall 1192 0.0%
Stéphane Delaune 1139 0.0%
Ian Walls 1066 0.0%
Kyle M Hall 933 0.0%
Koustubha Kale 869 0.0%
Robin Sheat 742 0.0%
Srdjan Jankovic 626 0.0%
Mason James 604 0.0%
Alex Arnaud 532 0.0%
Eric Olsen 499 0.0%
Garry Collum 478 0.0%
Marcel de Rooy 469 0.0%
John Soros 453 0.0%
Will Stokes 405 0.0%
Tomas Cohen Arazi 372 0.0%
Donovan Jones 354 0.0%
Jane Wagner 273 0.0%
johnboy 214 0.0%
Liz Rea 182 0.0%
Zeno0 Tajoli 168 0.0%
Zach Sim 148 0.0%
Jared CAMINS-ESAKOV 123 0.0%
David Birmingham 113 0.0%
Bernardo Gonzalez Kriegel 96 0.0%
Amit Gupta 84 0.0%
Julian Maurice 83 0.0%
Doug Dearden 82 0.0%
Christophe Croullebois 73 0.0%
Zeno Tajoli 73 0.0%
J. David Bavousett 66 0.0%
Joe Atzberger 66 0.0%
Sophie Meynieux 52 0.0%
Michael Hafen 48 0.0%
dev2 37 0.0%
root 36 0.0%
koha 34 0.0%
Savitra Sirohi 30 0.0%
Reed Wade 26 0.0%
Serhij Dubyk {?????? ?????} 26 0.0%
Frédérick Capovilla 22 0.0%
ruth@bywatersolutions.com 21 0.0%
Ricardo Dias Marques 17 0.0%
Brian Engard 17 0.0%
ByWater Solutions 16 0.0%
Brice Sanchez 16 0.0%
spartaness 15 0.0%
f.demians at tamil.fr 14 0.0%
Nate Curulla 14 0.0%
Daniel Grobani 14 0.0%
Sébastien Hinderer 13 0.0%
Jonathan Druart 12 0.0%
MJ Ray 10 0.0%
Marc Chantreux 9 0.0%
Andrew Chilton 9 0.0%
Koha 9 0.0%
Janusz Kaczmarek 8 0.0%
Nicolas Morin 7 0.0%
koha-preprod 7 0.0%
Matthew Hunt 6 0.0%
claudia 5 0.0%
Cindy Murdock Ames 4 0.0%
fdurand 4 0.0%
Frère Sébastien Marie 4 0.0%
Koha User 4 0.0%
Schuster 2 0.0%
Wolfgang Heymans 2 0.0%
marcel@libdevelop.rijksmuseum.nl 2 0.0%
Edward Allen 2 0.0%
NYUHSL 2 0.0%
conan (aka Fernando Canizo) 1 0.0%
Mark Gavillet 1 0.0%
Jerome Charaoui 1 0.0%
Dobrica Pavlinusic 1 0.0%

Developers with the most lines removed
Jared Camins-Esakov 20026 0.4%
Andrew Elwell 1158 0.0%
Colin Campbell 246 0.0%
Christopher Hall 222 0.0%
Serhij Dubyk {?????? ?????} 26 0.0%
Zeno0 Tajoli 17 0.0%
Ricardo Dias Marques 11 0.0%
claudia 4 0.0%
Jonathan Druart 2 0.0%
Wolfgang Heymans 1 0.0%

Developers with the most signoffs (total 2727)
Galen Charlton 1135 41.6%
Chris Cormack 876 32.1%
Nicole C. Engard 191 7.0%
Ian Walls 66 2.4%
Colin Campbell 65 2.4%
Katrin Fischer 63 2.3%
Henri-Damien LAURENT 45 1.7%
Owen Leonard 36 1.3%
Julian Maurice 32 1.2%
Jared Camins-Esakov 29 1.1%
Magnus Enger 20 0.7%
Claire Hernandez 19 0.7%
Chris Nighswonger 18 0.7%
Stéphane Delaune 17 0.6%
Frédéric Demians 17 0.6%
Marcel de Rooy 16 0.6%
Liz Rea 13 0.5%
Paul Poulain 12 0.4%
Jonathan Druart 11 0.4%
fdurand 11 0.4%
Christophe Croullebois 7 0.3%
ruth@bywatersolutions.com 6 0.2%
Robin Sheat 3 0.1%
Koustubha Kale 3 0.1%
Matthias Meusburger 3 0.1%
Frederic Demians 2 0.1%
Guillaume Hatt 2 0.1%
Jane Wagner 2 0.1%
Jesse Weaver 2 0.1%
Mark Gavillet 1 0.0%
Davi 1 0.0%
Frère Sébastien Marie 1 0.0%
Sophie Meynieux 1 0.0%
Salvador Zaragoza Rubio 1 0.0%

Top changeset contributors by employer
Biblibre 913 26.4%
(Unknown) 741 21.4%
Catalyst 554 16.0%
ACPL 359 10.4%
ByWater-Solutions 177 5.1%
BigBallOfWax 176 5.1%
PTFS-Europe 166 4.8%
Foundations 117 3.4%
BSZ-BW 116 3.4%
Tamil 93 2.7%
PTFS 43 1.2%

Top lines changed by employer
Tamil 5689288 46.9%
BigBallOfWax 4809342 39.7%
Catalyst 951588 7.9%
Biblibre 362753 3.0%
(Unknown) 192337 1.6%
BSZ-BW 49211 0.4%
ByWater-Solutions 26701 0.2%
ACPL 20793 0.2%
Foundations 14834 0.1%
PTFS-Europe 4347 0.0%
PTFS 495 0.0%

Employers with the most signoffs (total 2727)
(Unknown) 1209 44.3%
Catalyst 875 32.1%
ByWater-Solutions 289 10.6%
Biblibre 147 5.4%
PTFS-Europe 66 2.4%
BSZ-BW 63 2.3%
ACPL 36 1.3%
Tamil 19 0.7%
Foundations 18 0.7%
BigBallOfWax 3 0.1%
PTFS 2 0.1%
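
The employer tables depend on mapping each author to an organisation. gitdm takes that mapping from its own configuration, which is not shown in this post; the sketch below only illustrates the general idea, using a hypothetical email-domain map and made-up addresses. Authors whose domain is not in the map fall into "(Unknown)", the same label used in the tables above.

```python
from collections import Counter

def by_employer(changesets_by_email, domain_map):
    """Sum per-author changeset counts into per-employer totals."""
    totals = Counter()
    for email, count in changesets_by_email.items():
        domain = email.rsplit("@", 1)[-1].lower()
        totals[domain_map.get(domain, "(Unknown)")] += count
    return totals

def as_table(totals):
    """Rows of (employer, count, share), largest first."""
    grand = sum(totals.values())
    return [(name, n, f"{100 * n / grand:.1f}%")
            for name, n in totals.most_common()]

# Hypothetical data: neither the addresses nor the map come from the repos.
changesets_by_email = {
    "a@lib.example": 6,
    "b@lib.example": 2,
    "c@home.example": 2,
}
domain_map = {"lib.example": "Example Library"}

for row in as_table(by_employer(changesets_by_email, domain_map)):
    print(*row)
```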

And here are the statistics for Liblime ILS

Developers with the most changesets
D Ruth Bavousett 161 19.5%
Clay Fouts 113 13.7%
David Birmingham 106 12.8%
PTFS 101 12.2%
Colin Campbell 81 9.8%
Jesse Weaver 58 7.0%
Jane Wagner 28 3.4%
Kyle M Hall 23 2.8%
Ha Quach 21 2.5%
Chris Cormack 19 2.3%
Owen Leonard 15 1.8%
Galen Charlton 11 1.3%
dev3 11 1.3%
Ryan Higgins 9 1.1%
cfouts 8 1.0%
Garry Collum 8 1.0%
Ian Walls 5 0.6%
ctftest2 5 0.6%
Jean-André Santoni 5 0.6%
Katrin Fischer 4 0.5%
Nicole Engard 4 0.5%
Henri-Damien LAURENT 4 0.5%
dev2 3 0.4%
Frédéric Demians 3 0.4%
Chris Nighswonger 3 0.4%
Liblime 2 0.2%
Arcadia Koha 2 0.2%
Nahuel ANGELINETTI 2 0.2%
kyletest 2 0.2%
Robert Vernon Phillips 1 0.1%
Robert Phillips 1 0.1%
J. David Bavousett 1 0.1%
Michele Maenpaa 1 0.1%
David Bavousett 1 0.1%
Koha User 1 0.1%
Magnus Enger 1 0.1%
Paul Poulain 1 0.1%
Michael Hafen 1 0.1%

Developers with the most changed lines
Chris Cormack 985443 81.6%
D Ruth Bavousett 167675 13.9%
David Birmingham 9298 0.8%
PTFS 7376 0.6%
Owen Leonard 5952 0.5%
Jesse Weaver 3753 0.3%
Kyle M Hall 3312 0.3%
Clay Fouts 3298 0.3%
Colin Campbell 3293 0.3%
Henri-Damien LAURENT 1830 0.2%
Ryan Higgins 1817 0.2%
Ha Quach 1561 0.1%
Jean-André Santoni 1176 0.1%
Galen Charlton 809 0.1%
Frédéric Demians 710 0.1%
ctftest2 472 0.0%
Robert Phillips 312 0.0%
Nicole Engard 300 0.0%
Jane Wagner 296 0.0%
cfouts 143 0.0%
kyletest 134 0.0%
Ian Walls 106 0.0%
dev3 99 0.0%
Arcadia Koha 88 0.0%
Garry Collum 50 0.0%
dev2 43 0.0%
Chris Nighswonger 33 0.0%
David Bavousett 31 0.0%
Katrin Fischer 24 0.0%
Nahuel ANGELINETTI 18 0.0%
Robert Vernon Phillips 15 0.0%
Magnus Enger 6 0.0%
Michael Hafen 5 0.0%
Koha User 4 0.0%
Paul Poulain 4 0.0%
Liblime 2 0.0%
J. David Bavousett 2 0.0%
Michele Maenpaa 1 0.0%

Developers with the most lines removed
dev3 34 0.0%
Michael Hafen 1 0.0%

Developers with the most signoffs (total 62)
Galen Charlton 59 95.2%
Clay Fouts 3 4.8%


Top changeset contributors by employer
PTFS 521 63.1%
(Unknown) 133 16.1%
PTFS-Europe 81 9.8%
Liblime 29 3.5%
BigBallOfWax 18 2.2%
ACPL 15 1.8%
Biblibre 12 1.5%
ByWater-Solutions 6 0.7%
BSZ-BW 4 0.5%
Foundations 3 0.4%
Tamil 3 0.4%
Catalyst 1 0.1%

Top lines changed by employer
BigBallOfWax 991503 82.1%
PTFS 190261 15.7%
(Unknown) 9640 0.8%
ACPL 5978 0.5%
Liblime 3644 0.3%
PTFS-Europe 3349 0.3%
Biblibre 3043 0.3%
Tamil 711 0.1%
ByWater-Solutions 145 0.0%
Foundations 33 0.0%
BSZ-BW 24 0.0%
Catalyst 7 0.0%

Employers with the most signoffs (total 62)
(Unknown) 59 95.2%
PTFS 3 4.8%

9 thoughts on “I love pulling statistics out of git”

  1. Chris, the glitch of counting translation commits makes those stats quite strange: Frederic has 2.7% of the changesets and 50% of the changed lines. Is there a way to ignore changesets that touch .po files?
    Or add a small disclaimer to the changed lines stats to explain those “strange” numbers.
    (PS: don’t spend too much time on that, it’s very minor)

  2. There probably is, but I don’t think it really matters; it’s interesting to me to see how much the translations changed. I think the important thing is the number of changesets and the number of developers anyway.

  3. I show the point of departure between the two repos being commit 3bab38c. You have exactly one commit more recent than that in the 4_02 branch of git://github.com/liblime/LibLime-Koha.git repo, and it’s not 985,443 lines long; it’s 10. So where exactly did you count the other 985,433 lines?

    Incidentally, our 4.2 was released about the same time the NZ version’s 3.2 was, so that would be a more relevant point of comparison if you were interested in such a thing.

  4. I ran them both from 58ee841a73ea02d38d465e8c4663ab7bf509a62e to the head of the respective branches, which actually gives the impression that the Liblime fork has more commits on it. Thanks for pointing that out, I will rerun it.

    Also, the date of the last commit has nothing to do with it. Koha has 3455 changesets, Liblime ILS 826; that’s where the 10,000 lines come in. I think you are misunderstanding how git works.

    I will rerun them from the commit you suggested to the latest release of both repos. Also, what is this NZ release you speak of? Do you mean the Koha project release of 3.2.0?

    Incidentally, the tool used to count this is gitdm; it’s what the Linux kernel developers use, not something I have written. The numbers come directly from that.

  5. Yes, the date has nothing to do with it. I understand perfectly well how git works, which is why I used git-merge-base to determine the point of divergence rather than a date. What prompted you to use 58ee841 as a starting point besides the fact that it allowed you to include 900k+ lines of translation file changes? It’s not 10,000 lines of your commits that we’ve included in 4.2. It’s ten, as in the number after nine, which is the number of pluses and minuses in commit 45f2e56, the only one to have been picked in since the divergence.

  6. I picked that commit because that was the last version number, 3.01.00.061, that both had in common. Anyway, to assuage your anger I have rerun them from the commit you suggested.
    https://blog.bigballofwax.co.nz/2011/05/25/new-batch-of-statistics/

    I never suggested you added 10,000 lines of ‘our’ commits to your fork. In fact I don’t care about that; you are welcome to cherry-pick as much as you like into your fork.

    You are angry because you think I implied you cherry-picked something you didn’t? I’m not sure it matters whether that commit was there before you forked or after; the commit is still in your repo. All I was interested in was the number of commits since the fork.

  7. I’m not angry, just pedantic. If you’re going to measure something, it should be done accurately, and I knew your stats were wrong. The numbers don’t reflect much difference one way or the other, and “lines of code” or “commits” or “lines of deltas” aren’t informative data points about the progression of a code base, at any rate. Number of committers is somewhat more interesting, but only because having more eyes is nearly always better (though more hands is definitely not always better).
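
The divergence point argued over in the comments above can be found with `git merge-base`, and the commits on each side of it counted with `git rev-list --count`. Here is a minimal sketch of that approach, assuming both histories are available in one local clone; the function names are mine, and the repository path and branch names in the usage note are hypothetical.

```python
import subprocess

def merge_base_cmd(branch_a, branch_b):
    """Command that prints the best common ancestor of two branches."""
    return ["git", "merge-base", branch_a, branch_b]

def count_since_cmd(base, tip):
    """Command that prints how many commits are reachable from tip but not base."""
    return ["git", "rev-list", "--count", f"{base}..{tip}"]

def commits_since_fork(repo, branch_a, branch_b):
    """Return the fork point and the number of commits on each branch since it."""
    def run(cmd):
        result = subprocess.run(cmd, cwd=repo, check=True,
                                capture_output=True, text=True)
        return result.stdout.strip()

    base = run(merge_base_cmd(branch_a, branch_b))
    counts = {b: int(run(count_since_cmd(base, b)))
              for b in (branch_a, branch_b)}
    return base, counts
```

With both remotes fetched into the same clone, a call might look like `commits_since_fork("/path/to/clone", "koha/master", "liblime/4_02")`. Starting from the merge base rather than from a shared version number keeps commits made before the fork out of both sets of numbers.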
