P-Hacking with Dinosaurs

The Agile Mafia responded predictably, pathetically to Impact Engineering

Colm Campbell
7 min read · Aug 19, 2024

This is gonna be a tough one to write: I'll be critiquing my heroes. It's a sad task. I'd rather sing the praises of people I admire, and ignore their few faults.

To get a flavour of what I'm dealing with, think about Jimi Hendrix. Hendrix was the total G.O.A.T. of rock guitar. It would feel churlish to write anything critical about his playing (like, "… but, but, he dropped so much acid at the Isle of Wight in '70 that he played out of tune, with wonky timing … hardly the greatest").

Einstein’s contribution to physics is huge — he was usually right. Who wants to focus on the few things he got wrong? It would just come across as envious and facetious. (“Albert Einstein? Hmmph: what an assclown. He couldn’t admit that QM was correct, despite all the evidence!”)

So, I’ve had a dilemma. Two of the best contributors to my field are being closed-minded — about one topic — and using their authority to misdirect people. Should I call them out on it (and take a lot of flak from their fans)? Or, should I just let it slide? Both of them have done much more to improve the theory and practice of Software Engineering than I ever will. I’ve learned a great deal from each of them, over many years, from their books and published talks. They deserve the respect they hold in the community. Who even cares if I disagree with them?

In the end, I decided to just get it off my chest: Martin Fowler and Dave Farley are too scared to look at valid data that threatens their 'Agile-only' worldview.

After the online storm-in-a-teacup about the Impact Engineering book, and after I’d blogged about it, I saw this in my LinkedIn feed:

Screenshot from LinkedIn

So, Martin Fowler, a heavy-hitter who built his career on Agile projects, doesn't like to see that "Agile projects fail 268% more …", and sends the idea straight to his mental trash-can. Predictable. As Upton Sinclair put it, "It is difficult to get a man to understand something, when his salary depends upon his not understanding it."

Fowler was using somebody called Hillel Wayne to justify his prejudice against Impact Engineering, so I checked out what Hillel Wayne has to say. It turns out that Wayne's only points (besides a lot of invective and baseless pooh-poohing) are:

  • The p-value supplied for the '268% more failures' claim is suspiciously low (about 0.00004).
  • Such low p-values are rare in scientific research papers.
  • Something … about the logical impossibility of one of the results (which he leaves cryptically unexplained).

(I don't know exactly what Hillel Wayne meant by the last of those three points, but it's simply not true. I've looked at the published report and its available summary data in depth, and there's nothing obviously implausible or impossible about the statistics it presents. I've checked this with two PhD-grade Data Scientists who have no dog in the fight over Agile, and they agree: the stats are perfectly possible.)

Wayne's p-value complaint is simply misguided: very low p-values are perfectly possible, depending on the data in question and the type of statistic being calculated. Here is the actual, contentious stat as quoted in the original Impact Engineering report:

https://www.engprax.com/post/268-higher-failure-rates-for-agile-software-projects-study-finds

So, it looks like the quoted t-statistic and p-value are for the positive correlation found between 'Agile Requirements Engineering' (ARE) and project failure (PF). The obvious way to formalise this result is: "out of the 600 engineers who were surveyed about their last project, those who said their project had used ARE were 268% more likely to say their project was a failure". If we assume that the null hypothesis used to derive these statistics was "using ARE has zero effect on the likelihood of project failure", and that the t-statistic and p-value come from a standard correlation test, then it is trivially easy to find realistic data values that would have given p much less than 4e-5.
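(For reference, the t-statistic in a standard Pearson correlation test is t = r·√(n − 2) / √(1 − r²), where r is the sample correlation and n is the sample size. With n = 600, even a moderate r produces a large t, and hence a vanishingly small p.)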

To prove this to myself (and to you), I created a dummy dataset of 600 records, each having a value for 'Is Agile' (either 1 or 0) and a value for 'Project Failed' (also either 1 or 0). 300 records were 'Agile', 300 were 'not-Agile'. 268 of the 'Agile' records were marked as 'Project Failed', while only 100 of the 'not-Agile' records were. Obviously, this matches the basic finding from the report: that Agile fails with 2.68 times the frequency of not-Agile.

Testing with R
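The screenshot shows the gist, but for anyone who wants to reproduce it, here is a minimal sketch of the same test in R. (This is my reconstruction, not the report's code: I'm assuming cor.test, R's standard Pearson correlation test, and the variable names are mine.)

    # 600 surveyed projects: 300 'Agile' and 300 'not-Agile'.
    is_agile <- c(rep(1, 300), rep(0, 300))

    # 268 of the Agile projects failed; 100 of the non-Agile ones did.
    project_failed <- c(rep(1, 268), rep(0, 32),   # Agile: failed, then succeeded
                        rep(1, 100), rep(0, 200))  # not-Agile: failed, then succeeded

    # Standard Pearson correlation test: prints the t-statistic and p-value.
    cor.test(is_agile, project_failed)

Run as-is, this should report a t-statistic of around 17 on 598 degrees of freedom, with the printed p-value at R's display floor of "< 2.2e-16".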

In my dummy example, the p-value came out as 0.0000000000000002 (!!!). It would be trivial to adjust the data so that the same 2.68:1 failure ratio comes with more or less whatever p-value we want.
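If you want to see the sample-size effect directly, here's a small follow-up sketch (again mine, not the report's) that holds the 2.68:1 failure ratio roughly fixed while scaling the group sizes up and down:

    # p-value for the same ~2.68:1 failure ratio at different group sizes.
    p_for_n <- function(n_per_group) {
      failed_agile     <- round(268 * n_per_group / 300)
      failed_non_agile <- round(100 * n_per_group / 300)
      is_agile <- c(rep(1, n_per_group), rep(0, n_per_group))
      failed   <- c(rep(1, failed_agile),     rep(0, n_per_group - failed_agile),
                    rep(1, failed_non_agile), rep(0, n_per_group - failed_non_agile))
      cor.test(is_agile, failed)$p.value
    }

    # The p-value plummets as the groups grow.
    sapply(c(15, 30, 75, 150, 300), p_for_n)

With an effect this large, even 15 projects per group already gives roughly p ≈ 0.002; by 300 per group the p-value is astronomically small.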

Very low p-values are entirely possible when the sample size is large enough (600 will do it) and the correlation is high. Now, I'm not objecting to Hillel Wayne possibly being mistaken about this. (He may have misinterpreted the report, or over-generalised from his experience of p-values in other contexts.) I do object to Martin Fowler using Wayne's mistaken and mostly rhetorical critique as sufficient cause to laugh the Impact Engineering findings out of court.

Look at it this way: if the Impact Engineering study's data and findings are valid at all, they indicate that the most prevalent way of running software development projects is enormously problematic: it may be costing the global economy several hundred billion dollars each year in failed projects. Shouldn't we at least have this debate in a respectful, scientific and open way? It's one thing for the dinosaurs of software to strongly defend opinions that have served them well in their careers. It's quite another for them to bury contradictory evidence under a wall of mockery.

I read quite a few of the hundreds of comments under Fowler's LinkedIn post. Almost all of them agreed with him. Many laughed at Impact Engineering because Fowler did. None of the commenters appeared to have looked at the data in an impartial or open-minded way.

Sad.

[Just to be clear, the data and findings from Impact Engineering may be problematic or wrong for various other reasons. I don’t have enough information about their methods to say for sure. I just know there’s no cause to dismiss them out of hand.]

Fowler's response appalled me. Dave Farley's response left me totally dismayed. Farley usually comes across as a very honest engineer, one who looks for data-driven and evidence-led improvements to how we do software engineering. His response to the Impact Engineering press coverage turned out to be every bit as prejudiced as Fowler's. His objections were almost all rhetorical. He noted that:

  • Junade Ali has edited the Wikipedia articles about Agile to include mention of Impact Engineering (So what? That's a perfectly correct use of Wikipedia!)
  • The publication of the data seemed timed to coincide with the publication of Ali’s book (Again, so what?)
  • The “268%” thing is clickbait (So what? Scientists have publicists. That doesn’t invalidate their science)
  • It is: "nonsense!"; "absurd"; and "unscientific" (without really evidencing any of this. Farley even totally misreads one of the points from the report, pretending it says the opposite of what it actually says! He claims that this shows Ali's "jumbled thinking", when in fact it's absolutely his own thinking that's jumbled here. [around 17:30 in the YouTube video])
  • The survey asked 'leading questions' (Again, so what? If Ali really asked a random sample of 600 engineers a) 'Did your last project fail?' and then b) 'Was it Agile?', and got a very high correlation between Agile and Failed, then something is very wrong with Agile on some level, and anybody who cares about software delivery and software developers should give it careful thought)

I feel for Farley and Fowler. They've hung their careers, including a lot of really great work on specific technical practices, on the mystical grand edifice of 'Agility', which is now collapsing under its own weight of bullshit. All humans resist ideas that challenge their core worldviews and belief systems, and I don't blame them for being human. But it's now so vital for us to move beyond the 'Agile' fiasco that we can't let such key opinion-formers be revered while they remain utterly stuck in the past, unwilling to imagine something different.

Photo by engin akyurt on Unsplash
