mjg59 | Your project's RCS history affects ease of contribution (or: don't squash PRs)

GitHub recently introduced the option to squash commits on merge, and even before then several projects requested that contributors squash their commits after review but before merge. This is a terrible idea that makes it more difficult for people to contribute to projects.

I'm spending today reworking some code to integrate with a new feature that was just merged into Kubernetes. The PR in question was absolutely fine, but just before it was merged the entire commit history was squashed down to a single commit at the request of the reviewer. This single commit contains type declarations, the functionality itself, the integration of that functionality into the scheduler, the client code and a large pile of autogenerated code.

I've got some familiarity with Kubernetes, but even then this commit is difficult for me to read. It doesn't tell a story. I can't see its growth. Looking at a single hunk of this diff doesn't tell me whether it's infrastructural or part of the integration. Given time I can (and have) figured it out, but it's an unnecessary waste of effort that could have gone towards something else. For someone who's less used to working on large projects, it'd be even worse. I'm paid to deal with this. For someone who isn't, the probability that they'll give up and do something else entirely is even greater.

I don't want to pick on Kubernetes here - the fact that this GitHub feature exists makes it clear that a lot of people feel that this kind of merge is a good idea. And there are certainly cases where squashing commits makes sense. Commits that add broken code and which are immediately followed by a series of "Make this work" commits also impair readability and distract from the narrative that your RCS history should present, and GitHub presents this feature as a way to get rid of them. But that ends up being a false dichotomy. A history that looks like "Commit", "Revert Commit", "Revert Revert Commit", "Fix broken revert", "Revert fix broken revert" is a bad history, as is a history that looks like "Add 20,000 line feature A", "Add 20,000 line feature B".

When you're crafting commits for merge, think about your commit history as a textbook. Start with the building blocks of your feature and make them one commit. Build your functionality on top of them in another. Tie that functionality into the core project and make another commit. Add client support. Add docs. Include your tests. Allow someone to follow the growth of your feature over time, with each commit being a chapter of that story. And never, ever, put autogenerated code in the same commit as an actual functional change.
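As a rough sketch of that shape, assuming a throwaway repository and purely hypothetical file names (none of them come from the Kubernetes PR), a series along these lines might look like this:

```shell
#!/bin/sh
set -eu

# Throwaway repository; every file name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Chapter 1: the building blocks.
echo "type declarations" > types.txt
git add types.txt
git commit -q -m "Add type declarations for the feature"

# Chapter 2: the functionality built on top of them.
echo "feature logic" > feature.txt
git add feature.txt
git commit -q -m "Implement the feature on top of the new types"

# Chapter 3: tie it into the core project.
echo "scheduler hook" > scheduler.txt
git add scheduler.txt
git commit -q -m "Integrate the feature into the scheduler"

# Autogenerated output gets a commit of its own.
echo "generated output" > generated.txt
git add generated.txt
git commit -q -m "Regenerate autogenerated code"

# Read the story back, oldest commit first.
git log --reverse --format=%s
```

Each subject line then reads as one chapter, and a reviewer can skip the autogenerated commit entirely.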

People can't contribute to your project unless they can understand your code. Writing clear, well commented code is a big part of that. But so is showing the evolution of your features in an understandable way. Make sure your RCS history shows that, otherwise people will go and find another project that doesn't make them feel frustrated.

(Edit to add: Sarah Sharp wrote on the same topic a couple of years ago)


From: (Anonymous)

I find myself doing this a lot, where I hack in WIP and SQUASH and all sorts of junk commits into the tree, but when I reach the end of a development state, I back the branch up, git reset back to the start, and work everything back in step by step, using git stash a lot.

To some people this is probably a lot more work than they are willing to do, but I think it's the only way I've found that gets between type A (revert, squash, hack, WIP, revert WIP) and type B, squash it all.
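A minimal sketch of that back-up-and-reset workflow, assuming a throwaway repository, hypothetical file names, and changes that split cleanly along file boundaries (for changes that share a file, something like git add -p or git stash would be needed instead):

```shell
#!/bin/sh
set -eu

# Throwaway repository; file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Starting point of the work.
echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# The messy development history.
echo "types" > types.txt
echo "broken feature" > feature.txt
git add types.txt feature.txt
git commit -q -m "WIP"
echo "feature" > feature.txt
git commit -q -a -m "SQUASH make this work"

# Back the branch up, then move the branch pointer back to the start.
# A mixed reset leaves the finished files in the working tree.
git branch backup
git reset -q "$base"

# Work everything back in, one logical step per commit.
git add types.txt
git commit -q -m "Add type declarations"
git add feature.txt
git commit -q -m "Implement the feature"

git log --reverse --format=%s
```

The backup branch still points at the old history, so nothing is lost if the rebuild goes wrong.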

airlied

You can achieve a lot with git rebase -i...
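One non-interactive way to illustrate this, as a sketch only: record the follow-up fix with git commit --fixup and let git rebase -i --autosquash fold it into the commit it belongs to. The repository and file names are hypothetical, and setting GIT_SEQUENCE_EDITOR to true simply accepts the generated todo list unchanged, which is an assumption made here so the example runs without an editor.

```shell
#!/bin/sh
set -eu

# Throwaway repository; file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo "types" > types.txt
git add types.txt
git commit -q -m "Add type declarations"

# This commit is broken...
echo "broken feature" > feature.txt
git add feature.txt
git commit -q -m "Implement the feature"

# ...and the fix is recorded as a fixup of it rather than as "Make this work".
echo "feature" > feature.txt
git commit -q -a --fixup=HEAD

# Accept the autosquashed todo list as-is instead of opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --reverse --format=%s
```

Run interactively (without the GIT_SEQUENCE_EDITOR override), the same rebase also lets you reorder, reword, or split commits by hand.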

I do the same; I consider it my duty to whoever will inherit that code for future maintenance. The (final) commit messages are also as complete as possible.

The process of cleaning up a feature/topic branch into something worthy of being enshrined in history is _also_ a self-review of the code. I've often found issues at that stage, and fixed them before wasting my peers' time.

I also compare the end result of the cleanup with the end result of the old mess, to ensure I did not introduce any errors in the process. Then, it is "git commit --amend -S" time (to seal the branch with a crypto signature), and ship it to review/integration (merge request).
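That comparison can be sketched as follows, assuming the old history was kept on a branch called backup (a hypothetical name) and using a throwaway repository. The signing step is left out because it needs a configured key.

```shell
#!/bin/sh
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# The "old mess": one junk commit holding everything.
echo "types" > types.txt
echo "feature" > feature.txt
git add types.txt feature.txt
git commit -q -m "WIP everything"
git branch backup

# The cleaned-up series, rebuilt from the same working tree.
git reset -q "$base"
git add types.txt
git commit -q -m "Add type declarations"
git add feature.txt
git commit -q -m "Implement the feature"

# Identical end states share a tree object, so the hashes must match.
old_tree=$(git rev-parse "backup^{tree}")
new_tree=$(git rev-parse "HEAD^{tree}")

# git diff --quiet exits non-zero if the two end states differ.
if [ "$old_tree" = "$new_tree" ] && git diff --quiet backup HEAD; then
    result="identical"
else
    result="different"
fi
echo "cleanup vs. backup: $result"
```

If the result is "different", git diff backup HEAD shows exactly what the cleanup changed by accident.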

Matthew Garrett
