Git commit hygiene is one of those things where you don’t realize you have such strong opinions – until you work with someone who has different practices.
I have expectations about how senior devs will commit code, and for the most part, they seem to align with others. But every so often, I’ll run into a great engineer with years of experience who has just never learned good habits around source control. These are often folks who either worked on mostly solo projects, where no one else is reviewing commits, or worked on enormous projects with huge teams, where there’s so much noise that commit patterns get lost.
My hope with this post is to provide a primer for those developers who’ve never received any mentoring or feedback on their source control habits. Most of this post is applicable to subversion or TFS or any modern source control system, but I do assume branching is easy.
While there are many areas of contention, there are also many general best practices, and the items in this list should serve you well on most teams.
Keep commits focused – You’ll find different levels of zealotry around this, just as you will with the Single Responsibility Principle, and it’s similar in its vague helpfulness. In general, a commit should have one primary change it is effecting in the codebase.
Consistency – When starting on an existing codebase, the best course of action is typically to follow the established patterns, at least for a while. There are various approaches to source control that can work well – if everyone is following the same rules. The worst repos I’ve seen have had multiple teams, with different approaches. Crazy inconsistent branching strategies and divergent branch names, some teams rebasing but others not, some squashing commits but others not, etc.
Commit early and often – I am a huge proponent of frequent, small commits. I’d much rather see someone err on the side of too many commits, rather than too few. That’s partially from my years of consulting, where the git log can be helpful in explaining billable hours, but it’s also the foundation of other best practices. It’s hard to keep commits focused or write a good commit message if your commit comprises several hours of work.
Push commits early and often – This one might be contentious, but I strongly believe it. Commits should be pushed every day, at an absolute minimum. Ideally you’re pushing after every commit, multiple times a day. I know developers don’t want to commit or push half-finished work, with failing tests, etc. I get it. I really do. But that’s what personal branches are for. If your org doesn’t let you create personal branches, fine, push to a colleague’s repo. This is decentralized version control, right? The act of pushing code has a way of clarifying thoughts and accelerating progress.
Don’t mix logic and formatting changes – I hate seeing commits where someone changed the system behavior and also decided to reformat a ton of code. It makes it so hard to see what is relevant to the behavior change vs what is noise. That said, a logic change could result in an automatic reformatting, e.g., due to a linter, code formatter, style cop, etc. That’s fine, great actually. More consistency! But just don’t go changing the linter settings and reformatting the whole codebase in the same commit where you fix a logic bug.
Don’t mix logic and refactoring changes – I also dislike seeing commits where someone changed logic and then decided to refactor, restructure, or clean up the code. I mean, I’m as guilty of this as anybody out there, and I love when developers have that drive to leave code better than they found it. But just like formatting changes, it can greatly complicate things if you need to track down a bug later. Refactoring is technically meant to change the internal structure of code without changing its external behavior. Combining those concerns is definitely not a focused commit.
Don’t mix logic and dependency changes – Hmm, notice a pattern here? Most devs know to isolate library and dependency upgrades in a separate commit or whole other branch. However, it can be so tempting. You’re fixing a bug and realize the new version of a library might help you. You pull in the upgrade to experiment, but it requires you to use new API methods. Fine! But that changes the whole shape of the integration. Now you have a commit with logic changes buried in the middle of the new API methods. Are you really going to stash your changes, do the upgrade first in a separate commit, then reapply changes? Probably not, if you’re under a tight deadline. This was supposed to be a quick bug fix. But now you’re left coming back to this commit in 6 months and having to spend way too long figuring out what actually changed and which part of the change is causing some new unforeseen consequence.
In general, that’s the goal behind all of these points. We want to avoid people on our team – including our future selves – looking at a commit and thinking, “What in the hell were they trying to do?”
Be careful following the items in this section. I know many excellent engineers who vehemently disagree with me.
Don’t squash commits – I don’t like squashing commits. I said it. I know some of you are thinking, “Whoa, you like small commits and not squashing them? WTF?” Yep! I get why people squash commits sometimes. If you’re in a super exploratory phase, experimenting with approaches, and then want to squash before opening a PR, I can definitely see the value in cleaning it up. However, in general, I’d much rather see the full history. I’d rather have the option of seeing the approaches they went through. I’d rather be able to use git bisect to see which commit introduced a change and hopefully a commit message about why it changed.
Merge commits are okay – Likewise, I don’t really mind seeing merge commits. I use the
--rebase option for pulls (actually
git config branch.autosetuprebase always). So I don’t have that many merge commits, but I don’t go out of my way to avoid them when it comes up, mostly around conflicts.
Detailed commit messages – This one could be a whole separate post, but I strongly believe in good commit messages. This dovetails with frequent small commits. It only takes a minute to write a sentence or two about a small commit. Trying to write a good commit message about an entire day of work is much more daunting. A good commit message should give a high level summary of the changes and perhaps explain why.
Bad commit message: “bug fix”
Adequate commit message: “fix default sorting on task list”
Good commit message: “Make task list sorted by most recently created, to match mockup.”
I don’t think having good commit messages is a contentious goal, but I do think there’s a lot of disagreement about what constitutes a good message.
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” — Martin Fowler
Computers don’t care what our commit history looks like, but it affects the humans interacting with the codebase. I should be able to look at a commit and quickly understand what changed and what the goal was – even if I don’t immediately understand the full context of why this change helped achieve the goal.
I’m not a huge fan of inline code comments, mostly because they always get out of sync with the code logic. However, the way we shape our commits and the messages we write with them are powerful ways we can organically, incrementally capture the intent behind changes and the history of a system. They can’t get out of sync or stale, because they are temporally linked with the changes. I may have been mistaken or just plain stupid when I made the change, but it will always be true that – at that moment in time – that commit reflected my goal.
If you only remember one idea from this post it’s that how we commit code can be as important as what code we commit. Good programmers make commits that humans can understand.
Loved the article? Hated it? Didn’t even read it?
We’d love to hear from you.