Commit like a librarian.
⚡ TL;DR
Focused, coherent commits are a gift to the Engineers of Tomorrow, human and agent alike, who have to understand, maintain, and extend your code.
Unfortunately, agents don’t work that way by default. They happily git add -A everything into dumping-ground megacommits.
Skills can retrain your agents, and a few small git scripts help the human side too.
This article covers both:
- Three small shell scripts for you: git addgrep, git restage, and git post-fmt
- And two agent skills for your robot: gitplan and stage
🤦 Why this matters
Agents are slow: tens to hundreds of minutes to crank out a single change. I suspect most developers run concurrent sessions to avoid the idle time, going from one squawking tmux session to the next like a parent bird working a nest of open-mouthed chicks, each demanding a permission prompt or a (hopefully) clarifying question in no particular order.
Some wrapper tooling exists to make git worktree less bothersome, but many engineers just yolo-interleave all of these changes into the same tree. It reduces integration friction at the cost of commit hygiene.
Run git log on most vibe or vibe-adjacent repos and you’ll find a mess of themes and purposes, with messages like fixed tests, made changes, or misc cleanup. It’s a junk drawer: things went in because they were lying around, not because they belong together, and now following a single theme through the history is hopeless.
Four readers pay for it later:
The PR reviewer. You should be stoked if you can get 5 minutes of solid concentration from a coworker to review your PR. A focused diff has a chance of getting read and understood. The first diff your reviewer sees that doesn’t match the TL;DR of your PR gets a WTF. The second unrelated diff makes the PR feel like the untidy slop pile that it is, inspiring dread and a -1, do not ship. Review quality degrades super-linearly with diff size.
The bug-hunter, mid-git bisect. Bisect is the most underrated debugging tool in git, and it only works when commits are atomic. When the offending commit is “fix tests, update importer, misc cleanup,” bisect tells you the haystack, not the needle.
The changelog writer. Release notes drafted from focused commits write themselves. Release notes drafted from junk drawers require re-reading the diffs, which means they don’t get written; you get “various fixes and improvements” instead.
Future you, holding git blame. When a line’s last touch is a megacommit, the why is gone. You’re left reverse-engineering intent from a diff that changed 40 files for six reasons.
None of this is (or should be) news to any experienced developer.
What’s new is that agents made the tangled working tree the default rather than an occasional lapse: several themes of change, interleaved at once, every session. The discipline that used to be a nicety is now table stakes, and it needs tooling, because nobody curates hunks by hand under deadline.
🧰 Three scripts for curating the index
Git has a pleasant extension mechanism: any executable named git-foo on your PATH becomes git foo. Drop these three in ~/bin (or wherever you keep scripts), chmod +x them, and they’re git subcommands.
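Here is a minimal sketch of that mechanism, if you want to see it work before committing anything to ~/bin. The git-hello name and the throwaway directory are invented for the demo.

```shell
#!/bin/bash
# Hypothetical demo: git-hello is a made-up name, and the script lives
# in a temp directory rather than ~/bin.
demo_bin="$(mktemp -d)"

cat >"$demo_bin/git-hello" <<'EOF'
#!/bin/bash
echo "hello from a custom git subcommand"
EOF
chmod +x "$demo_bin/git-hello"

# Git looks for git-<name> on PATH, so `git hello` resolves to the script.
PATH="$demo_bin:$PATH" git hello
```

Once the script sits in a directory that is permanently on your PATH, the PATH= prefix goes away.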
I reach for the first two almost daily.
git addgrep: stage every change matching a regex
Renamed a function? A file? A config key? The rename touched 23 files, but so did the feature you’re also working on. You want to commit the rename by itself.
git addgrep oldFunctionName stages every hunk whose diff mentions the pattern, and nothing else:
#!/bin/bash -e
# git addgrep: stage changes that match a regex pattern.
#
# Renamed a file, function, or method? Use this to stage every change
# that mentions it, and nothing else.
#
# With patchutils installed, staging is per-hunk. Without it, this falls
# back to staging whole files that match.
# Ubuntu: sudo apt install patchutils
# macOS: brew install patchutils
#
# Usage: git addgrep <pattern>
if [ -z "${1:-}" ] || [ -z "$(echo "$*" | tr -d '[:space:]')" ]; then
  echo "Usage: $(basename "$0") <pattern>" >&2
  exit 1
fi
command -v grepdiff >/dev/null || HUNKS=0
if [ "${HUNKS:-1}" == 1 ]; then
  echo "(staging matching hunks)"
  git diff -U0 | grepdiff -E "$*" --output-matching=hunk |
    git apply --cached --unidiff-zero --allow-empty
else
  echo "(staging matching files)"
  # -S: changes that alter the number of occurrences of the pattern
  # shellcheck disable=SC2046
  git -c advice.addEmptyPathspec=false add --ignore-errors -- $(git diff -S"$*" --name-only)
  # -G: changes whose added/removed lines match the regex
  # shellcheck disable=SC2046
  git -c advice.addEmptyPathspec=false add --ignore-errors -- $(git diff -G"$*" --name-only)
fi
The interesting part is the pipeline: git diff -U0 emits zero-context hunks, grepdiff (from patchutils) keeps only the hunks matching your regex, and git apply --cached applies that filtered patch directly to the index. It’s git add -p, if git add -p took a regex and didn’t make you answer y/n/s/e 23 times.
Without patchutils it degrades to whole-file staging via git diff -S/-G, which is still better than picking filenames out of git status by eye.
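The fallback runs both -S and -G because they answer different questions. A small sketch in a scratch repo, with invented file names and contents, shows a change that only -G catches:

```shell
#!/bin/bash
# Scratch repo comparing the two pickaxe flags used by the fallback.
# File names and contents are invented for the demo.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# `git diff` compares the working tree to the index, so staging the
# baseline is enough; no commit is needed.
printf 'oldName(1)\n' >args.txt    # the call stays, its argument changes
printf 'oldName()\n' >rename.txt   # the call gets renamed
git add .

printf 'oldName(2)\n' >args.txt
printf 'newName()\n' >rename.txt

# -S: files where the number of occurrences of the string changed.
by_count="$(git diff -SoldName --name-only)"
# -G: files where an added or removed line matches the regex.
by_line="$(git diff -GoldName --name-only)"

printf 'with -S: %s\n' "$by_count"
printf 'with -G: %s\n' "$(echo "$by_line" | tr '\n' ' ')"
```

The edit to args.txt keeps one occurrence of oldName before and after, so -S skips it, while -G lists both files because each has a changed line mentioning the name.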
git restage: refresh what’s already staged
You’ve staged the files for a commit. Then the pre-commit formatter runs, or you delete a debug line, or you add one more test. Now the working tree has drifted from the index and git status shows the same files as both staged and modified.
git restage re-stages the already-staged files, and only those:
#!/bin/bash
# git restage: re-stage all currently staged files.
#
# Useful when you've staged a commit, then edited those same files
# (prettier, delinting, deleting debug statements, one more test...)
# and want the staged copy to catch up without touching anything else.
#
# Usage: git restage
cd "$(git rev-parse --show-toplevel)" || exit 1
git diff --name-only --cached -z | while IFS= read -r -d '' file; do
  echo "git stage $file"
  git stage "$file"
done
One caveat: it stages the whole file, so any partially-staged file loses its hunk selection.
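To watch the drift appear and disappear, here is a sketch in a scratch repo. The file name and contents are invented, and the loop is an inline equivalent of the script (git stage is a synonym for git add).

```shell
#!/bin/bash
# Scratch repo showing the drift that git restage cleans up.
# notes.txt and its contents are invented for the demo.
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "draft" >notes.txt
git add notes.txt
echo "debug line" >>notes.txt   # edited after staging: the index is stale

before="$(git status --short)"  # "AM notes.txt": staged, then modified

# Re-add only the paths that already have staged changes.
git diff --name-only --cached -z | while IFS= read -r -d '' file; do
  git add -- "$file"
done

after="$(git status --short)"   # "A  notes.txt": index matches the tree
printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

The second status column is what matters: M means the working copy has moved past the index, and a blank means the two agree again.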
🧹 git post-fmt: split formatter noise into its own commit
You ran Prettier (or shfmt, black, go fmt ./..., or cargo fmt) across the repo and now half your real diff is line-wrapping churn. Reviewers hate this, and they’re right to: formatting noise buries the changes that matter.
git post-fmt stages only the files whose changes are formatting-only, so you can commit chore(fmt): apply prettier first and leave your semantic changes for a focused commit:
#!/bin/bash -e
# git post-fmt: stage files whose only changes are formatting, so you can
# land them as `chore(fmt): ...` and keep your next commit semantic.
#
# For a file with a known formatter, the test is exact: reformat both the
# HEAD version and the working copy with that formatter, then compare. If
# the two canonical forms match, every difference between them is something
# the formatter rewrites, which is the definition of a formatting-only
# change. This is a proof, not a guess, for any deterministic formatter.
#
# Files with no configured formatter fall back to a normalization heuristic
# (strip the whitespace and punctuation formatters shuffle around, then
# compare). That part catches cases `git diff -w` misses, like content
# reflowing across lines, but it is a heuristic, not a proof.
#
# The per-file check is read-only, so it fans out across cores with xargs;
# only the final `git add` runs serially, because concurrent adds fight
# over .git/index.lock.
#
# Usage: git post-fmt
# Capture our own path before cd, so workers can re-invoke us.
self="$(realpath "$0")"
cd "$(git rev-parse --show-toplevel)" || exit 1
# Map a path to a formatter that reads stdin and writes stdout. Add your
# project's tools here, and point them at the same binaries your pre-commit
# hook and CI use, so "canonical" means the same thing everywhere. gofmt
# and rustfmt are the stdin engines behind `go fmt` and `cargo fmt`.
formatter_for() {
  case "$1" in
  *.ts | *.tsx | *.js | *.jsx | *.mjs | *.cjs | *.json | *.css | *.scss | *.html | *.vue | *.md | *.yaml | *.yml)
    echo "prettier --stdin-filepath=$1" ;;
  *.go) echo "gofmt" ;;
  *.rs) echo "rustfmt --emit=stdout" ;;
  *.py) echo "black --quiet -" ;;
  *.sh | *.bash) echo "shfmt" ;;
  *.c | *.cc | *.cpp | *.h | *.hpp | *.java) echo "clang-format --assume-filename=$1" ;;
  *) echo "" ;;
  esac
}
# Fallback for files with no formatter: delete whitespace and the
# punctuation formatters move around, so two versions compare equal iff
# they differ only in formatting. Reads stdin.
normalize_code() {
  tr -d '[:space:]' |
    tr -d ';,' |
    sed 's/[][(){}]//g' |
    sed "s/'/\"/g"
}
# True if the HEAD->working change for $1 is formatting-only.
format_only() {
  local file="$1" fmt old new
  fmt=$(formatter_for "$file")
  if [ -n "$fmt" ] && command -v "${fmt%% *}" >/dev/null; then
    # Exact: compare the formatter's canonical form of each version. Skip
    # the file if either reformat fails (e.g. a mid-edit syntax error).
    # shellcheck disable=SC2086
    old=$(git show "HEAD:$file" 2>/dev/null | $fmt 2>/dev/null) || return 1
    # shellcheck disable=SC2086
    new=$($fmt <"$file" 2>/dev/null) || return 1
    [ "$old" = "$new" ]
  else
    # Heuristic fallback for unmapped file types; skip binaries.
    case "$(file --mime "$file" 2>/dev/null)" in
    *charset=binary*) return 1 ;;
    esac
    [ "$(git show "HEAD:$file" 2>/dev/null | normalize_code)" = "$(normalize_code <"$file")" ]
  fi
}
# Worker mode: read-only check of one file, print its NUL-terminated path
# iff the change is formatting-only. Safe to run many at once.
if [ "${1:-}" = "--check" ]; then
  file="$2"
  [ -f "$file" ] || exit 0 # skip deletions
  if format_only "$file"; then printf '%s\0' "$file"; fi
  exit 0
fi
# Check every changed file in parallel across cores, then stage the matches
# in one batched `git add`. getconf is portable across Linux and macOS;
# nproc is not.
jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
staged=()
while IFS= read -r -d '' file; do
  staged+=("$file")
done < <(git diff --name-only -z | xargs -0 -P"$jobs" -n1 "$self" --check)
if [ "${#staged[@]}" -gt 0 ]; then
  git add -- "${staged[@]}"
fi
echo "Staged ${#staged[@]} files with formatting-only changes"
for file in "${staged[@]}"; do
  echo "  git add $file"
done
The formatter_for table is where you add languages: map an extension to any formatter that reads stdin and writes stdout, and the file gets the exact, proof-based test instead of the heuristic. The defaults cover Prettier, gofmt, rustfmt, black, shfmt, and clang-format; gofmt and rustfmt are the stdin engines behind go fmt and cargo fmt.
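The exact test is easy to try in isolation. This sketch uses expand -t 4 (tabs to spaces) as a stand-in for a real formatter; the same_canonical function and the sample strings are invented for the demo and are not part of the script.

```shell
#!/bin/bash
# Sketch of the exact test with a placeholder formatter. `expand -t 4`
# stands in for prettier, gofmt, and friends; same_canonical and the
# sample strings are hypothetical.
fmt="expand -t 4"

# True if both inputs reduce to the same canonical form under $fmt.
same_canonical() {
  local old new
  # shellcheck disable=SC2086
  old=$(printf '%s\n' "$1" | $fmt) || return 1
  # shellcheck disable=SC2086
  new=$(printf '%s\n' "$2" | $fmt) || return 1
  [ "$old" = "$new" ]
}

tabbed=$'if x:\n\treturn 1'
spaced=$'if x:\n    return 1'
edited=$'if x:\n    return 2'

same_canonical "$tabbed" "$spaced" && echo "formatting-only"
same_canonical "$tabbed" "$edited" || echo "semantic change"
```

Any stdin-to-stdout tool slots in the same way, which is why the table only needs a command string per extension.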
The exact test costs two formatter runs per file, which adds up: a 300-file sweep is 600 launches, and that’s brutal for interpreter-based tools like Prettier or Black where each cold start is half a second. So the check runs xargs -P across every core and stages the survivors in one batched git add. Native formatters (gofmt, shfmt, clang-format) clear hundreds of files in seconds; for a big Prettier repo, point the table at a resident daemon like prettierd to skip the Node startup tax on every file.
Two honest caveats remain. The formatter has to be installed and deterministic, or the file falls back to the heuristic. And dispatch is by extension, so the table has to point at the same binary (and config) your editor and CI use, or you’ll be comparing against a different canonical form. For unmapped file types you’re back to the old normalization heuristic, which deletes all whitespace, commas, semicolons, and brackets before comparing, so a change like f(a, b) → f(ab) can slip through; skim git diff --cached before committing those.
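That false positive is easy to reproduce. The sketch below copies normalize_code from the script and feeds it the two call shapes from the caveat:

```shell
#!/bin/bash
# normalize_code is copied from git post-fmt; the two inputs are the
# example from the caveat in the text.
normalize_code() {
  tr -d '[:space:]' |
    tr -d ';,' |
    sed 's/[][(){}]//g' |
    sed "s/'/\"/g"
}

before="$(printf 'f(a, b)\n' | normalize_code)"
after="$(printf 'f(ab)\n' | normalize_code)"

echo "before: $before"
echo "after:  $after"
[ "$before" = "$after" ] && echo "heuristic sees no difference"
```

Both versions collapse to the same string, so a two-argument call and a one-argument call look identical to the fallback.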
🤖 Teaching the robot the same discipline
The scripts above are for your hands. Your agent needs the same discipline, stated as instructions it can follow, because left to its defaults it does the opposite: Claude will happily back the truck up and git add -A a tree that still holds your half-finished work from yesterday.
I publish two skills for this in my claude-code-skills plugin marketplace:
/plugin marketplace add photostructure/claude-code-skills
/plugin install coding@photostructure
The stage skill
/coding:stage commits the current session’s work, and only the current session’s work. The skill makes Claude:
1. Inventory its own Edit and Write tool calls from the conversation
2. Classify every changed file: all hunks mine, some hunks mine, none mine
3. Partial-stage the mixed files
Step 3 is the part worth stealing. Interactive git add -p doesn’t work for agents (no tty), so the skill has Claude export the diff to a patch file, delete the hunks that aren’t its own, and apply the remainder with git apply --cached --recount. For heavily entangled files there’s a last-resort strategy: check out the HEAD version to /tmp, replay only the session’s edits onto it, and diff that. The staged result provably contains the session’s work and nothing else.
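The mechanics of the patch-file route can be sketched in a scratch repo. This is not the skill’s actual implementation: the file, its contents, and the awk filter (which stands in for the agent’s judgment about which hunks are its own) are all assumptions for the demo.

```shell
#!/bin/bash
# Scratch repo: one file with two unrelated edits, of which only the
# first belongs to "this session". Names and contents are invented.
repo="$(mktemp -d)"
cd "$repo"
git init -q
seq 1 20 >data.txt
git add data.txt   # baseline lives in the index; no commit needed

# Two edits far enough apart to land in separate hunks.
sed -e '2s/.*/2 (session edit)/' -e '19s/.*/19 (someone else)/' data.txt >data.tmp
mv data.tmp data.txt

git diff >all.patch
# Keep the file header and the first hunk; drop every later hunk.
awk '/^@@/ { n++ } n <= 1' all.patch >mine.patch
# --recount has git recompute hunk line counts after hand edits.
git apply --cached --recount mine.patch

# The index now holds only the session's edit.
git show :data.txt | sed -n '2p;19p'
```

The working tree still carries both edits; only the index changed, so the other stream’s work stays unstaged for whoever owns it.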
The skill ends with hard rules, because agents respect walls better than suggestions:
- git add ./-A/--all: forbidden
- Any staged line the agent can’t trace to one of its own edits: unstage it
- Commit message over 10 lines: the commit does too much, split it
- No commit without explicit user approval
🧩 The gitplan skill
/coding:gitplan is for the aftermath: a tangled tree of 15+ files from several work streams that all needs to land. Claude scans the full diff, proposes themes (each with a single coherent purpose; “misc cleanup” is explicitly banned), and then walks one theme at a time through stage → review → commit, lowest-risk first.
Each theme gets reviewed before it’s committed (using the proof-based review approach so the review finds bugs instead of generating noise), and docs or plan files get bundled with the code they describe rather than dumped into a “docs” commit at the end.
Commit messages: the why, in under 10 lines
Curating what’s in the commit is most of the battle. The message is the rest, and two rules cover it:
The diff already shows the what. Write the why. “Renamed FooManager to FooService” is visible in the diff. “FooManager collided with the new plugin API’s reserved names” is not, and it’s the only part the four readers need.
Ten lines, total. Subject, blank line, body, trailers. If the explanation doesn’t fit, the commit does too much; go back and split it. This rule is, dare we say, load-bearing[1]: it converts “my message is long” into a mechanical signal that the commit is wrong, which is the actual problem.
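If you want the ten-line budget enforced for humans too, a commit-msg hook can do the counting. This is a sketch under assumptions: the message_fits name and max_lines variable are invented, and it skips lines starting with # because git strips those in its default cleanup mode.

```shell
#!/bin/bash
# Sketch of a commit-msg check for the ten-line budget. message_fits
# and max_lines are hypothetical names, not part of any skill or script
# in this article.
max_lines=10

message_fits() {
  local count
  # grep -c exits non-zero when the count is 0, hence the `|| true`.
  count=$(grep -cv '^#' "$1" || true)
  [ "$count" -le "$max_lines" ]
}

# Demo on two throwaway message files.
short_msg="$(mktemp)"
long_msg="$(mktemp)"
printf 'Rename FooManager\n\nCollides with plugin API names.\n' >"$short_msg"
seq 1 11 >"$long_msg"

message_fits "$short_msg" && echo "short message: ok"
message_fits "$long_msg" || echo "long message: split the commit"
```

In a real .git/hooks/commit-msg, git passes the message file path as the first argument, and a non-zero exit aborts the commit.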
Git trailers are the structured part of the why, and they cost one line each:
Reported-by: Jane Doe
Link: https://forum.example.com/t/12345
Fixes: #678
Use git commit --trailer "Reported-by=Jane Doe" (git ≥ 2.32) so the formatting stays canonical, and repeat the trailer for multiple values rather than comma-separating.
Related reading
- “Most AI code reviews are noise. Here’s how to fix that.” covers the review and review-staged skills that gitplan leans on.
- “Claude Code has amnesia. So do PRs, changelogs, and your future self.” is the same argument, one level up: TPPs are to projects what focused commits are to diffs.
[1] You’re absolutely right to call this out as a “belt and suspenders” rule.
