Git Things

This is how I wish someone had explained git to me when I was first learning. (And a bit of a cheatsheet)

What is git?

Most people say Git is a version control system (VCS). However, it is more than that. Git is an object storage and tracking system.

Moreover, Git is a distributed object storage and tracking system. The practical upshot of this is that the entire repository history is immediately and locally accessible to anyone who has a copy of the repository (e.g. one made with git clone). The downside is that it can take a while to clone a large repository for the first time.
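
To make the "entire history is local" point concrete, here is a small sketch. The directory names, file name and user identity are made up for the illustration, and the "remote" is just another directory on the same machine so that nothing touches the network.

```shell
# Scratch area; every path and name below is hypothetical.
scratch=$(mktemp -d)

# Build a tiny repository with two commits to act as the original.
git init --quiet "$scratch/original"
cd "$scratch/original"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"  # placeholder identity
echo "first" > notes.txt
git add notes.txt
git commit --quiet -m "First commit"
echo "second" >> notes.txt
git add notes.txt
git commit --quiet -m "Second commit"

# Clone it. The clone receives the full history, not just the latest files.
git clone --quiet "$scratch/original" "$scratch/copy"

# Both commits can be listed from the clone without contacting the original.
git -C "$scratch/copy" log --oneline

# The clone remembers where it came from under the remote name "origin".
git -C "$scratch/copy" remote -v
```

The same commands work against a hosted repository; only the clone step would need the network.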

What is a git repository?

A folder with a special .git/ directory in it. (Technically, you can also have a bare repository, which means that the root folder of the repository is itself the .git/ directory.)
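
A quick way to see the difference between the two layouts, as a sketch with made-up directory names:

```shell
# Scratch area; the directory names are hypothetical.
scratch=$(mktemp -d)

# A normal repository: a folder with a .git/ directory inside it.
git init --quiet "$scratch/normal"
ls -a "$scratch/normal"          # the listing includes .git

# A bare repository: the folder itself holds what .git/ would hold.
git init --quiet --bare "$scratch/bare.git"
ls "$scratch/bare.git"           # HEAD, config, objects, refs, and so on

# Git can tell you which kind it is looking at.
git -C "$scratch/normal" rev-parse --is-bare-repository     # prints "false"
git -C "$scratch/bare.git" rev-parse --is-bare-repository   # prints "true"
```

Bare repositories have no working tree, which is why they are typically used as the shared copy that people push to rather than a place to edit files.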

Quick Definitions

  • Working tree: The actual files and directories on your file system. The working tree is what you’re editing when you open a file in your editor.
  • Index (or cache): Things which have been added (git add) in preparation for a commit to be made.
  • HEAD: The last thing you checked out (git checkout) or committed. After a fresh clone, this is usually the same as the repository’s default branch (historically master, nowadays often main). In more technical terms, this points to whichever commit last updated the working tree.
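
The three definitions above are easiest to tell apart when the same file has a different version in each place. A minimal sketch, using a throwaway repository and a made-up file name and identity:

```shell
# Throwaway repository; file name and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# Commit a one-line file, so HEAD contains "one".
echo "one" > notes.txt
git add notes.txt
git commit --quiet -m "First commit"

# Add a line and stage it: the index now contains "one", "two".
echo "two" >> notes.txt
git add notes.txt

# Add another line without staging it: only the working tree has "three".
echo "three" >> notes.txt

git show HEAD:notes.txt   # version in HEAD:     one
git show :notes.txt       # version in the index: one, two
cat notes.txt             # working tree version: one, two, three
```

Running git commit at this point would record the two-line version from the index, and the third line would remain as an uncommitted change in the working tree.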

What objects does Git store?

Git has many types of objects. Some are directly analogous to concepts in other VCS like SVN, but some are not.

  • Commit: A particular point in history. A commit stores a snapshot of the whole project at that point, not a set of changes; Git works out the changes by comparing snapshots when you ask for them. It also stores metadata, like a commit message, a committer, an author (often the same as the committer), an optional GPG signature, etc. Analogous to an SVN revision. Importantly, each commit also stores the identifier of its parent (the commit which was HEAD when the new commit was created). The very first commit in a repository has no parent.
  • Merge commit: a commit which has more than one parent, typically created when merging branches together. These commits store the identifiers of all of their parents (usually two).
  • Fast-forward merges: if the current commit is an ancestor of the commit being merged (not necessarily its direct parent), Git will instead ‘fast forward’ the HEAD and branch to that descendant commit, without creating a merge commit.
  • Most commands which accept a commit will also accept a branch name (meaning the latest commit on that branch), a tag name (meaning that tag’s commit), etc. In many cases, you can omit the commit entirely to use HEAD.
  • Tag: A pointer to a commit. Tags can be ‘lightweight’ (effectively an alias for the commit), or ‘annotated’ (which can have its own tag message, signature, etc.).
  • Branch: A separate line of commits, which may or may not be interrelated with other branches. A branch is a name which points to the latest commit on that branch. Branches may exist in one or more remotes, as well as the local repository. The local copy of a branch can store which branch on which remote is associated with it (its ‘upstream’, which is recorded locally as a ‘remote-tracking branch’ such as origin/master), to be used as a default for git pull/git push.
  • Remote: A name for another copy of the repository. A remote may in fact be stored on the local system (another directory), or more typically on a remote system like GitHub (usually via SSH or HTTP). The set of remotes is a part of your local configuration - it is not synchronized with other repositories, unlike the other objects.
  • Your local copy of the repository also stores some information from the last time you ran git fetch on each remote, like the current state of that remote’s branches.
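
Several of the objects above can be inspected directly. The sketch below builds a throwaway repository (branch, file and tag names are all made up), performs one fast-forward merge and one real merge, and then looks at what got stored. It reads the initial branch name from Git instead of assuming master or main.

```shell
# Throwaway repository; names and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
base=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

echo a > a.txt
git add a.txt
git commit --quiet -m "A"

# One commit on a new branch; the base branch has not moved.
git checkout --quiet -b feature
echo b > b.txt
git add b.txt
git commit --quiet -m "B"

# Fast-forward: the base branch is an ancestor of feature, so Git just
# moves the branch pointer and creates no merge commit.
git checkout --quiet "$base"
git merge --quiet feature
ff_head=$(git rev-parse HEAD)
feature_tip=$(git rev-parse feature)
echo "$ff_head" "$feature_tip"          # the same identifier twice

# Let both branches gain a commit so that they diverge.
echo c > c.txt
git add c.txt
git commit --quiet -m "C"
git checkout --quiet feature
echo d > d.txt
git add d.txt
git commit --quiet -m "D"

# Now a fast-forward is impossible, so merging creates a merge commit.
git checkout --quiet "$base"
git merge --quiet --no-edit feature

# Raw commit object: one "tree" line (the snapshot), two "parent" lines,
# then author, committer and the message.
git cat-file -p HEAD

# Lightweight tags resolve straight to a commit; annotated tags are
# objects of their own.
git tag light
git tag -a -m "Annotated tag" annotated
git cat-file -t light       # prints "commit"
git cat-file -t annotated   # prints "tag"
```

The "tree" line in the git cat-file output is the identifier of the snapshot the commit records.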

Common Git actions

  • git status: a very useful command which will show you the current status: what branch you’re on, how any remote-tracking branch (which you have fetched) has diverged from it, changes in the working tree, changes in the index, the current state of a merge with conflicts, etc. Usually, it will also show you tips on what to do about anything it says.
  • git add <files>: add changes to the index.
  • git commit: create a new commit whose parent is the current HEAD, with any changes in the index.
  • git push [<remote>]: push the current state of the current branch to the specified (or a default) remote branch.
  • git fetch [<remote>]: refresh the local copy of the current status of a remote, and fetch any new commits which have been added to that remote. This does not update any local branches, nor the index or working tree.
  • git merge <commit>: merge a commit (which is often specified indirectly, e.g. by a branch name) into the current HEAD.
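  • git pull [<remote>]: by default, shorthand for git fetch followed by git merge of the fetched upstream branch into the current branch. (git pull --rebase rebases onto it instead of merging.)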
  • git checkout <branch>: update HEAD and the working tree to point to the latest commit on the specified branch.
  • git checkout <commit>: enter ‘detached HEAD’ mode, and update HEAD and the working tree to point to the specified commit. I usually only find this useful to examine the state of the repository as of a particular commit. You can also make commits in this mode, and then either discard them or save them to a new branch, though.
  • git tag <tagname> [<commit>]: create a new tag from the specified commit. Depending on the options you specify, this can be either a lightweight or annotated tag.
  • git bisect ...: helpful to track down what commit introduced an error. You specify two commits to start with (one without the bug, one with the bug), and git will help you do a “binary search” through commits until you find the one responsible.
  • git log [<commit>] [<files>]: view metadata about the specified commit (like its identifier, any branches which currently point to it, the author, the commit message, etc), and then about each of its ancestors in turn (its parents, their parents, and so on). If you specify files, only commits affecting one of those files are shown.
  • git show [<commit>]: view metadata about the specified commit, as well as the diff between the commit and its parent.
  • git diff:
    • git diff: show the difference between the index and the working tree. (These are the changes which you could git add.)
    • git diff --cached: show the difference between HEAD (changes that have been committed) and the index. (These are the changes which you have already git added.)
    • git diff <commit>: show the difference between a commit and the working tree. (All changes made, both those committed and in working tree, since the specified commit.)
      • git diff HEAD: show the changes between the latest commit and the working tree.
    • git diff <commit> <commit>: show the difference between two arbitrary commits. git diff <commit>..<commit> means the same thing, but you can omit either side to default it to HEAD.
    • git diff <commitA>...<commitB>: calculate the most recent common ancestor of both commits, and then show the difference between that and commitB. This will essentially show you what changes would be applied if you merged commitB into commitA. Again, you may omit either side to default to HEAD.
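
The git diff variants are much easier to keep straight after seeing them side by side. Here is a sketch in a throwaway repository; the branch name topic, the file names and the identity are all made up, and the initial branch name is read from Git instead of being assumed.

```shell
# Throwaway repository; names and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
base=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

echo "one" > notes.txt
git add notes.txt
git commit --quiet -m "Base"

# One commit on a topic branch, one on the base branch, so they diverge.
git checkout --quiet -b topic
echo "topic" > topic.txt
git add topic.txt
git commit --quiet -m "Topic work"
git checkout --quiet "$base"
echo "base" > base.txt
git add base.txt
git commit --quiet -m "Base branch work"

# One staged change and one unstaged change to the same file.
echo "staged" >> notes.txt
git add notes.txt
echo "unstaged" >> notes.txt

git diff            # index vs working tree: adds only "unstaged"
git diff --cached   # HEAD vs index:         adds only "staged"
git diff HEAD       # HEAD vs working tree:  adds both lines

# Two dots (or two arguments) compare the branch tips directly, so work
# from both sides shows up: base.txt and topic.txt are listed.
git diff --name-only "$base" topic

# Three dots compare the common ancestor with topic, so only the topic
# branch's own work shows up: topic.txt.
git diff --name-only "$base"...topic
```

The staged and unstaged edits do not appear in the last two commands, because those compare commits only.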