Git bundle converts your whole repository into a single file kind of like webpack

WHAT

Pretend you just spent a few minutes, hours, or days trying something out, and now you want to get the project off your janky laptop’s hard drive that you just know is gonna die soon.

You’ve been tracking work with git locally because it is trivial to set up:

$ cd myproject
$ vi README # pretend this is your brilliant code.
$ git init
$ git add .
$ git commit -a -m "Let's get this party started"

Now you want a single file that has your whole project and all the commits you’ve made.

You can use git bundle for this! Here is how:

$ cd myproject
$ git bundle create myproject.bundle --all
$ scp myproject.bundle example.com:/tmp/

If it helps, you can think of git bundle as kind of like tar or zip or even webpack. Those are all things that convert a big tree of stuff and spit out a single doodad.

HOW

Here is how to make a single file with everything from all branches:

$ git bundle create myproject.bundle --all

Or you can make a single file (a bundle) that has only the master branch:

$ git bundle create myproject.bundle master

Or make one just with whatever branch you’re working in now:

$ git bundle create myproject.HEAD.bundle HEAD

Now move the bundle to a remote box via scp or rsync or whatever other method you want.

You might ask why you would pick rsync over scp, since both copy a file over a secure tunnel. The main advantage of rsync here is that it checks whether the file needs to be copied again:

$ rsync -e ssh --verbose myproject.bundle example.com:/tmp/myproject.bundle

sent 2,793 bytes received 35 bytes 377.07 bytes/sec
total size is 2,705 speedup is 0.96

$ rsync -e ssh --verbose myproject.bundle example.com:/tmp/myproject.bundle

sent 100 bytes received 59 bytes 16.74 bytes/sec
total size is 2,705 speedup is 17.01

See how the second time I ran rsync, it only sent 100 bytes? That’s because it tested if the version of myproject.bundle on example.com was out of sync with the one here. That can really, really help when you’re on a slow connection or working with big files.

Here is how to make a new repo based on that bundle:

$ ssh example.com
$ git clone -b master /tmp/myproject.bundle myproject2
$ cd myproject2

Pretty fresh, right?
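If you want to try the whole round trip without a second machine, here is a minimal sketch that does everything in a throwaway directory. The directory names and file contents are made up for the demo, and the git bundle verify step is an extra sanity check I'm adding, not something the steps above require.

```shell
set -eu
# Throwaway sandbox; every path and name below is made up for the demo.
work=$(mktemp -d)
cd "$work"

git init -q myproject
cd myproject
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "brilliant code" > README
git add README
git commit -q -m "Let's get this party started"

# Pack every ref into one file, then ask git to sanity-check that file.
git bundle create "$work/myproject.bundle" --all
git bundle verify "$work/myproject.bundle"

# Clone from the bundle as if it were a remote repository.
cd "$work"
git clone -q "$work/myproject.bundle" myproject2
restored=$(cat myproject2/README)
echo "restored README says: $restored"
```

Delete the temporary directory when you're done poking around.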

Also, the list-heads command is pretty useful for spying on what is inside a bundle file:

$ git bundle create myproject.all-branches.bundle --all
$ git bundle list-heads myproject.all-branches.bundle
5702b7e5d8dd16839850e3fbad44ee69a9411586 refs/heads/master
82a0cd0d59b4929df8ff439cede8a33bbf850cfe refs/heads/more-docs
5702b7e5d8dd16839850e3fbad44ee69a9411586 HEAD

$ git bundle create myproject.master.bundle master
$ git bundle list-heads myproject.master.bundle
5702b7e5d8dd16839850e3fbad44ee69a9411586 refs/heads/master

$ git bundle create myproject.HEAD.bundle HEAD
$ git bundle list-heads myproject.HEAD.bundle
5702b7e5d8dd16839850e3fbad44ee69a9411586 HEAD

Unless you use --all, you won’t get all your branches in your bundle! Sometimes, that’s exactly what you want. But for rookies, usually, you’re just trying to ship everything.

WHY

First of all, you can’t beat how easy it is to make a bundle and ship it:

$ git bundle create myproject.bundle --all
$ scp myproject.bundle example.com:/tmp/

Second, sure, usually, I would make a new repository on some hosted service like github or bitbucket or gitlab. And I might also make a private repository on a box I rent from Linode or AWS EC2 or Digital Ocean.

But maybe I’m in a coffeeshop with slow wifi, and my friend is sitting right next to me, and I want to share the code with him or her, and it seems crazy for us both to communicate by sending packets around the world.

Also, using git bundle vs pushing to a remote repository ain’t an either-or thing!

There is nothing wrong with setting up a few cron jobs to run git bundle to create some bundle files and shove them to AWS S3 or dropbox or wherever, even though you’re still paying that exorbitant github bill.
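Here is a rough sketch of what such a cron job could call. The function name backup_bundle, the dated file naming scheme, and the paths in the sample crontab line are all my own assumptions, and the upload step is left out because it depends entirely on where you keep backups.

```shell
set -eu

# backup_bundle REPO_DIR DEST_DIR
# Writes DEST_DIR/<repo-name>.<date>.bundle holding every ref, and prints its path.
# DEST_DIR should be an absolute path, because git -C changes directory first.
backup_bundle() {
    bb_repo=$1
    bb_dest=$2
    bb_name=$(basename "$bb_repo")
    bb_stamp=$(date +%Y-%m-%d)
    bb_out="$bb_dest/$bb_name.$bb_stamp.bundle"
    mkdir -p "$bb_dest"
    git -C "$bb_repo" bundle create "$bb_out" --all >/dev/null 2>&1
    # Refuse to report success for a bundle git itself cannot read back.
    git -C "$bb_repo" bundle verify "$bb_out" >/dev/null 2>&1
    echo "$bb_out"
}

# Hypothetical crontab entry, assuming this lives in a script that ends by
# calling backup_bundle with your own paths:
#   0 3 * * * /home/me/bin/bundle-backup.sh
```

After the function prints the path, copy that file wherever you like with scp, rsync, or your storage provider's own tool.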

Undo a fast-forward git merge

A friend has a fork of my project and sent me a pull-request. Without doing any eyeballing at all, I did this:

$ git fetch XXX
$ git merge XXX/master

Git ran a fast-forward merge and now all those commits are in my code. Then I ran some tests and KABOOM.

After spending a while digging around and seeing just how much needed to be fixed, I decided that it would be better to just send an email with the test errors back to my friend rather than try to fix them myself. I got my own work to do, after all.

So I sent the email, but now I had a git checkout with all those commits.

And since this was a fast-forward merge, git didn’t create a commit for the merge.

Incidentally, today I learned that you can force git to make an explicit commit for fast-forward merges like this:

$ git merge --no-ff XXX/master
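To see the difference for yourself, here is a small sketch in a scratch repository. The branch name friend and the file names are stand-ins I made up for the fork. The last step shows that an explicit merge commit can be dropped by stepping back to its first parent.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "first"

git checkout -q -b friend
echo two > b.txt
git add b.txt
git commit -q -m "friend's work"
git checkout -q master

# master has not moved, so a plain merge would fast-forward.
# --no-ff forces git to record a real merge commit instead.
git merge -q --no-ff -m "Merge friend's branch" friend

# A merge commit has two parents.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
echo "parents of HEAD: $parent_count"

# With a merge commit in place, undoing the merge is one step back.
git reset -q --keep 'HEAD^'
```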

I went to #git on irc.freenode.org, where I always go, and I explained the situation, and then was told to do these commands:

$ git reflog show master
$ git reset --keep master@{1}

And it worked! All the foreign commits are now gone.

But what just happened?

I’m not absolutely certain, but it seems like the git reflog show master command shows every commit that the tip of my local master branch has pointed at over time. This is what the top lines showed for me:

cce8252 master@{0}: merge XXX/master: Fast-forward
08c8f50 master@{1}: commit: Cleaning up the my-account pages
5526212 master@{2}: commit: Deleted a handler that was never getting used

This is different than git log. This is talking about the state of my local master branch over time.

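Then the next command, git reset --keep master@{1}, tells git to move master back to where it pointed one step earlier and to update the checkout to match. As far as I can tell from the docs, --keep aborts instead of overwriting a file that has uncommitted edits.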

Like I said, I’m still not sure I understand this, but I plan to study it more.
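Here is a sketch that replays the whole situation in a scratch repository, so you can watch the reflog entries appear. The friend branch stands in for the fork, and every name in it is made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo mine > mine.txt
git add mine.txt
git commit -q -m "my work"
before=$(git rev-parse HEAD)

# Stand-in for the friend's fork: a branch that is strictly ahead of master.
git checkout -q -b friend
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m "friend's work"
git checkout -q master
git merge -q friend                       # fast-forwards, so no merge commit

# master@{0} is the fast-forward, master@{1} is "my work".
git --no-pager reflog show master

# Move master back to where it pointed before the merge.
git reset -q --keep 'master@{1}'
after=$(git rev-parse HEAD)
```

The friend's commits are not deleted. They are still reachable from the friend branch (or from XXX/master in the real story), they are just no longer part of master.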

Some notes from when I started using git

When I switched to git from subversion at my old business, I stored notes on how to do certain tasks. I’m pasting them below. Maybe some of this will help you out.

If you know a better way to do these tricks, please let me know.

My git diary
============

.. contents::

Slowly learning how to use git.

Typical upgrade flow
--------------------

I just committed a bunch of code and I want to see what the diffs were,
so I ran this::

$ git diff HEAD^

My production server runs branch 3.5.1 of my software. I want to start
work on a scary new feature that may take a long time, so I made a new
branch called 3.5.2 in my local sandbox::

$ git checkout -b 3.5.2

Now I can commit all my intermediate stuff in here.

Somebody found a bug in the production site (running 3.5.1) so I switch
to my 3.5.1 checkout::

$ git checkout 3.5.1
$ vi # fixing the problem
$ git commit -a -m "Fixed the prod bug"
$ git push origin 3.5.1

That last line sends my local commits to my remote bare repository.
That remote bare repository is on a box with an SSH server and RAID
storage.

Then I connect to the production box and pull down the most recent
changes::

$ ssh prod
matt@prod$ cd where-the-repo-is
matt@prod$ git pull origin 3.5.1
matt@prod$ restart-everything

That restart-everything command is a homemade script that does just what
you think it does.

Now back in my dev sandbox, I want to pull the bug fix from 3.5.1 into
my 3.5.2 branch. So I do it like this::

$ git checkout 3.5.2
$ git pull . 3.5.1

And now my 3.5.2 branch has that code in it. Hurray! Understand that
the dot (.) in git pull . 3.5.1 means that git should retrieve code from
this repository, not the fancypants remote one.
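Since both branches live in the same repository, a plain git merge gets roughly the same result as git pull with a dot. Here is a sketch of that in a scratch repository. The file names are made up, and --no-edit is only there so the script doesn't open an editor for the merge message.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/3.5.1
git config user.name "Demo User"
git config user.email "demo@example.com"

echo v1 > app.txt
git add app.txt
git commit -q -m "release 3.5.1"

# Start the long-running feature branch.
git checkout -q -b 3.5.2
echo feature > feature.txt
git add feature.txt
git commit -q -m "scary new feature, part 1"

# Fix the production bug on 3.5.1.
git checkout -q 3.5.1
echo fix > bugfix.txt
git add bugfix.txt
git commit -q -m "Fixed the prod bug"

# Bring the fix into 3.5.2. Roughly the same effect as: git pull . 3.5.1
git checkout -q 3.5.2
git merge -q --no-edit 3.5.1
ls
```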

Pulling stuff
-------------

When I run::

$ git pull origin 3.5.1

That means to pull from the origin's 3.5.1 branch into whatever branch I
currently have checked out.

A typical day
-------------

I have two branches, an experimental branch and a public branch.
The production server used by customers runs the public branch.

Usually I work in my development branch. Sometimes I have to get some
code into production, so this is how I do it::

$ git checkout stable # switch my local copy to the stable branch.
$ git pull origin stable # update local copy, just in case.
$ vi blah.py # do the bug fixes.
$ git commit -a -m "Notes about bug fixes"
$ git push origin # This is my deploy system.
$ git checkout dev # Go back to my development branch.
$ git pull origin stable # merge in that bug fix.

So far, this works well. I did something similar with SVN, and it also
worked fine. But git is way better at safely merging and it is much
faster.

Undo changes to a single file
-----------------------------

I typically edit a dozen files and then figure out that I want to undo
some stuff. If I did::

$ git reset --hard

Then every uncommitted change in my working copy would be thrown away.
Usually, I just want to revert one file. So this is how::

$ git checkout -- that/particular/file

That deletes my working-copy changes just there. Everything else is
left as-is.
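Here is a sketch that shows one file being reverted while another edited file is left alone. The file names are made up. If I remember right, git 2.23 and later also offer git restore for this job, but the sketch sticks with checkout.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo original > keep.txt
echo original > undo.txt
git add keep.txt undo.txt
git commit -q -m "start"

# Edit both files, then change my mind about one of them.
echo edited > keep.txt
echo edited > undo.txt

# The -- tells git that what follows is a file path, not a branch name.
git checkout -- undo.txt

undo_now=$(cat undo.txt)
keep_now=$(cat keep.txt)
echo "undo.txt: $undo_now, keep.txt: $keep_now"
```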

How to submit patches by email
------------------------------

I cloned the ditz project and tweaked the code, and I wanted to submit a
patch, so this is what they told me to do::

For this type of thing, it's "very simple". git commit -a, add a
one-line description followed by a blank line and more commentary, and
then git format-patch HEAD^.

When I ran git format-patch HEAD^, that produced a file on my local
machine that had a pretty formatted patch. Then I emailed that patch to
the list with some text.
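Here is a sketch of that flow in a scratch repository, so you can see what the generated file is called. The file name and commit messages are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo a > code.txt
git add code.txt
git commit -q -m "first"

echo b >> code.txt
git commit -q -a -m "Tweak the code"

# One patch file per commit after HEAD^, which here means just the last commit.
# format-patch prints the name of each file it writes.
patch_name=$(git format-patch 'HEAD^')
echo "wrote $patch_name"
```

Whoever receives the file can apply it with git am, which keeps the author and the commit message.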

How to branch from a remote repository
--------------------------------------

This is what somebody in the #git room told me to do, since I use a
remote bare repository::

$ git fetch origin
$ git checkout -b newbranch origin/original

Here's another person's opinion::

$ git fetch && git checkout --track -b mynewbranchagain remotename/mynewbranch

Set the remote branch
---------------------

Normally I have to pull from my remote repo and specify the branch I
want to pull in like this::

$ git checkout experimental
$ git pull origin experimental

It is possible to associate my local branch of experimental with that
remote branch of experimental like this::

$ git config --add branch.experimental.remote origin
$ git config --add branch.experimental.merge experimental

And now I can run::

$ git pull origin

without specifying the remote branch. In fact this works too::

$ git pull
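Newer versions of git can write both of those config entries in one step with git branch --set-upstream-to. Here is a sketch that uses a local bare repository as a stand-in for my remote one. The paths and the file name are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A local bare repository plays the part of the remote box.
git init -q --bare origin.git

git init -q sandbox
cd sandbox
git symbolic-ref HEAD refs/heads/experimental
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "x = 1" > x.py
git add x.py
git commit -q -m "start"
git push -q origin experimental

# Tie the local branch to the remote branch.
git branch --set-upstream-to=origin/experimental experimental >/dev/null

remote=$(git config branch.experimental.remote)
merge_ref=$(git config branch.experimental.merge)
echo "remote=$remote merge=$merge_ref"

# Now a bare pull knows where to pull from.
git pull -q
```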

Compare one file across two local branches
------------------------------------------

I want to look at a diff of x.py in my public branch vs my experimental
branch, so this is what I did::

$ git diff public experimental -- x.py

I can see everything different between the two branches like this::

$ git diff public experimental

Break up a whole bunch of changes into different commits
--------------------------------------------------------

I'm irresponsible about committing after each conceptual unit of work.
Lots of times, I'll edit a file to fix one bug, then while I'm in there,
I'll edit some other code because I see a better way to do something
else. Then I'll maybe add a few doctests to a completely different
section just because I want to.

After a few hours, typically inside of the same file, I have edits
related to multiple separate tasks.

Before I commit my changes, I run::

$ git add -p frob.py

Which walks through all the changes in that file and asks me if I want
to stage each one. In the first pass, I stage all the hunks related to
the first issue. Then I commit those changes. Then I rerun git add -p
frob.py and march through the file for the second issue.

Keep in mind that I committed my changes after the first pass, so when I
go through the file the second time, I won't get prompted for those
changes.

This is one of those git features that you just couldn't do with svn.

Find the commits in branch A that are not in branch B
-----------------------------------------------------

Sometimes I'll patch a production bug in my production branch and then I
will forget to merge it into the development branch. This is how I can
check for that::

$ git checkout production_branch
$ git cherry dev_branch

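This will spit out the commits that are in production_branch but not in
dev_branch, each marked with a plus sign. A commit marked with a minus
sign already has an equivalent change in dev_branch.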

It will not return any commits made to dev_branch but not in
production_branch.
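Here is a sketch of the forgotten-bug-fix case in a scratch repository. The file name and commit messages are made up, and the -v flag is only there so the commit subject shows up next to the hash.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/dev_branch
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > app.txt
git add app.txt
git commit -q -m "base"

# Patch production and forget to merge the fix back into dev_branch.
git checkout -q -b production_branch
echo hotfix >> app.txt
git commit -q -a -m "Patch a production bug"

# Lists what production_branch has that dev_branch lacks.
missing=$(git cherry -v dev_branch)
echo "$missing"
```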

See a file as of a point in time
--------------------------------

Looking at a commit shows the changes. Sometimes I want to see the file
itself.

Here's how to look at foo.py as of two commits ago::

$ git show HEAD^^:b/foo.py

Here's how to see what foo.py looked like as of a particular commit::

$ git show bf51ebdbc:b/foo.py

b/foo.py is the path to foo.py from the top of the repository. The
current working directory doesn't matter unless the path starts with
./ or ../, which makes it relative to wherever I am.
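Here is a sketch that checks how the path is interpreted by running git show from inside a subdirectory. The directory layout and file contents are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir b
for n in 1 2 3; do
    echo "version $n" > b/foo.py
    git add b/foo.py
    git commit -q -m "version $n"
done

cd b
# Path from the top of the repository, even though I'm sitting inside b/.
from_top=$(git show 'HEAD^^:b/foo.py')
# A leading ./ makes the path relative to the current directory instead.
from_here=$(git show 'HEAD^^:./foo.py')
echo "$from_top / $from_here"
```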

Copy a file from one branch to another
--------------------------------------

I committed a file into one branch then checked out a new branch. This
is how I copied that file into the new branch WITHOUT merging it in::

$ git checkout newbranch # 1
$ git checkout otherbranch foo.py # 2

Step 1 moves me into the newbranch. Step 2 gets me a copy of the file
foo.py from otherbranch and saves it into this branch named newbranch.
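Here is a sketch of the same two steps in a scratch repository, with a check that the file shows up staged and ready to commit. The file contents are made up, and I added -- before the file name so git can't mistake it for a branch.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo readme > README
git add README
git commit -q -m "base"

# Commit foo.py on otherbranch only.
git checkout -q -b otherbranch
echo "print('hi')" > foo.py
git add foo.py
git commit -q -m "add foo.py"

# Step 1: move to newbranch, which does not have foo.py.
git checkout -q master
git checkout -q -b newbranch

# Step 2: copy foo.py over without merging otherbranch.
git checkout otherbranch -- foo.py

staged=$(git diff --cached --name-only)
echo "staged: $staged"
```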

See a file in a different branch
--------------------------------

After I check out the maxhenry branch, I want to see the version of
printable.kid in the mayfield branch::

$ git checkout maxhenry
$ git show mayfield:bazman/bazman/templates/printable/printable.kid

Set a file to an old version of itself
--------------------------------------

Sometimes I want to get the version of a file as of a certain commit and
check that in. There are lots of ways to do this, including using git
revert or git reset, but I've had good luck with this approach::

$ git log # Use this to find the commit you want.

$ git checkout ac778cbb5517e1aeef446c9a8a1092eef81717fa -- repo-top/a/foo.py

After you run this, the index will have this old copy queued up to be
committed. You can use git diff --cached to see the changes.
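Here is a sketch of pulling an old version of one file back into the index, using the form of git checkout that takes a commit, then --, then a path. The path a/foo.py and the file contents are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir a
echo old > a/foo.py
git add a/foo.py
git commit -q -m "old version"
old_commit=$(git rev-parse HEAD)

echo new > a/foo.py
git commit -q -a -m "new version"

# Put the old copy back in the working tree and stage it in one go.
git checkout "$old_commit" -- a/foo.py

staged=$(git diff --cached --name-only)
content=$(cat a/foo.py)
echo "staged: $staged, content: $content"
```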

Create a new branch on the remote repo
--------------------------------------

We just pushed the branch **mollusk** up to production, so now it is
time to create a new branch named **nosehair**::

$ git branch nosehair # creates the new branch
$ git checkout nosehair # switches to the new branch
$ git push origin nosehair # pushes it up to origin

Switch to a new branch that already exists on a remote repo
-----------------------------------------------------------

After somebody else already created the branch nosehair, and pushed it
up to the remote repository, here's how to switch to work on the
nosehair branch::

$ git fetch origin
$ git branch -a # make sure origin/nosehair exists!
$ git checkout -b nosehair origin/nosehair

Search through commit messages
------------------------------

This is how I found all commits that had a commit message that included
the word CCPL::

$ git log --grep=CCPL

This is how I limited it to just my (Matt's) commits for CCPL work::

$ git log --grep=CCPL --author=matt --all-match

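Without the --all-match option, the old version of git I was using
showed all commits by matt or that have CCPL in the commit message.
I believe newer versions already require both to match, and
--all-match only matters when there is more than one --grep.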
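Here is a sketch with three empty commits, so you can see which one survives both filters. The second author, alice, and all the commit messages are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# --author sets the author of each commit, which is what git log --author searches.
git commit -q --allow-empty --author="matt <matt@example.com>" -m "CCPL: fix license text"
git commit -q --allow-empty --author="matt <matt@example.com>" -m "Unrelated cleanup"
git commit -q --allow-empty --author="alice <alice@example.com>" -m "CCPL: update footer"

# Only commits that are by matt AND mention CCPL.
hits=$(git log --format=%s --grep=CCPL --author=matt --all-match)
echo "$hits"
```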

git cherry is neat

Sometimes I’ll patch a production bug in my production branch and then I will forget to merge that commit into the development branch. This is how I can check for that:

$ git checkout production_branch
$ git cherry dev_branch

This will spit out a list of commits that are in production_branch but not in dev_branch. It will not return any commits made to dev_branch but not in production_branch. It is not the same as a diff of the two branches either.
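Here is a sketch of how git cherry differs from just comparing the two branches. It cherry-picks the fix into dev_branch, after which git cherry marks the fix with a minus sign, while a plain range listing still counts it as missing. The file names and commit messages are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/dev_branch
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > app.txt
git add app.txt
git commit -q -m "base"

git checkout -q -b production_branch
echo hotfix > hotfix.txt
git add hotfix.txt
git commit -q -m "Patch a production bug"

# dev_branch moves on, then picks up the same fix as a brand new commit.
git checkout -q dev_branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Dev work"
git cherry-pick production_branch >/dev/null

git checkout -q production_branch
# git cherry compares the changes themselves, so the fix is marked "-".
cherry_out=$(git cherry dev_branch)
# The commit hashes still differ, so a range listing still shows it.
range_count=$(git rev-list --count dev_branch..production_branch)
echo "cherry: $cherry_out"
echo "commits in dev_branch..production_branch: $range_count"
```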

Break up changes into different commits with git add -p

This guy’s post led to this one.

I’m irresponsible about committing after each conceptual unit of work. Lots of times, I’ll edit a file to fix one bug, then while I’m in there, I’ll edit some other code because I see a better way to do something else. Then maybe I’ll add a few doctests to a completely different section because I feel like it.

After a few hours, I have edits in a single file that are related to multiple separate tasks. So back when I used svn, I would commit it with a message like “Fixed topics A, B, C”. Or I would say “Fixed A and a bunch of other stuff”.

Now with git, before I commit my changes, I run:

$ git add -p frob.py

Then git opens up an interactive session that walks through all the changes in that file and asks me if I want to stage each one. It is also possible to look at every change across a repository if you want to — don’t specify the file or files you want to see.

In the first pass, I stage all the hunks related to the first issue. Then I commit those changes. Then I repeat the process and stage chunks related to the next issue.

Keep in mind that I committed my changes after the first pass, so when I go through the file the second time, I won’t get prompted for those changes.

A real-world example

I’ve got two edits in mkinstall.py. One is a change to the list of files I want to ignore, and the other edit is a silly stylistic change. I want to commit them separately.

$ git diff mkinstall.py
diff --git a/mkinstall.py b/mkinstall.py
index 4c6bb4e..2cb43de 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -17,7 +17,8 @@ Otherwise, I'll add a symlink.
import os, shutil

# Anything you want to skip:
-skip_us = ["mkinstall.py", ".svn", "_vimrc"]
+skip_us = ["mkinstall.py", ".svn", "_vimrc", "diffwrap.sh", "lib",
+ "lynx_bookmarks.html", "ipythonrc-matt", ".git"]

# Anything you want to copy rather than symlink to:
copy_us = [".vim"]
@@ -60,7 +61,8 @@ for thing in copy_us:
if os.path.islink(homefile):
print "A symbolic link to %s exists already, so I'm not going to copy over it." % homefile

- elif os.path.exists(homefile): continue
+ elif os.path.exists(homefile):
+ continue

else:
svnfile = os.path.join(svnpath, thing)

This is what happens when I run git add -p:

$ git add -p mkinstall.py
diff --git a/mkinstall.py b/mkinstall.py
index 4c6bb4e..2cb43de 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -17,7 +17,8 @@ Otherwise, I'll add a symlink.
import os, shutil

# Anything you want to skip:
-skip_us = ["mkinstall.py", ".svn", "_vimrc"]
+skip_us = ["mkinstall.py", ".svn", "_vimrc", "diffwrap.sh", "lib",
+ "lynx_bookmarks.html", "ipythonrc-matt", ".git"]

# Anything you want to copy rather than symlink to:
copy_us = [".vim"]
Stage this hunk [y/n/a/d/j/J/?]?

At this point, I will hit y. Now that section of the file is staged to be committed. That is not the same as committing it.
Now git shows the next section of code that is different:

Stage this hunk [y/n/a/d/j/J/?]? y
@@ -60,7 +61,8 @@ for thing in copy_us:
if os.path.islink(homefile):
print "A symbolic link to %s exists already, so I'm not going to copy over it." % homefile

- elif os.path.exists(homefile): continue
+ elif os.path.exists(homefile):
+ continue

else:
svnfile = os.path.join(svnpath, thing)
Stage this hunk [y/n/a/d/K/?]?

I don’t want to stage this right now, so I hit n. That’s the last edit in the file, so the interactive session completes. Now when I run git diff --cached, which tells me what is about to be committed, look what I see:

$ git diff --cached mkinstall.py
diff --git a/mkinstall.py b/mkinstall.py
index 4c6bb4e..a348ee1 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -17,7 +17,8 @@ Otherwise, I'll add a symlink.
import os, shutil

# Anything you want to skip:
-skip_us = ["mkinstall.py", ".svn", "_vimrc"]
+skip_us = ["mkinstall.py", ".svn", "_vimrc", "diffwrap.sh", "lib",
+ "lynx_bookmarks.html", "ipythonrc-matt", ".git"]

# Anything you want to copy rather than symlink to:
copy_us = [".vim"]

So now I’ll commit this edit with an appropriate remark:

$ git commit -m "Added some more files to the list of files to be skipped"
Created commit ce0478d: Added some more files to the list of files to be skipped
1 files changed, 2 insertions(+), 1 deletions(-)

Now I’ll view the unstaged changes again in my file, and notice that the other change still remains:

$ git diff mkinstall.py
diff --git a/mkinstall.py b/mkinstall.py
index a348ee1..2cb43de 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -61,7 +61,8 @@ for thing in copy_us:
if os.path.islink(homefile):
print "A symbolic link to %s exists already, so I'm not going to copy over it." % homefile

- elif os.path.exists(homefile): continue
+ elif os.path.exists(homefile):
+ continue

else:
svnfile = os.path.join(svnpath, thing)

At this point, I can rerun git add -p and stage up more stuff to be committed. In this case, it is more realistic that I would run:

$ git commit -a -m "Made a silly style change"

That will stage and commit that last edit in one swoop.