for Astrophysicists
by Henriette Wirth
1 Installing and setting up git
To install git please use the guide for your OS on:
https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
After installing git you need to set your name and email address:
$ git config --global user.name "Jean-Luc Picard"
$ git config --global user.email jpicard@starfleet.ufp
2 Working with repositories
2.1 Initializing git
Imagine a visitor from Betelgeuse was coming to Europe and wants to know about the
local animals. So you are tasked with creating a database for them containing all the
important animals with their properties, but of course you want to be able to backup
your progress on this project. So the first thing you do when starting the project is to
initialize git in your working directory.
$ git init
After this command a folder with the name ‘.git’ should appear in your working direc-
tory.
2.2 Moving files between your Working directory and the local repository
git has typically three places, where your data is stored. One is the working
directory, this is the directory that contains the files you are actively modifying.
The local repository is contained in the .git directory and it contains a local copy
of your repository. The remote repository is a copy of your repository on a server.
Multiple remote repositories can exist for one project.
At the moment all three locations are empty. So let us create the first files for our
project:
1
cat.txt
# cat
- pointy ears
- fur
- retractable claws
dog.txt
# dog
- four legs
- tail
- many sizes
These files exist now in the working directory, but not in the local or remote repository.
So let’s look at how we can move them to the local repository. To move files to the local
repository we go over a virtual space called the staging area.
working
directory
staging
area
local
repository
add
reset
commit
checkout
To add a file to the staging area we use the command ‘git add’. Let’s add the file
‘cat.txt’ to the staging area:
$ git add cat.txt
We can check our staging area using the command ‘git status’. If our file was added
correctly the output should look like this:
$ git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: cat.txt
Untracked files:
(use "git add <file>..." to include in what will be committed)
dog.txt
This output tells us that we are working on the branch ‘master’, which is the default
2
branch. We will learn how to use branches later. It also tells us that the file ‘cat.txt’ is
currently staged which means that it is going to be added to the local repository during
our next commit. The file ‘dog.txt’ is currently which means that it will not be added
to the local repository. If we decide that we do not want to move ‘cat.txt’ to the local
repository after all, we can remove it again from the staging area using ‘git remove
cat.txt’.
Files can also be added and removed using the wildcard ‘*’. For example ‘git add *
´ adds all files from the working directory to the staging area, while ‘git add *.txt’
adds all files ending with ‘.txt ’.
In a lot of projects there are files that we do not ever want to commit. In software projects
this might be the compiled program and the ‘.o’ files generated during compilation,
in Latex projects it might be the resulting ‘.pdf’ file and the ‘.aux’, ‘.log’, ‘.out’,
and the ‘.synctex.gz’ files. These files can be excluded by creating a textfile named
‘.gitignore’ on the top level of the working directory. Wildcards can also be used in
the ‘.gitignore’ file. This would be an example of the ‘.gitignore’ file for a c++
project that is compiled into the file ‘prog’:
prog
*.o
Feel free to experiment with different ‘.gitignore’ files for our animal project. After
staging all the files we want to add to our local repository, we can commit the files using
‘git commit -m "<comment>"’. Let’s for example move all of the files in our current
project into the local directory using ‘git add *’ and then ‘git commit -m "added
cats and dogs"’.
To check your local repository we can use the simple command ‘git log’, but I recommend
a program with a gui like gitg. On Ubuntu and its derivatives you can easily
install it using:
$ sudo apt install gitg
After the installation it can be called as ‘gitg’ from your local working directory. If
run it on the current project the result should look something like in Fig. 1. On the
left section of the screen we see, that we have currently one local branch called master
(again, we will discuss branches later) and no remotes or tags (we will add those later
too). On the top right we see the commit we just did including the comment we added,
who was committing it and when and what we call the hash. The hash is a unique
40-digit combination of letters and numbers that git assigns to each commit. It is not
relevant for us right now, but it might be useful later if we want to go back to this
specific version of the project. Below is a list of files that were changed and the changes
3
Figure 1: the first commit
made to them. In our case this is the first commit, so we only added information. On
your computer, the contents of the local repository are stored in the ‘.git’ folder, but
luckily you will never have to look inside it.
Now that our files are in the local repository, we can always get our files back from the
local repository if they happened to be lost. Feel free to delete the files and then get
them back using:
$ git checkout master <file>
$ git checkout master *
The first command only allows you to checkout specific files, while the second one checks
out all the files on the master.
For completeness let us see, what happens if we change our files and commit those
changes. Let’s change the file ‘dog.txt’ and add ‘horse.txt’
dog.txt
# dog
- four legs
- tail
- many sizes and colours
horse.txt
# horse
- big mammal
- three legs
- mane
4
Let’s commit these changes:
$ git add *
$ git commit -m "appended dogs and added horses"
[master 14d42ee] appended dogs and added horses
2 files changed, 7 insertions(+), 1 deletion(-)
create mode 100644 horse.txt
After executing this command let’s have another look at gitg. It should now look like
in Fig. 2. On the top-right we see the history of commits on the branch ‘master’. git
repositories always store the history of all your projects so you can go back any time to
see what you have done. On the bottom right we see the changes committed. As the
unit git works with is the line adding the words ‘and colours’ to the file ‘dog.txt’ leads
to the deletion of the line ‘-many sizes’ and the insertion of the line ‘- many sizes
and colours’. The six lines in the file ‘horse.txt’ are all counted as insertions, which
leads to the 1 deletion and 7 insertions mentioned in the terminal output.
Figure 2: the second commit
2.3 Branches and Detached Heads
Our alien visitor decides that they want to expand their trip to Australia. So we bring
in an expert on Australia to add their local species. But of course we want to continue
adding European animals at the same time. So how can we both work on the same
project? The answer is to give everyone their own branch. So we let our Australian
colleague create their own branch using:
5
$ git branch Australia
Looking at gitg in Fig. 3, we see that a new branch named ‘Australia’ is listed on the
left side and from the history on the right side we see that it is identical to the original
‘master’. However, the check mark on the left tells us, that we are still following the
master branch, which means that all changes we commit will still end up on the ‘master’
and everything we checkout also comes from there.
Figure 3: creating a new branch
Therefore, our Australian colleague must switch branches using:
$ git checkout Australia
Note that checkout on a branch automatically extracts the files from the local
repository to the working directory. Since ‘master’ and ‘Australia’ are identical,
it won’t make a difference in this case, however, when switching branches later on
this can significantly change the content of the working directory.
The two command above can also be combined to one:
$ git checkout -b Australia
This creates a new branch and checks it out right away. A list of available branches can
be viewed using ‘git branch’.
Now he adds two files:
kangaroo.txt
# kangaroo
- strong hind legs
- likes to jump
- big ears
swan.txt
# swan
- bird
- black feathers
- likes to swim
6
In the meantime we keep working on the ‘master’ and also add two animals:
cow.txt
# cow
- has horns
- likes grass
- is not naturally purple
swan.txt
# swan
- bird
- white feathers
- likes to swim
Looking at the result in gitg gives:
Figure 4: creating a new branch
As we can see in Fig. 4 two different versions of the project exist now on two different
branches. However, the alien visitor does of course only want to read through one
database. Therefore, let us merge the Australia branch into our master branch. The
command for this is:
$ git merge Australia
Auto-merging swan.txt
CONFLICT (add/add): Merge conflict in swan.txt
Automatic merge failed; fix conflicts and then commit the result.
This resulted in an error message. git is very good at merging automatically if files are
identical on both branches or exist only on one branch. However, in our case the file
‘swan.txt’ exists on both branches with conflicting information. Therefore git copied
the file ‘kangaroo.txt’ over from the branch ‘Australia’ as it is since it is not conflicting
with anything. However, it modified the file ‘swan.txt’ in the following way:
7
swan.txt
# swan
- bird
<<<<<<< HEAD
- white feathers
=======
- black feathers
>>>>>>> Australia
- likes to swim
Note that git leaves the lines identical as they are and only highlights the lines that
differ from each other for us. We can now change this file into what we want it to look
like in the final project and commit it.
However, when dealing with large files it can also help to use a merge-tool. One such
tool is meld, which can be installed using:
$ sudo apt install meld
After installing meld we can use it as a mergetool, using the command:
$ git mergetool -t meld
This will open meld looking like Fig. 5. The window is divided in three parts: the left
part shows the state of the file as it is on the ‘master’ branch that we want to merge
into, the right part shows the version of the file on the branch ‘Australia’ that we want
to merge into the ‘master’ and the middle shows the final result as it will be in the
merged version.
Figure 5: meld before the merge
Currently, the middle part is empty. So let’s click on the black arrow between the left and
the middle part to move the version currently on the ‘master’ into the middle section.
The result can be seen in Fig. 6. We see that meld highlights the differences between
different files for us. If a line differs between two files it is highlighted in blue, while
8
a line that only exists in one of the files is highlighted in green. We can now use the
arrows to move changes into the resulting file or we can edit it manually.
Figure 6: meld after importing the version on the ‘master’ branch.
Let’s edit the file as seen in Fig. 7 and then save our changes using the button above
the document. After we close meld we will notice that the file ‘swan.txt’ now matches
the file we created in the middle of the meld window. This method of merging files is
especially useful for large files with many changes, where it is easy to loose track of the
differences.
Figure 7: meld after importing the version on the ‘master’ branch.
We can now commit our merge using:
$ git commit -m "merged Australia into master"
In Fig. 8 we can see the result. We can easily follow the history of the ‘master’ through
both the ‘master’ branch itself and through the branch ‘Australia’. We also see that
the merger did not affect the branch ‘Australia’, so if we were to check it out again, we
would end up with the same version of it as we were before merging it into the master
and could continue working on it.
We can also set meld as the default mergetool using:
$ git config --global merge.tool meld
$ git config --global mergetool.prompt false
$ git config --global mergetool.keepBackup false
The first line sets the mergetool, the second one skips the confirmation dialogue before
using the tool and the third one prevents git from saving a backup file after merg-
ing.
9
Figure 8: gitg after merging ‘Australia’ into ‘master’
Similarly we could also set meld as a difftool to show us the differences between different
commits:
git config --global diff.tool meld
git config --global difftool.prompt false
This will make meld show the differences between two commits when prompted us-
ing:
git difftool
We previously only checked out branches, but we can also check out commits based on
their specific hash. If for example we wanted to check out the initial commit again this
leads to:
$ git checkout ae5d7d4c44b6d442cacd8f801de77711339fb6a6
Note: switching to ’ae5d7d4c44b6d442cacd8f801de77711339fb6a6’.
You are in ’detached HEAD’ state. You can look around, make
experimental changes and commit them, and you can discard any
commits you make in this state without impacting any branches by
switching back to a branch.
If you want to create a new branch to retain commits you create, you
10
may do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch Turn
off this advice by setting config variable advice.detachedHead to
false
HEAD is now at ae5d7d4 added cats and dogs
As we are not at the ‘head’ of a specific branch, we cannot commit any changes in
this state, but we can create a new branch using ‘git branch’ and make commits onto
that.
2.4 Cherrypicking
Our visitor decides that they also want to see Africa. So we get an expert for Africa on
board who makes commits visible in Fig. 9.
Figure 9: gitg with information about Africa
What is that? They found a bug in our old version. We accidentally wrote that horses
have three leg and not four! Now, we do of course want to fix this on the ‘master’ too,
but we do not want to merge their half-finished work yet. We could just make the same
changes they did to their files, but we also might think of using a feature of git called
cherry picking. Cherry picking means copying the changes of one or multiple commits
11
onto another branch. The command for this is:
git cherry-pick e12cb18ecc2004fc5e2ee90781c92e9e95703687
where the number at the end is the hash we copied out of gitg. The result can be seen
in Fig. 10.
Figure 10: gitg after cherrypicking
2.5 Tags
At some point in our project we want to publish the first official version. This version
shall be marked using a tag.
A tag is a fixed name for a given commit in our repository which can be used to
find that commit more easily.
We are going to use semantic versioning1 for our tags and therefore call the first one
‘1.0.0’. To create the tag we use:
git tag 1.0.0
The result can be seen in Fig. 11. We can clearly see the tag 1.0.0 next to the head of
‘master’. Unlike the branches, this tag cannot be changed any more. We can ask git to
list all existing tags in terminal using git tag.
1
Semantic versioning is a versioning concept that involves three numbers: major.minor.bugfix separated
by dots. The bugfix is incremented, if the change from the previous tag merely involve a bugfix, the
minor if new features were added and the major in case of an interface change.
12
Figure 11: gitg after tagging
We can also create annotated tags. In this case the above command would change
to:
$ git tag -a 1.0.0 -m "some comment about this tag"
3 The remote repository
So far we have only worked on the local repository, but eventually we want to backup our
project on a server and share it with our colleagues. For this we must create a remote
repository. The easiest way to create a remote repository on a server is to first create it
locally and then move it to the server using scp. To create the repository locally let’s
just move one level above the working directory and use the following command:
$ cd ..
$ git clone --bare exampleWorkDir/ repo.git
The command ‘git clone’ copies the content from the folder ‘.git’ into the folder
‘repo.git’. The ‘--bare’ defines that no working directory is to be associated with this
repository. Now that we have our remote repository we can move it to the server, where
it ultimately shall exist.
$ scp -r repo.git user@server:/path/on/server/repo.git
We can delete ‘repo.git’ on our machine after this is done. Now we want to tell git,
where to find the remote repository. Therefore, we enter the following command into
our terminal from the working directory:
$ git remote add origin user@server:/path/on/server/repo.git
This adds a remote named ‘origin’ with the path ‘user@server:/path/on/server/repo.git’
to the known repositories. A list of remotes can be viewed using:
13
$ git remote
$ git remote -v
with the first only showing the names and the second giving additional information like
the paths. Now the remote repository is connected, but looking into gitg we find that
we do not see any branches from the remote yet. To connect find the branches we have
to update our information on the remote using:
$ git fetch
In Fig. 12 we can see that all the branches that we have in our local repository are also
in the remote. If we make changes to out local repository we can use ‘git push’ to copy
changes on the current branch to the local repository.
Figure 12: the remotes visible in gitg
The first time we use this command we will have to use:
$ git push --set-upstream origin master
to clarify that we want our local ‘master’ to track the ‘master’ on the remote repository.
To push tags we need to add the flag ‘--tags’. To move branches from the remote
repository to the local repository we use ‘git pull’. If we are working with multiple
remote repositories we can specify the target repository at the end of the command:
$ git push --set-upstream origin master
If we are working with multiple remotes we can specify which one we are pushing
to:
14
$ git push origin
local
repository
remote
repository
push
fetch, pull
If a remote repository already exists and we want to create a local repository to work
on it we can use:
$ git clone user@server:/path/on/server/repo.git
in the folder in which we want to create our working directory.
4 Working in groups of developers
Now that we understand the basics of git, we can start developing our own projects
with our colleagues. But how does one keep big repositories with many users clean? We
already know half of the answer to this, which is to use multiple branches. A common
strategy in software development is to declare the master a special branch that may
only contain fully implemented, tested features. For every feature that gets added a new
feature branch is created, for each bugfix to be made a bugfixbranch is made. This can
look something like this:
dev feat1
master
dev feat2
bugfix 1
Here the naming convention is to start the name of each feature branch with ‘dev ’
and the name of each bugfix branch with ‘bugfix ’. The developers working on those
branches are not allowed to merge anything in the ‘master’ unless it is fully tested and
documented. Huge software companies even make their developers pear-review each
others code before it can be merged. You might also find that they use branches called
release branches on which specialized testing teams do a lot of additional software testing
before anything gets tagged and then shipped to the customer.
15