
Git pipeline

Here is a first (and still incomplete) draft of the guide, with some details at the end:

// The first few sections are taken from Pro Git by Scott Chacon, a very good book that you can find online

So, what is Git in a nutshell?

This is an important section to absorb, because if you understand what Git is and the fundamentals of how it works, then using Git effectively will probably be much easier for you. As you learn Git, try to clear your mind of the things you may know about other VCSs, such as CVS or Subversion — doing so will help you avoid subtle confusion when using the tool. Even though Git’s user interface is fairly similar to these other VCSs, Git stores and thinks about information in a very different way, and understanding these differences will help you avoid becoming confused while using it.

Snapshots, Not Differences!

The major difference between Git and any other VCS is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.
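To see the snapshot idea in practice, here is a small throwaway script (a sketch; the file names and contents are made up). It makes two commits in a temporary repository and asks Git which stored object each file points to in each snapshot: the unchanged file points to the same object both times, the changed one does not. (Git may later compress objects into pack files, but that is a storage optimisation, not a change to the snapshot model.)

```shell
#!/usr/bin/env bash
# Sketch: unchanged files are not stored again; the new snapshot reuses the old object.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git config user.name "Demo User"          # throwaway identity so that commits work
git config user.email "demo@example.com"

echo "unchanged content" > a.txt
echo "version 1" > b.txt
git add a.txt b.txt
git commit -q -m "first snapshot"

echo "version 2" > b.txt                   # only b.txt changes
git commit -q -am "second snapshot"

# <commit>:<path> names the object stored for that path in that snapshot
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"   # identical ids: the same stored object is reused
echo "b.txt: $b_old -> $b_new"   # different ids: a new object was stored
```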

Git Has Integrity!

Everything in Git is checksummed before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can’t lose information in transit or get file corruption without Git being able to detect it.
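In a default repository the checksum is a SHA-1 hash computed over a short header plus the file content. This sketch, which assumes git and the coreutils sha1sum tool are installed, computes the id of a made-up file once with git hash-object and once by hand, then shows that changing a single character gives a completely different id:

```shell
#!/usr/bin/env bash
# Sketch: a blob id is the SHA-1 of "blob <size>\0<content>" (default SHA-1 object format).
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"                                  # outside any repository
printf 'test content\n' > file.txt

# Id as computed by Git (without -w nothing is written, no repository needed)
git_id=$(git hash-object file.txt)

# The same id computed by hand: header "blob <size>" + NUL byte, then the content
size=$(wc -c < file.txt)
manual_id=$( { printf 'blob %d\0' "$size"; cat file.txt; } | sha1sum | cut -d' ' -f1 )

echo "git hash-object: $git_id"
echo "by hand:         $manual_id"

# Changing one character produces a completely different id
printf 'test content!\n' > file.txt
changed_id=$(git hash-object file.txt)
echo "after the edit:  $changed_id"
```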

Git Generally Only Adds Data!

When you do actions in Git, nearly all of them only add data to the Git database. It is hard to get the system to do anything that is not undoable or to make it erase data in any way. As with any VCS, you can lose or mess up changes you haven’t committed yet, but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository. This makes using Git a joy because we know we can experiment without the danger of severely screwing things up.
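As an example of how hard it is to lose committed work, this sketch (temporary repository, made-up file) "loses" a commit with git reset --hard and then gets it back through the reflog, Git's local record of where HEAD has pointed. Keep in mind that the reflog exists only in your own clone and that old entries are eventually pruned.

```shell
#!/usr/bin/env bash
# Sketch: "lose" a commit with reset --hard, then recover it from the reflog.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "v1"
echo "v2" > notes.txt
git commit -q -am "v2 (about to be lost)"
lost=$(git rev-parse HEAD)

# Move the branch back one commit and throw away the working-tree changes
git reset -q --hard HEAD~1

# The commit is no longer on the branch, but it is still in the object
# database, and the reflog remembers where HEAD pointed before the reset
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')

# Point the branch at it again
git reset -q --hard "$recovered"
cat notes.txt
```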

But how should we organise our work with Git?

The current state of GitHub/Bifrost looks roughly like this (dots: commits, arrows: merges):

[Figure: Git pipeline]

We moved Bifrost to GitHub some time ago, and that version was tagged 1.0 (read about tagging in the book). Since then, some of you have created new branches and kept working on them without syncing them with, for example, master. This is not necessarily a problem if we worked on different files/modules, BUT because Git stores snapshots, it is not only the content of some files that may differ but the ‘state’ of the entire $BIFROST folder (e.g., Git also records whether each file is executable, which you change with chmod +x; it does not, however, record group ownership, which you control with chgrp).
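To check what Git actually records about permissions, this sketch (throwaway repository, hypothetical run.sh) first flips the executable bit and then changes the group/other bits. Only the first shows up as a change, as a mode switch from 100644 to 100755. It assumes a filesystem with Unix permissions and core.fileMode enabled (the default there).

```shell
#!/usr/bin/env bash
# Sketch: Git records the executable bit of a file, not group ownership or other bits.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git config core.fileMode true          # default on Unix-like filesystems, set explicitly here

printf '#!/bin/sh\necho "running"\n' > run.sh   # hypothetical job script
chmod 644 run.sh
git add run.sh
git commit -q -m "add run.sh"
git ls-files -s run.sh                 # the index entry starts with mode 100644

chmod u+x run.sh                       # content unchanged, only the executable bit flips
mode_diff=$(git diff)
echo "$mode_diff"                      # shows "old mode 100644" / "new mode 100755"

chmod g+w,o-r run.sh                   # group/other bits: Git does not record these
mode_diff_after=$(git diff)            # so the diff stays exactly the same
echo "$mode_diff_after"
```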

Fortunately, Git is good at merging, and we should soon merge everything into one branch, “master-dev”. Here is why we should do this:

1) I hope we all aim to have one stable code, free of bugs and ready to run on HPC systems without any problems.

For that we need the “master” branch: a branch that changes slowly, because everything added to master must be tested first, to make sure that the week-long run you added to the queue 3 days ago will not crash after 1 iteration because of some silly bug.

2) .. “but I don’t want to wait for stupid tests to finish!.. I have this new BC I’ve just finished and tested on my laptop, and now I want to run it on HPC!”

OK, for that we are going to have “master-dev” (which I will soon rename to “develop” to match the names used in git-flow):

“master-dev”/“develop” is the branch where we are going to integrate all our work: the place where you add your new function or push a big update, and where all developers have a chance to reflect on your modifications and check how they will affect their own work. This is a branch you want to keep always up to date… and also the branch I will continuously auto-test for “silly” mistakes, i.e., whether all modules still compile after each new commit and whether simple experiments run without crashing.

After some time in quarantine, all changes on master-dev will go through some serious tests (e.g., longer debug runs), and if everything is OK, I will merge them into ‘master’. We will then have a new release of Bifrost with a new tag, and an email to all developers listing the changes with respect to the previous release.
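The master/master-dev arrangement can be sketched in a throwaway repository like this. The file names, commit messages and the 1.1 tag are hypothetical. The --no-ff merge keeps an explicit merge commit on master for each release, and git log between two tags gives the list of changes for the release email:

```shell
#!/usr/bin/env bash
# Sketch of the master / master-dev arrangement in a throwaway repository.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q bifrost-demo
cd bifrost-demo
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" whatever init.defaultBranch says
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable code" > main.f90             # hypothetical source file
git add main.f90
git commit -q -m "Bifrost 1.0"
git tag -a 1.0 -m "Release 1.0"

# master-dev: the integration branch, created from master
git checkout -q -b master-dev
echo "new boundary condition" > bc.f90    # hypothetical new BC
git add bc.f90
git commit -q -m "Add new BC"

# ... quarantine, automatic tests and longer debug runs happen here ...

# Release: merge master-dev into master with an explicit merge commit, then tag it
git checkout -q master
git merge -q --no-ff -m "Merge master-dev for release 1.1" master-dev
git tag -a 1.1 -m "Release 1.1"

# Commits since the previous release, e.g. for the announcement email
git log --oneline 1.0..1.1
```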

3) .. “but I am not ready to push my changes to master-dev..I am just testing some ideas”

.. sure, no problem. It means that you are working on a new feature and you just need a new branch: a branch created from master-dev, to which you apply your modifications. Such a mini project might take some time, and in the meantime some of us will probably keep committing to master-dev, so it is very important that you keep merging master-dev into your new branch. When you are happy with your work, it is time to merge your branch into master-dev.
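The feature-branch routine from point 3 looks roughly like this. It is a sketch in a temporary repository; my-feature and the file names are hypothetical, and the colleague's commit to master-dev is simulated locally:

```shell
#!/usr/bin/env bash
# Sketch: feature branch off master-dev, kept in sync, then merged back.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master-dev   # start directly on master-dev for brevity
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "shared code" > io.f90
git add io.f90
git commit -q -m "master-dev baseline"

# Start the feature branch from master-dev
git checkout -q -b my-feature
echo "experimental idea" > idea.f90
git add idea.f90
git commit -q -m "Try a new idea"

# Meanwhile a colleague commits to master-dev
git checkout -q master-dev
echo "colleague's fix" >> io.f90
git commit -q -am "Fix in io.f90"

# Keep the feature branch up to date by merging master-dev into it
git checkout -q my-feature
git merge -q -m "Merge master-dev into my-feature" master-dev

# When happy with the work, merge the feature branch back into master-dev
git checkout -q master-dev
git merge -q --no-ff -m "Merge my-feature into master-dev" my-feature
git log --oneline --graph
```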

If we organise ourselves this way, our repository will look something like this:

[Figure: Git pipeline]

This arrangement is very similar to “git-flow”, an idea introduced here.

IMPORTANT - How do we proceed?

First of all, we need to integrate our work, i.e., merge our branches into develop/master-dev. This is how we can do it:

git fetch --all

To see what has changed on GitHub (this downloads new commits and branches but does not touch your local files)

git branch --all

To check which branch you are on (marked with *) and which branches exist, both locally and on GitHub

git checkout your_branch
git pull origin

Check that there are no local changes left to commit on your branch:

git status

If it says “nothing to commit, working tree clean”, you are ready to go.
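If you want to script this check: git status --porcelain prints nothing when there is nothing to commit and no untracked files, so empty output means you are ready to go. A minimal sketch in a throwaway repository; is_clean is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: scripted version of "is there anything left to commit?"
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "program main" > main.f90            # hypothetical file
git add main.f90
git commit -q -m "initial"

# Hypothetical helper: succeeds only when "git status" would report a clean tree.
# --porcelain prints one line per modified, staged or untracked file, nothing otherwise.
is_clean() {
    [ -z "$(git status --porcelain)" ]
}

if is_clean; then before="clean"; else before="dirty"; fi

echo "! a local edit" >> main.f90         # simulate an uncommitted change
if is_clean; then after="clean"; else after="dirty"; fi

echo "before the edit: $before, after the edit: $after"
```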

git merge --squash origin/master-dev

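This takes all the changes made on master-dev since your branch split from it, combines them into one set of changes and stages it on your branch. No commit is created yet (and no merge is recorded); you make the commit yourself at the end.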
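To see what git merge --squash leaves behind, this sketch builds a tiny repository in which a local master-dev branch stands in for origin/master-dev (branch and file names are hypothetical). After the squash merge, HEAD has not moved, the master-dev changes are only staged, and no merge is in progress (there is no MERGE_HEAD):

```shell
#!/usr/bin/env bash
# Sketch: what "git merge --squash" does and does not do.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master-dev   # local master-dev stands in for origin/master-dev
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > solver.f90                  # hypothetical file names
git add solver.f90
git commit -q -m "baseline"

git checkout -q -b my-branch
echo "my work" > mine.f90
git add mine.f90
git commit -q -m "my work"

git checkout -q master-dev                # two new commits on master-dev
echo "update 1" >> solver.f90
git commit -q -am "update 1"
echo "update 2" >> solver.f90
git commit -q -am "update 2"

git checkout -q my-branch
head_before=$(git rev-parse HEAD)
git merge -q --squash master-dev
head_after=$(git rev-parse HEAD)          # unchanged: no commit has been made yet

git status --short                        # solver.f90 is staged, not committed
staged=$(git diff --staged --name-only)
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then merging="yes"; else merging="no"; fi
echo "merge in progress: $merging"
```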

git status

Will list the files modified by this merge, but:

git status -v

Will show you all the modifications this squashed merge makes to your files.

It is best to check them one by one with:

git diff --staged file_from_the_git_status_list
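With many files, the one-by-one check can be put in a loop. A sketch in a throwaway repository with two hypothetical files; git diff --staged --name-only gives the list of staged files:

```shell
#!/usr/bin/env bash
# Sketch: review staged changes file by file.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "a = 1" > params.f90                 # hypothetical files
echo "b = 2" > io.f90
git add params.f90 io.f90
git commit -q -m "initial"

echo "a = 10" > params.f90                # two staged modifications to review
echo "b = 20" > io.f90
git add params.f90 io.f90

git diff --staged --name-only > "$tmp/staged.txt"
reviewed=0
while IFS= read -r f; do
    echo "=== $f ==="
    git --no-pager diff --staged -- "$f"  # same as: git diff --staged <file>
    reviewed=$((reviewed + 1))
done < "$tmp/staged.txt"
echo "reviewed $reviewed file(s)"
```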

If you don’t want that change, you can always leave it out of the commit with:

git reset file_from_the_git_status_list

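which prints:

Unstaged changes after reset:
M  work_mpi.f90

The change is now left out of the commit, but it is still in your working copy (that is what the M means). To discard it completely, also run git checkout -- file_from_the_git_status_list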
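This sketch (temporary repository; work_mpi.f90 is just reused as a file name) shows the difference between unstaging a change with git reset and discarding it with git checkout --. On Git 2.23 or newer, git restore --staged and git restore do the same two jobs.

```shell
#!/usr/bin/env bash
# Sketch: "git reset <file>" unstages a change; "git checkout -- <file>" discards it.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > work_mpi.f90
git add work_mpi.f90
git commit -q -m "initial"

echo "v2" > work_mpi.f90
git add work_mpi.f90                      # staged, as after the squash merge

git reset -q -- work_mpi.f90              # unstage: the change is left out of the next commit
staged=$(git diff --staged --name-only)   # empty now
unstaged=$(git diff --name-only)          # but the file is still modified on disk

git checkout -- work_mpi.f90              # discard it: back to the committed version
content=$(cat work_mpi.f90)
echo "staged: '$staged', unstaged: '$unstaged', content: $content"
```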

Then, if there are no conflicts, just run:

git commit

This opens your default editor with a pre-filled message listing all the squashed commits. Just save the file and close the editor to finish the commit.
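One consequence worth knowing: the commit produced this way is an ordinary single-parent commit, so Git does not record that master-dev was merged into your branch. The sketch below (throwaway repository, hypothetical names, message given with -m to skip the editor) checks both facts. The next git merge --squash will therefore work out the changes again from the old common ancestor, which can bring back conflicts you already resolved; a plain git merge origin/master-dev records the merge instead.

```shell
#!/usr/bin/env bash
# Sketch: a squash merge produces a normal single-parent commit and records no merge.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master-dev   # local master-dev stands in for origin/master-dev
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > solver.f90                  # hypothetical file names
git add solver.f90
git commit -q -m "baseline"
git checkout -q -b my-branch
echo "my work" > mine.f90
git add mine.f90
git commit -q -m "my work"
git checkout -q master-dev
echo "update 1" >> solver.f90
git commit -q -am "update 1"
echo "update 2" >> solver.f90
git commit -q -am "update 2"

git checkout -q my-branch
git merge -q --squash master-dev
git commit -q -m "Squash-merge master-dev into my-branch"   # -m skips the editor

# A real merge commit would have a second parent, HEAD^2
if git rev-parse -q --verify 'HEAD^2' >/dev/null; then is_merge="yes"; else is_merge="no"; fi

# Is master-dev now part of my-branch's history?
if git merge-base --is-ancestor master-dev HEAD; then recorded="yes"; else recorded="no"; fi

echo "merge commit: $is_merge, master-dev recorded as merged: $recorded"
```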

Finally, you can merge your branch into master-dev. The best way to do it is with a pull request. For that, first push your branch to origin with:

git push origin your_branch
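The first push of a new branch is easiest with -u, which also records origin/your_branch as its upstream so that a later plain git push or git pull knows where to go. In this sketch a local bare repository plays the role of GitHub; everything apart from the git commands themselves is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: first push of a branch with -u; a local bare repository stands in for GitHub.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q --bare origin.git             # plays the role of the GitHub repository
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master-dev
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/origin.git"

echo "base" > main.f90
git add main.f90
git commit -q -m "baseline"
git push -q origin master-dev

git checkout -q -b your_branch            # branch name as used in the text
echo "my change" >> main.f90
git commit -q -am "my change"

# -u (--set-upstream) also makes origin/your_branch the upstream of your_branch
git push -q -u origin your_branch

upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$tmp/origin.git" rev-parse your_branch)
echo "upstream: $upstream"
```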

Next, go to GitHub, select your branch (1) and click “New pull request” (2):

[Figure: Git pipeline]

It should look like this: (1) “base” is the branch you want to merge into (master-dev) and (2) “compare” is your branch.

[Figure: Git pipeline]