Rclone - What is it?

Rclone is a command-line program that manages files in cloud storage. Rclone has powerful cloud equivalents to the Unix commands rsync, cp, mv, mount, ls, du, tree, rm, and cat. Rclone's familiar syntax includes shell pipeline support and --dry-run protection. It is used at the command line, in scripts or via its API.

https://rclone.org

Installation

On macOS

brew install rclone

On Linux

curl -O https://downloads.rclone.org/rclone-current-linux-amd64.zip
unzip rclone-current-linux-amd64.zip
cd rclone-*-linux-amd64
mkdir -p ~/.local/bin
cp rclone ~/.local/bin/

Make sure ~/.local/bin is in your $PATH environment variable.
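One way to check the PATH requirement is sketched below. It assumes a POSIX-style shell and that rclone was copied to ~/.local/bin as in the Linux steps above.

```shell
# Add ~/.local/bin to PATH for the current session if it is missing
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                          # already present, nothing to do
    *) PATH="$HOME/.local/bin:$PATH"; export PATH ;;
esac

# Confirm that the shell can now find rclone
if command -v rclone >/dev/null 2>&1; then
    rclone version
else
    echo "rclone not found on PATH" >&2
fi
```

To make the change permanent, the same PATH line can go in a shell startup file such as ~/.bashrc or ~/.profile (which file applies depends on your shell).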

Configuration - example for Betzy HPC

mkdir -p ~/.config/rclone
touch ~/.config/rclone/rclone.conf

Add content:

[betzy]
type = sftp
host = betzy.sigma2.no
known_hosts_file = ~/.ssh/known_hosts
key_file = ~/.ssh/id_sigma2
md5sum_command = md5sum
sha1sum_command = sha1sum
port = 12
shell_type = unix
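Before transferring anything, it may help to confirm that rclone sees the new remote. The helper below is only a sketch: the function name check_remote is hypothetical, and it assumes the remote is called betzy as in the configuration above.

```shell
# Sketch: sanity-check a configured remote (function name is hypothetical)
check_remote() {
    remote="$1"
    # Show which configuration file rclone is actually reading
    rclone config file || return 1
    # Each configured remote is listed on its own line with a trailing colon
    rclone listremotes | grep -qx "${remote}:" || {
        echo "remote '${remote}' is not configured" >&2
        return 1
    }
    # List the top-level directories of the home directory on the remote
    rclone lsd "${remote}:"
}
```

Calling check_remote betzy should print the config file location followed by a directory listing. If the last step fails, the problem is more likely the SSH connection than the rclone configuration.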

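Commands

rclone copy source:path dest:path
rclone move source:path dest:path

rclone copy copies files from the source to the destination and skips files that are already identical. rclone move does the same and then deletes the transferred files from the source.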
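As a rough illustration of a cautious copy, the sketch below previews the transfer with --dry-run, performs it, and then compares source and destination with rclone check. The function name copy_and_verify and the example paths in the comments are assumptions for illustration.

```shell
# Sketch: preview, copy, then verify (function name is hypothetical)
copy_and_verify() {
    src="$1"   # for example betzy:/cluster/work/sim
    dst="$2"   # for example sim_local
    # Preview what would be transferred; nothing is changed
    rclone copy --dry-run "$src" "$dst" || return 1
    # Transfer with progress output; the source is left untouched
    rclone copy -P "$src" "$dst" || return 1
    # Verify that every source file exists, unchanged, in the destination
    rclone check --one-way "$src" "$dst"
}
```

The --one-way flag is used here so that extra files already present in the destination are not reported as differences.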

More elaborate example

Moving files with selected extensions and age (that is, copying them from the HPC system, verifying the copies, and then removing them from the HPC system):

rclone -P move --include "*.{snap,idl,aux}" betzy:/cluster/work/sim sim_local

The above command moves (the equivalent of the Unix mv command plus checksum verification) all files with the extensions *.snap, *.idl, and *.aux, leaving everything else behind (e.g., files with the extension *.scr).

You can also add the flag --min-age 1h so that only files that are at least 1 hour old are moved. This can be useful if you intend to restart your simulation from a full snapshot rather than from a *.scr file.
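The move command and the --min-age flag can be combined in a small helper, sketched here. The function name move_sim_files and the DRY_RUN variable are hypothetical; the paths and extensions are the ones from the example above.

```shell
# Sketch: move selected simulation files that are old enough
# (function name and DRY_RUN variable are hypothetical)
move_sim_files() {
    min_age="${1:-1h}"      # minimum file age, defaults to 1 hour
    dry_run=""
    # With DRY_RUN=1, rclone only reports what it would move
    if [ "${DRY_RUN:-0}" = "1" ]; then
        dry_run="--dry-run"
    fi
    # $dry_run is left unquoted on purpose so that it vanishes when empty
    rclone move -P $dry_run \
        --include "*.{snap,idl,aux}" \
        --min-age "$min_age" \
        betzy:/cluster/work/sim sim_local
}
```

Running DRY_RUN=1 move_sim_files 2h first shows which files would be moved, and move_sim_files 2h then performs the move.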

Using crontab to get data from HPC periodically

Crontab is a utility in Linux and Unix systems that allows users to schedule and automate the execution of scripts or commands on a regular basis. Crontab uses a simple configuration file to specify the commands to run and when to run them.

To use crontab, you need to create a crontab file and add the commands you want to run at specified intervals. Here is an example of how to do this:

Open a terminal window and type crontab -e to edit the crontab file (this is best done on your workstation).

In the crontab file, add a line that specifies the command to run and the schedule at which to run it. The syntax for this is:

* * * * * command-to-run

The five fields represent the minute, hour, day of the month, month, and day of the week. You can use numeric values or step values such as */5 to specify intervals. For example, */5 * * * * would run the command every 5 minutes.

Save and exit the crontab file. Cron will automatically run the command at the specified intervals.

Here is an example crontab line that runs a script called myscript.sh (you can call rclone or similar directly here) every day at midnight:

0 0 * * * /path/to/myscript.sh

Remember to replace /path/to/myscript.sh with the actual path to your script.
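A script called from cron could look roughly like the sketch below. It is only an illustration: the lock directory, log file, and function name are assumptions, and the rclone command is the move example from earlier. Cron usually runs with a minimal PATH, so the script adds ~/.local/bin itself, and it uses a lock directory so that a slow transfer does not overlap with the next scheduled run.

```shell
#!/bin/sh
# Sketch of a cron wrapper script; paths and names below are assumptions

# Cron usually provides a minimal PATH, so add the rclone install location
PATH="$HOME/.local/bin:$PATH"
export PATH

LOCK_DIR="${TMPDIR:-/tmp}/rclone-sim.lock"
LOG_FILE="$HOME/rclone-sim.log"

run_transfer() {
    # mkdir is atomic: it fails if a previous run still holds the lock
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous transfer still running, skipping" >&2
        return 0
    fi
    rclone move --include "*.{snap,idl,aux}" --min-age 1h \
        --log-file "$LOG_FILE" \
        betzy:/cluster/work/sim "$HOME/sim_local"
    status=$?
    # Release the lock and report rclone's exit status
    rmdir "$LOCK_DIR"
    return "$status"
}

if command -v rclone >/dev/null 2>&1; then
    run_transfer
else
    echo "rclone not found on PATH" >&2
fi
```

If the script is killed mid-transfer, the lock directory stays behind and has to be removed by hand before the next run can start.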

Using crontab can be a powerful tool to automate tasks and save time. However, be careful when scheduling commands, as they will run automatically and may have unintended consequences if not configured correctly.

Troubleshooting

After Sigma2 made 2FA mandatory, I started having problems connecting via rclone. A simple fix is to run the following commands once:

ssh-keyscan -p 12 betzy.sigma2.no >> ~/.ssh/known_hosts
ssh-keyscan -p 12 login-3.betzy.sigma2.no >> ~/.ssh/known_hosts
ssh-keyscan -p 12 login-2.betzy.sigma2.no >> ~/.ssh/known_hosts