Transferring data

For most use cases, data to be analyzed needs to be transferred to Hábrók, and results need to be transferred to other locations. On this page we describe two basic file transfer methods, followed by a more advanced tool.

The MobaXterm SSH client offers a file browser interface to files on the Hábrók cluster, as shown in the following figure.

This interface can be used to create directories and to download and upload files. The interface offers upload and download icons above the file listing. Files can also be dragged and dropped between MobaXterm and the Windows file explorer. This last method also allows a full directory, including its files, to be transferred to or from Hábrók.

There are also alternative file transfer clients available like:

Another option is to use command line tools for copying data from or to Hábrók. This option is available in local terminals on Mac OS X or Linux. MobaXterm also offers a terminal on the local system, as does the Windows Subsystem for Linux if you install it. The two commands that are available are scp and sftp. Both must be started on the system at the other end of the transfer (the remote system, as seen from Hábrók), unless that system can itself be accessed through SSH. This is normally not the case for your personal desktop or laptop, so in the examples we assume that you work in a terminal session on your local desktop or laptop. If the commands are run from Hábrók, source and destination have to be adjusted.

scp is a command for copying files to or from a remote location, accessible through SSH, over the network. The basic syntax is:

scp [options] [user@host1:]source [user@host2:]destination

Source and destination are mandatory. Either source or destination can be prefixed by a username and hostname to specify the remote side. Here is an example that transfers a single file from the local system to a directory within the user's home directory on Hábrók. Note that the user@host part is followed by a colon (:):

alice@skries:~/workdir$ scp vars.yml username@login2.hb.hpc.rug.nl:my_project
vars.yml                                      100% 3495   198.4KB/s   00:00    
alice@skries:~/workdir$ 

In the example above, the file vars.yml from the directory workdir on the local desktop or laptop is sent to the user's directory my_project on Hábrók. Note that if you don't specify the username, your username on the local system will be used.

If the directory my_project had not existed, the file would have been stored under the name my_project in the user's home directory on Hábrók.
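
The destination rule described above can be tried out safely without a network connection, because the local cp command treats its destination the same way. The following is a minimal sketch using a temporary scratch directory; the file contents are made up for illustration.

```shell
# Local stand-in for the scp example: cp applies the same destination rule.
demo=$(mktemp -d)                      # scratch directory, removed at the end
echo "key: value" > "$demo/vars.yml"   # made-up file contents

# Case 1: my_project does not exist, so the copy becomes a FILE named my_project.
cp "$demo/vars.yml" "$demo/my_project"
if [ -f "$demo/my_project" ]; then
    first_copy="regular file"
else
    first_copy="something else"
fi
echo "case 1: my_project is a $first_copy"

# Case 2: create the directory first; now the file lands inside it.
rm "$demo/my_project"
mkdir "$demo/my_project"
cp "$demo/vars.yml" "$demo/my_project"
listing=$(ls "$demo/my_project")
echo "case 2: my_project contains $listing"

rm -rf "$demo"

# On the cluster, the directory could be created beforehand with, for example:
#   ssh username@login2.hb.hpc.rug.nl mkdir -p my_project
```

Creating the remote directory before running scp avoids ending up with a file that has the directory's intended name.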

To copy the file back from Hábrók to the local system, source and destination have to be interchanged and adjusted where necessary. For example:

alice@skries:~/workdir$ scp username@login2.hb.hpc.rug.nl:my_project/vars.yml .
vars.yml                                      100% 3495   191.7KB/s   00:00    
alice@skries:~/workdir$ 

In this case we had to make the remote location explicit, because the source has to be specified exactly. The destination is now ., which is the current directory on the local system.

In many cases we want to copy a whole directory from or to Hábrók. This can be done with the option -r, which, as for cp, stands for recursive.

alice@skries:~/workdir$ scp -r username@login2.hb.hpc.rug.nl:my_project .
main.yml                                      100%  517    26.7KB/s   00:00    
vars.yml                                      100% 3495   219.4KB/s   00:00    
alice@skries:~/workdir$ 

After this the directory my_project will have been copied to workdir on the local system, as workdir was the current directory. The destination of the recursive scp has to be a directory. If the directory does not exist it will be created, and the contents of the source directory will be copied into this new directory.

In many cases more than a few small files need to be transferred. Unfortunately there is always the chance of failing connections or other issues. With scp, this means either removing the destination files and starting anew, or an extensive manual check of the status of the transferred files.

A file transfer tool that can help solve this issue is rsync. rsync is available in the Linux and OS X terminal, and also inside the MobaXterm local terminal window.

rsync is a tool that offers the following advantages:

  • It makes sure that all files from the source are copied to the destination.
  • It allows restarting operations that were interrupted.
  • It can resume transferring partially copied files. This is useful for large incompletely transferred files.
  • It can be used to update files in the destination that were modified in the source, without retransferring unchanged files.
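
Because an interrupted rsync can simply be started again, the restart can also be automated. The sketch below wraps rsync in a retry loop. The function name sync_with_retry, the default of 5 attempts, and the 10 second pause are assumptions chosen for illustration and are not part of rsync; the options -a and --partial are standard rsync options.

```shell
# Hypothetical helper (not part of rsync): rerun rsync until it succeeds.
# Usage: sync_with_retry SOURCE DESTINATION [MAX_ATTEMPTS]
sync_with_retry() {
    src=$1
    dest=$2
    max_attempts=${3:-5}    # assumed default, adjust as needed
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -a preserves file attributes; --partial keeps partially
        # transferred files so a later attempt can reuse them.
        if rsync -a --partial "$src" "$dest"; then
            echo "transfer complete after $attempt attempt(s)"
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
        # Pause before retrying, but not after the final attempt.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep 10
        fi
    done
    return 1
}

# Example call (placeholder user name and paths):
# sync_with_retry my_project/ username@login2.hb.hpc.rug.nl:my_project/
```

The function returns a non-zero status when all attempts fail, so a calling script can detect that the transfer did not complete.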

The basic syntax of rsync is:

rsync [options] source destination

The most commonly used flags are:

  • -a Archive mode. A combination flag used to copy directories and files recursively while preserving symbolic links, permissions, timestamps, etc.
  • -z Compression. This compresses the data before it is sent over the network and uncompresses it at the other side. This is mainly useful when transferring large amounts of textual data.
  • -v Verbose. This will show information about the files that are being copied.
  • -P Shows a progress indicator and keeps partially transferred files (short for --partial --progress). Note that the lowercase -p is a different option, which preserves permissions.
  • --delete Delete files from the receiving end that are missing at the sending side.
  • --append Append data onto shorter files on the receiving end. The --append-verify option additionally verifies the contents of the complete file after appending. These options are useful for resuming the transfer of large files after a connection failure.
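
Since --delete removes files at the destination, it can be worth previewing its effect first. The following sketch does this with -n (--dry-run), a standard rsync option that is not in the list above. It runs entirely on the local system in a temporary scratch directory; all file names are made up for illustration.

```shell
work=$(mktemp -d)                       # scratch directory, removed at the end
mkdir -p "$work/src" "$work/dest"
echo "one" > "$work/src/a.txt"
echo "two" > "$work/src/b.txt"

# The first run copies everything.
rsync -a "$work/src/" "$work/dest/"

# Change one file at the source and leave a stale file at the destination.
echo "one, edited" > "$work/src/a.txt"
echo "stale" > "$work/dest/old.txt"

# -n only reports what would be transferred or deleted; nothing is changed.
preview=$(rsync -avn --delete "$work/src/" "$work/dest/")
echo "$preview"

# The real run updates a.txt and removes old.txt from the destination.
rsync -a --delete "$work/src/" "$work/dest/"
result=$(ls "$work/dest")
echo "$result"

rm -rf "$work"
```

The same pattern applies to a remote destination: run the command once with -n, inspect the output, then repeat it without -n.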

Source and destination can be given in the same way as for scp. Here is an example of transferring a directory from Hábrók to our local system:

john@skries:~/testdir$ rsync -avz username@login2.hb.hpc.rug.nl:pydaal/pydaal-tutorials .
receiving incremental file list
pydaal-tutorials/
pydaal-tutorials/.gitignore
pydaal-tutorials/HPC DevCon - Get Your Hands Dirty - v2.pptx
pydaal-tutorials/LICENSE
pydaal-tutorials/LR_example.ipynb
pydaal-tutorials/NumericTables_example.ipynb
pydaal-tutorials/README.md
...
...
pydaal-tutorials/mldata/optdigits.tes
pydaal-tutorials/mldata/optdigits.tra
pydaal-tutorials/mldata/wine.data

sent 2,012 bytes  received 12,728,382 bytes  2,828,976.44 bytes/sec
total size is 14,132,266  speedup is 1.11

At the end of the output you can see a summary of how much data was transferred and the average speed.

CAVEAT: the behaviour of rsync changes when a / is appended at the end of the source location. The man page explains this as follows:

       A  trailing slash on the source changes this behavior to avoid creating
       an additional directory level at the destination.  You can think  of  a
       trailing / on a source as meaning "copy the contents of this directory"
       as opposed to "copy the directory by name", but in both cases the attributes
       of the containing directory are transferred to the containing
       directory on the destination.  In other words, each  of  the  following
       commands  copies  the files in the same way, including their setting of
       the attributes of /dest/foo:

              rsync -av /src/foo /dest
              rsync -av /src/foo/ /dest/foo
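
The effect of the trailing slash can be checked locally before starting a large transfer. The sketch below copies the same source directory twice, once without and once with a trailing slash, and lists the resulting files. The directory and file names are made up for illustration.

```shell
work=$(mktemp -d)                       # scratch directory, removed at the end
mkdir -p "$work/src/foo" "$work/dest_a" "$work/dest_b"
echo "data" > "$work/src/foo/file.txt"

# No trailing slash: the directory foo itself is created inside dest_a.
rsync -a "$work/src/foo" "$work/dest_a"

# Trailing slash: only the contents of foo are copied into dest_b.
rsync -a "$work/src/foo/" "$work/dest_b"

without_slash=$(cd "$work/dest_a" && find . -type f)
with_slash=$(cd "$work/dest_b" && find . -type f)
echo "without trailing slash: $without_slash"
echo "with trailing slash:    $with_slash"

rm -rf "$work"
```

Without the trailing slash the file ends up at foo/file.txt inside the destination; with it, file.txt is placed directly in the destination.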