{{indexmenu_n>2}}
====== Transferring data ======

For most use cases, the data to be analyzed needs to be transferred to Hábrók, and results need to be transferred to other locations. On this page we describe two basic file transfer methods, followed by a description of a more advanced tool.

===== Using MobaXterm =====

The MobaXterm SSH client offers a file browser interface to the files on the Hábrók cluster, as shown in the following figure.

{{habrok:introduction:what_is_a_cluster:mobaxterm_file_browser.png?nolink|}}

This interface can be used to create directories and to download and upload files. It offers upload and download icons above the file listing. Files can also be dragged and dropped between MobaXterm and the Windows file explorer. Drag and drop also allows a full directory, including its files, to be transferred to or from Hábrók.

==== Alternative applications ====

Alternative file transfer clients are also available, such as:

  * [[https://filezilla-project.org/|FileZilla]]: Windows, OS X, Linux (see the page on [[habrok/data_management/clients/filezilla|FileZilla]] for more information on how to use it).
  * [[https://winscp.net|WinSCP]]: Windows

===== Using command line tools =====

Another option is to use command line tools for copying data from or to Hábrók. This option is available in local terminals on Mac OS X or Linux. MobaXterm also offers a terminal on the local system, as does the Windows Subsystem for Linux if you install it.

The two commands that are available are ''scp'' and ''sftp''. Both must be started on the system at the other end of the transfer (for example your desktop or laptop) rather than on Hábrók, unless that system can itself be accessed through SSH directly. This is normally not the case for a personal desktop or laptop, so the examples assume that you work in a terminal session on your local desktop or laptop. If the commands are run from Hábrók, source and destination have to be adjusted accordingly.
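Before starting a transfer it can be useful to confirm that these commands are present in the local terminal. The following is a minimal sketch; ''check_tool'' is a hypothetical helper name chosen for this illustration, and the script only reports what it finds without contacting Hábrók.

```shell
#!/bin/sh
# Report whether a command is available in the current terminal.
# check_tool is a hypothetical helper name, not part of any package.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

# Check the two transfer commands discussed on this page.
for tool in scp sftp; do
    check_tool "$tool"
done
```

If either command is reported as not found, an SSH client package probably still needs to be installed on the local system.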
==== scp ====

''scp'' is a command for copying files over the network to or from a remote location that is accessible through SSH. The basic syntax is:

  scp [options] [user@host1:]source [user@host2:]destination

Source and destination are mandatory. Either source or destination can be prefixed by a username and hostname to specify the remote side. Here is an example that transfers a single file from the local system to a directory within the user's home directory on Hábrók. Note that the ''user@host'' part is followed by a colon '':'':

  alice@skries:~/workdir$ scp vars.yml username@login2.hb.hpc.rug.nl:my_project
  vars.yml                          100% 3495   198.4KB/s   00:00
  alice@skries:~/workdir$

In the example above the file ''vars.yml'' from the directory ''workdir'' on the local desktop or laptop is sent to the user's directory ''my_project'' on Hábrók. Note that if you don't specify the username, your username on the local system is used. If the directory ''my_project'' had not existed, the file would have been stored under the name ''my_project'' in the user's home directory on Hábrók.

In order to copy the file back from Hábrók to the local system, source and destination have to be interchanged and adjusted where necessary. E.g.:

  alice@skries:~/workdir$ scp username@login2.hb.hpc.rug.nl:my_project/vars.yml .
  vars.yml                          100% 3495   191.7KB/s   00:00
  alice@skries:~/workdir$

In this case we had to spell out the full remote path, including the file name, because the source has to be specified exactly. The destination is now ''.'', which is the current directory on the local system.

In many cases we want to copy a whole directory from or to Hábrók. This can be done with the option ''-r'', which, as for ''cp'', stands for recursive.

  alice@skries:~/workdir$ scp -r username@login2.hb.hpc.rug.nl:my_project .
  main.yml                          100%  517    26.7KB/s   00:00
  vars.yml                          100% 3495   219.4KB/s   00:00
  alice@skries:~/workdir$

After this the directory ''my_project'' will have been copied to ''workdir'' on the local system, as ''workdir'' was the current directory. The destination of a recursive ''scp'' has to be a directory. If the directory does not exist it will be created, and the contents of the source directory will be copied into this new directory.

==== More advanced: rsync ====

In many cases more than a few small files need to be transferred. Unfortunately there is always a chance of failing connections or other issues. With ''scp'' this means either removing the destination files and starting anew, or an extensive manual check of the status of the transferred files. A file transfer tool that can help solve this issue is ''rsync''. ''rsync'' is available in the Linux and OS X terminal, and also inside the MobaXterm local terminal window.

''rsync'' offers the following advantages:

  * It makes sure that all files from the source are copied to the destination.
  * It allows restarting operations that were interrupted.
  * It can resume transferring partially copied files. This is useful for large, incompletely transferred files.
  * It can be used to update files in the destination that were modified in the source, without retransferring unchanged files.

The basic syntax of ''rsync'' is:

  rsync [options] source destination

The most commonly used flags are:

  * ''-a'' Archive mode. A combination flag, used to copy directories and files while preserving symbolic links, permissions, etc.
  * ''-z'' Compression. This compresses the data to be copied, sends it over the network and uncompresses it at the other side. This is mainly useful when transferring large amounts of textual data.
  * ''-v'' Verbose. This shows information about the files that are being copied.
  * ''-P'' Shows a progress indicator and keeps partially transferred files (short for ''--partial --progress''). Note that lowercase ''-p'' is a different option, which preserves permissions.
  * ''--delete'' Delete files from the receiving end that are missing at the sending side.
  * ''--append'' Append data onto shorter files on the receiving end. The ''--append-verify'' option will check the contents of the file first. These options are useful for resuming the transfer of large files after a connection failure.

Source and destination can be given in the same way as for ''scp''. Here is an example of transferring a directory from Hábrók to our local system:

  john@skries:~/testdir$ rsync -avz username@login2.hb.hpc.rug.nl:pydaal/pydaal-tutorials .
  receiving incremental file list
  pydaal-tutorials/
  pydaal-tutorials/.gitignore
  pydaal-tutorials/HPC DevCon - Get Your Hands Dirty - v2.pptx
  pydaal-tutorials/LICENSE
  pydaal-tutorials/LR_example.ipynb
  pydaal-tutorials/NumericTables_example.ipynb
  pydaal-tutorials/README.md
  ...
  ...
  pydaal-tutorials/mldata/optdigits.tes
  pydaal-tutorials/mldata/optdigits.tra
  pydaal-tutorials/mldata/wine.data
  
  sent 2,012 bytes  received 12,728,382 bytes  2,828,976.44 bytes/sec
  total size is 14,132,266  speedup is 1.11

At the end of the output you can see a summary of how much data was transferred and the average speed.

**CAVEAT: the behaviour of rsync changes with a final ''/'' at the source location.**

The behaviour of ''rsync'' changes when a ''/'' is appended at the end of the source location. The man page explains this as follows:

  A trailing slash on the source changes this behavior to avoid creating an
  additional directory level at the destination. You can think of a trailing /
  on a source as meaning "copy the contents of this directory" as opposed to
  "copy the directory by name", but in both cases the attributes of the
  containing directory are transferred to the containing directory on the
  destination. In other words, each of the following commands copies the files
  in the same way, including their setting of the attributes of /dest/foo:
  
      rsync -av /src/foo /dest
      rsync -av /src/foo/ /dest/foo
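
The trailing-slash rule can be tried out safely on local directories before running a real transfer. The following is a minimal sketch, assuming ''rsync'' is installed locally; the scratch directory and the names ''foo'', ''dest_a'', ''dest_b'' and ''data.txt'' are made up for this illustration.

```shell
#!/bin/sh
# Try the trailing-slash rule on throwaway local directories.
# All directory and file names here are hypothetical.
scratch=$(mktemp -d)
mkdir -p "$scratch/src/foo" "$scratch/dest_a" "$scratch/dest_b"
echo "hello" > "$scratch/src/foo/data.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory foo itself is created inside dest_a.
    rsync -a "$scratch/src/foo" "$scratch/dest_a"
    # Trailing slash: only the contents of foo are copied into dest_b.
    rsync -a "$scratch/src/foo/" "$scratch/dest_b"
    ls "$scratch/dest_a"   # lists: foo
    ls "$scratch/dest_b"   # lists: data.txt
else
    echo "rsync is not installed on this system" >&2
fi
```

The same check can be done against the real destination without copying anything by adding the ''-n'' (''--dry-run'') option together with ''-v'', which lists what would be transferred.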