Differences
This shows you the differences between two versions of the page.
habrok:data_management:sharing_data [2024/05/23 09:29] – fokke | habrok:data_management:sharing_data [2025/03/07 16:10] (current) – Add Fokke's expanded shared dir docs pedro | ||
---|---|---|---|
Line 2: | Line 2: | ||
====== Sharing data ====== | ====== Sharing data ====== | ||
- | We don't allow users to open up their private folders using file system permissions or access control lists. This because managing these correctly can be complicated and can easily lead to security problems, where users accidentally share data. | + | We don't allow users to open up their private folders, using file system permissions or access control lists. This because managing these correctly can be complicated, and can therefore |
- | If you want to share data on Hábrók with other users, there are two options. | + | |
+ | If you need to share data on Hábrók with other users, there are two options. | ||
+ | |||
+ | Next to this we also offer ''/ | ||
+ | |||
+ | Note that the second part of this page has sections on how to manage access privileges in order to fix issues with group access to data sets.
===== Group directory ===== | ===== Group directory ===== | ||
- | A group directory is useful if you want to share data with a group of users and the other cluster | + | A group directory is useful if you need to share data with a group of users, and the other users on the cluster must not have access to that data. In this case we can set up a group on the cluster for this limited set of users, and give the group access to one or more shared folders. |
- | Group directories are created on ''/ | + | |
+ | These group directories are created on ''/ | ||
For working with this data there are two models: | For working with this data there are two models: | ||
- | - The files in the shared folder are readable and writable for all group members. | + | - There is a single group with access to the data, and the files in the shared folder are readable and writable for all group members. |
- | - There is a data manager | + | - There are one or more data managers |
- | If you want to request a group directory, please contact | + | If you want to request a group directory, please contact |
+ | - The proposed name of the group (this name should not be in use already, and be convenient | ||
+ | - The amount | ||
+ | - Who the primary owner of the group is. This person has to approve the requests for joining | ||
+ | - A second person who can act as an alternative contact person for the group to approve these requests. | ||
+ | - Do all users need full write access or are there data managers? In case there are data managers two groups will be created, one with write access and another with read-only access. The group names will be suffixed with '' | ||
+ | - If there are data managers, who will fulfill that role? | ||
===== Public directory ===== | ===== Public directory ===== | ||
- | Sometimes you need to share non-sensitive, | + | Sometimes you need to share non-sensitive, |
+ | |||
+ | When you need to share data for a longer period, please let us know. We can then create a persistent directory in ''/ | ||
+ | |||
+ | |||
+ | ===== Software directory ===== | ||
+ | |||
+ | Since ''/ | ||
+ | |||
+ | Please contact [[hpc@rug.nl]] if you need additional space on ''/ | ||
+ | |||
+ | |||
+ | ===== File system permissions ===== | ||
+ | |||
+ | ==== General description ==== | ||
+ | |||
+ | In order to be able to fix issues with the file system permissions, | ||
+ | A more thorough explanation of managing Linux file permissions can be found at: https:// | ||
+ | |||
+ | In POSIX-based file systems, such as those used on Linux, files and directories have an owner and a group. Note that only the owner of a file can change its permissions and fix issues with them. Each file and directory has three sets of permissions: one for the owner, one for the group and one for everybody else. These can be listed using '' | ||
+ | |||
+ | < | ||
+ | -rw-rw-r--. 1 user1 hb-public-courses 168024976 Sep 5 2023 dataset.tar.gz | ||
+ | -rw-rw-r--. 1 user2 hb-public-courses | ||
+ | drwxrwsr-x. 2 user2 hb-public-courses 4096 Oct 17 10:27 inputfiles | ||
+ | -rw-rw-r--. 1 user1 hb-public-courses | ||
+ | </ | ||
+ | |||
+ | In the example some files are owned by '' | ||
+ | |||
+ | There are three permission groups shown like '' | ||
+ | |||
+ | The first set of '' | ||
+ | |||
+ | Note that all top-level directories for private and group directories on the cluster are set to be unreadable and unwritable by anybody except the group, which means that the files and directories inside can only be accessed by group members. This holds even though the files inside may have read, write or execute permissions for " | ||
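The layout of such a permission string can be illustrated with a small sketch. It takes the string shown for the ''inputfiles'' directory in the listing above (without the trailing dot) and splits it into the file type and the three permission sets. The variable names are only chosen for this illustration.

```shell
# Permission string as printed by ls -l for the inputfiles directory
perms='drwxrwsr-x'

# Character 1 is the file type, followed by three sets of three characters
ftype=$(printf '%s' "$perms" | cut -c1)     # d = directory, - = regular file
owner=$(printf '%s' "$perms" | cut -c2-4)   # permissions for the owner
group=$(printf '%s' "$perms" | cut -c5-7)   # permissions for the group
other=$(printf '%s' "$perms" | cut -c8-10)  # permissions for everybody else

echo "type=$ftype owner=$owner group=$group other=$other"
```

For a real file the same string can be obtained with ''stat -c %A filename'' on systems with GNU coreutils.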
+ | |||
+ | |||
+ | ==== The sgid bit for group directories ==== | ||
+ | |||
+ | In normal situations the group attached to a newly created file or directory will be the primary group of the person writing the file. On the cluster this will be the p- or s-number group. This means that other group members that have access to the directory may not be able to read, modify or delete the file. | ||
+ | |||
+ | To make sure files are owned by the shared group instead, we set the sgid bit on the group directories. This is listed as a lowercase '' | ||
+ | |||
+ | Please be aware that files can also have the '' | ||
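The effect of the sgid bit on a directory can be tried out safely in a scratch directory. The sketch below assumes a Linux system with GNU coreutils. The directory and file names are hypothetical, and a temporary directory stands in for a real group directory.

```shell
# Scratch directory standing in for a group directory (hypothetical names)
umask 0007
work=$(mktemp -d)
mkdir "$work/group_dir"

# Give the group full access and set the sgid bit on the directory
chmod g+rwxs "$work/group_dir"
dir_perms=$(stat -c %A "$work/group_dir")

# New files take over the group of the directory,
# new subdirectories also inherit the sgid bit itself
touch "$work/group_dir/newfile"
mkdir "$work/group_dir/subdir"
dir_group=$(stat -c %G "$work/group_dir")
file_group=$(stat -c %G "$work/group_dir/newfile")
sub_mode=$(stat -c %a "$work/group_dir/subdir")

echo "directory: $dir_perms, group: $dir_group"
echo "new file group: $file_group, new subdirectory mode: $sub_mode"

rm -rf "$work"
```

In a scratch directory the directory group is simply your own primary group, so the group of the new file does not visibly change. In a real group directory the new file would get the shared group instead.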
+ | |||
+ | |||
+ | ===== Preserving group directory permissions ===== | ||
+ | |||
+ | The group directories are set up in such a way that files will be readable and writable by the appropriate group(s). Through the sgid bit newly created files will get the right permissions. | ||
+ | |||
+ | Archiving and copying tools may, however, override these default permissions, | ||
+ | |||
+ | In order to deal with these issues, the person copying the data must take some precautions. Furthermore, | ||
+ | |||
+ | === cp === | ||
+ | |||
+ | For '' | ||
+ | |||
+ | Normally the only important attributes to keep when copying data are the original timestamps. This can be achieved using the option '' | ||
+ | |||
+ | Furthermore the permissions of the files in the destination should be those for the group directories, | ||
+ | |||
+ | So copying a directory of data, preserving the time stamps, could be done as follows: | ||
+ | < | ||
+ | cp -r --preserve=timestamps --no-preserve=mode $HOME/ | ||
+ | </ | ||
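The behaviour of these two options can be checked in a scratch directory before copying real data. This is a minimal sketch assuming GNU ''cp'', ''touch'' and ''stat''. All paths are hypothetical and do not reflect the actual layout on the cluster.

```shell
# Scratch source and destination directories (hypothetical names)
umask 0002
work=$(mktemp -d)
mkdir "$work/src" "$work/group_dir"
chmod 2770 "$work/group_dir"

# A private source file with an old timestamp
printf 'data\n' > "$work/src/input.txt"
chmod 600 "$work/src/input.txt"
touch -d '2023-09-05 12:00:00' "$work/src/input.txt"

# Keep the timestamps, but do not carry over the private mode of the source
cp -r --preserve=timestamps --no-preserve=mode "$work/src" "$work/group_dir/"

src_time=$(stat -c %Y "$work/src/input.txt")
dst_time=$(stat -c %Y "$work/group_dir/src/input.txt")
dst_mode=$(stat -c %a "$work/group_dir/src/input.txt")

echo "source mtime: $src_time, copy mtime: $dst_time, copy mode: $dst_mode"

rm -rf "$work"
```

The copy keeps the modification time of the source, while its mode is derived from the current ''umask'' rather than from the private mode of the original file.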
+ | |||
+ | |||
+ | === rsync === | ||
+ | |||
+ | The commonly used '' | ||
+ | |||
+ | Leaving out '' | ||
+ | |||
+ | Furthermore we can tell '' | ||
+ | |||
+ | So the full example for '' | ||
+ | < | ||
+ | rsync -rltv --chmod=ug=rwX | ||
+ | </ | ||
+ | |||
+ | |||
+ | === tar === | ||
+ | |||
+ | In our testing tar created files with the right group ownership. You may still need to fix the group read and write permissions after extraction. See the instructions below for details. | ||
+ | |||
+ | |||
+ | ==== Fixing file and directory permissions ==== | ||
+ | |||
+ | When permissions in a group directory are wrong, the person owning the files can fix these using the '' | ||
+ | < | ||
+ | chmod g+rwX file_or_directory | ||
+ | </ | ||
+ | The '' | ||
+ | |||
+ | If you want to change the permissions for a directory, including all files and subdirectories inside, you can add the '' | ||
+ | < | ||
+ | chmod -R g+rwX directory_name | ||
+ | </ | ||
+ | |||
+ | To prevent new files from being owned by the private group of the creator the sgid bit must be set on directories. This can be done using: | ||
+ | < | ||
+ | chmod g+s directory_name | ||
+ | </ | ||
+ | |||
+ | Since this sgid bit should not be used on files, we cannot use the '' | ||
+ | < | ||
+ | find . -type d -exec chmod g+s {} \; | ||
+ | </ | ||
+ | This will find all files of type '' | ||
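The combined effect of these commands can be verified on a scratch copy first. The sketch below assumes GNU coreutils and uses hypothetical file names. It creates a small tree with private permissions, applies the fixes described above, and prints the resulting modes.

```shell
# Build a small tree with private permissions (hypothetical names)
umask 0077
work=$(mktemp -d)
mkdir -p "$work/data/sub"
printf 'text\n' > "$work/data/sub/notes.txt"
printf '#!/bin/sh\necho hello\n' > "$work/data/run.sh"
chmod 700 "$work/data/run.sh"

# Give the group read and write access; the capital X adds execute only
# for directories and for files that are already executable
chmod -R g+rwX "$work/data"

# Set the sgid bit on directories only, not on files
find "$work/data" -type d -exec chmod g+s {} \;

dir_mode=$(stat -c %a "$work/data")
sub_mode=$(stat -c %a "$work/data/sub")
file_mode=$(stat -c %a "$work/data/sub/notes.txt")
script_mode=$(stat -c %a "$work/data/run.sh")

echo "data=$dir_mode sub=$sub_mode notes.txt=$file_mode run.sh=$script_mode"

rm -rf "$work"
```

The plain text file becomes group readable and writable without becoming executable, while the script and the directories also get the execute permission for the group.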
+ | |||
+ | Finally, giving other groups read and execute access can be achieved using: | ||
+ | < | ||
+ | chmod o+rX file_or_directory | ||
+ | </ | ||
+ | This is of course only required in case a " | ||
+ | |||
+ | |||
+ | ==== File system access control lists ==== | ||
+ | |||
+ | The permission system described above can only handle a single user and group. If multiple groups need access to data, file system access control lists (ACLs) must be used. These give an additional set of controls on the access rights of files and directories. | ||
+ | |||
+ | Setting the correct rights on the top level group directory, using an ACL for the read-only group, is sufficient to prevent the other cluster users from accessing the files and directories inside. Because the ACL system is quite complex, it is better to manage the rights for the other read-only group using the standard permissions for " | ||
+ | |||
+ | Since it is important to be able to check the rights on the group folder, the use of '' | ||
+ | |||
+ | === Retrieving the current access control list === | ||
+ | |||
+ | Files and directories that have an ACL applied will show an additional '' | ||
+ | < | ||
+ | drwxrws---+ 7 root hb-acl_testing_rw 20480 Feb 28 08:56 hb-acl_testing_rw/ | ||
+ | </ | ||
+ | |||
+ | The current set of ACLs can be obtained using the '' | ||
+ | < | ||
+ | $ getfacl hb-acl_testing_rw | ||
+ | # file: hb-acl_testing_rw/ | ||
+ | # owner: root | ||
+ | # group: hb-acl_testing_rw | ||
+ | # flags: -s- | ||
+ | user::rwx | ||
+ | group:: | ||
+ | group: | ||
+ | mask::rwx | ||
+ | other:: | ||
+ | </ | ||
+ | This example shows that the directory is owned by the main system user '' | ||
+ | |||
+ | The ACL list shows that another group '' | ||
+ | |||
+ | These settings mean that users from both the groups '' | ||
+ | |||
+ | All other users on the cluster will not be able to access the data inside the folder at all. | ||
- | Since we have allocated limited space to this directory a cleanup script | + | Inside the group folder the regular permission bits for " |