Difference between revisions of "Group Storage Allocation FAQ"

From RCSWiki

Latest revision as of 21:09, 15 October 2021

This page provides common questions and answers about the use of /work and /bulk storage on ARC.

General Information

Work and Bulk storage mostly work like any other directories that you have access to on ARC (e.g. your home directory). You can use the standard Linux file system commands within them, such as ls, cd, cp, mv, and rm. You can also refer to them directly by their full path from any node in the cluster. As long as all of the permissions are set correctly, you can treat these spaces the same as your home directory. Most of the complexity of using Work and Bulk storage on ARC comes from the handling of Linux permissions, which are mostly inconsequential in your home directory. For the examples in the rest of this document, we will assume that your group allocation is named "somepi_lab".

Frequently Asked Questions

I can't access my advisor's (or other colleague's) work or bulk directory. Why not?

To access any work or bulk directory on ARC, you must belong to the unix group associated with it. The owner (or their delegate) can request this for you by emailing support@hpc.ucalgary.ca and asking that you be added to the unix group (the request should include the group name). This can be done at the same time that your ARC account is requested. Once you have been added to the unix group for the group allocation, you may still not be able to access all subdirectories in it, as some groups allow members to keep some data private from other members of the group. The permissions mechanism for this is explained in Linux Permissions.
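A quick way to confirm whether your account already belongs to the group is sketched below. The helper name in_group is purely illustrative (it is not an ARC-provided command), somepi_lab is the placeholder group name used throughout this page, and the sketch assumes group names that contain no spaces.

```shell
# Succeeds (exit status 0) if the current session belongs to the named unix group.
# "in_group" is a hypothetical helper name, not an ARC-provided command.
in_group() {
    # id -Gn prints the group names of the current session on one line;
    # tr puts one name per line so grep -x can match a whole name exactly.
    id -Gn | tr ' ' '\n' | grep -qx -- "$1"
}

# Example use with the placeholder group name from this page:
#   in_group somepi_lab && echo "member" || echo "not a member"
```

If you were added to the group only recently, you may need to log out and back in before the new membership shows up in your session.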

How do I access my work or bulk directory?

Accessing your work or bulk directory is much like accessing your home directory. First, you need to connect to ARC. From there, you will need to navigate to your allocation. If it is a work allocation, this would look something like:

[username@arc ~]$ cd /work/somepi_lab
[username@arc somepi_lab]$ ls -lh
total 4.7G
-rw-r--r-- 1 username somepi_lab 2.4G Feb  3 10:13 A.csv
-rw-r--r-- 1 otheruser somepi_lab 2.4G Feb  3 10:16 B.csv
[username@arc somepi_lab]$ cp B.csv ~/myData

Here we have changed our current working directory to the Work directory, examined its contents, and copied a file (created by another user in the group) back to our home directory. This is only possible if we belong to the group somepi_lab. We don't have to copy files back to a home directory to work on them. More likely, we will have a subdirectory as a personal workspace under /work/somepi_lab.

How do I reference a work or bulk directory from a job running on ARC?

A work or bulk directory can be referenced from a job script just like your home directory. The work and bulk directories are accessible from every compute node and nothing special needs to be done to write to them (beyond managing permissions). A job script could be something like:

#!/bin/bash
#SBATCH --partition=single 
#SBATCH --time=2:0:0 
#SBATCH --nodes=1 
#SBATCH --ntasks=1 
#SBATCH --cpus-per-task=2 
#SBATCH --mem=1000

export PATH=/work/somepi_lab/software/anaconda3/bin:$PATH
cd /work/somepi_lab/username/examples/1

python ./scripts/matmul_test.py

Here both absolute and relative path references to the work directory are used without issue from within a job.

How do I transfer data to a work or bulk directory directly from a personal workstation?

Data transfers to your group allocation can be done in the same manner as data transfers to your home directory. Please review the article on data transfers: How to transfer data. The only difference is that you need to explicitly give the full path to the work or bulk directory; you can't rely on a shortcut like ~ or on a relative path that is assumed to start from ~. For example, a transfer to a data directory in your home directory might be:

desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:"~/data"

Whereas in your work directory it would look like:

desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:"/work/somepi_lab/data"

How do I transfer data to a work or bulk directory from my home directory?

Files and directories can be copied and moved between home and work/bulk in the usual way (with cp -R and mv respectively). However, you must have appropriate read, write and execute permissions in the subdirectory of the group allocation.
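The permission requirement above can be checked before copying. The following is a minimal sketch, not an ARC-provided tool: the helper name copy_into and the paths in the usage comment are placeholders chosen for illustration.

```shell
# Copy a directory tree into a target directory, but only after confirming that
# the target exists and is writable and searchable by the current user.
# "copy_into" is a hypothetical helper name used for illustration.
copy_into() {
    copy_src=$1
    copy_dest=$2
    if [ ! -d "$copy_dest" ] || [ ! -w "$copy_dest" ] || [ ! -x "$copy_dest" ]; then
        echo "cannot write into $copy_dest: check its permissions" >&2
        return 1
    fi
    cp -R -- "$copy_src" "$copy_dest"/
}

# Example use (placeholder paths):
#   copy_into ~/myData /work/somepi_lab/username
```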

How do Linux permissions work for sharing data in a work or bulk directory?

Permissions for file sharing in group storage allocations are based on standard Linux permissions. A work or bulk directory has a unix group associated with it, and everyone who is supposed to have access belongs to that group. If /work/somepi_lab is the directory of your allocation, you can get this information as follows:

[username@arc ~]$ ls -l /work | grep somepi
drwxrws---. 16 root       somepi_lab          4096 Sep 27 12:47 somepi_lab

This tells us that somepi_lab is also the name of the unix group for the group allocation. Sharing is then accomplished by using this as the group for Linux permissions on files and directories in your work or bulk directory. A detailed introduction to ls -l output and making sense of Linux permissions can be found in the wiki article: Linux Permissions

My colleague has opened up a directory for me to access. Why can't I use ls to look inside it?

As described in Linux Permissions, most directory operations hinge on the execute permission. What is less obvious is that you need the execute permission on every directory between the file system root (i.e. /) and the directory that you want to look inside of. If one layer is missing the execute permission, you won't be able to inspect the directory of interest.
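To find which layer is blocking you, you can test each directory on the way down from the root. The sketch below is one way to do this in plain shell; the function name first_blocked_dir is hypothetical, and the path in the usage comment is a placeholder. Where it is installed, the util-linux command namei -l gives a similar per-component listing.

```shell
# Walk an absolute path from / downward and print the first directory that the
# current user cannot search (missing execute permission) or that does not exist.
# Returns 0 if every component is searchable, 1 if one is blocked, 2 on bad input.
# "first_blocked_dir" is a hypothetical helper name used for illustration.
first_blocked_dir() {
    case "$1" in
        /*) ;;
        *) echo "please give an absolute path" >&2; return 2 ;;
    esac
    fb_current=""
    fb_rest=${1#/}
    [ -x / ] || { echo "/"; return 1; }
    while [ -n "$fb_rest" ]; do
        fb_part=${fb_rest%%/*}              # next path component
        if [ "$fb_part" = "$fb_rest" ]; then
            fb_rest=""                      # that was the last component
        else
            fb_rest=${fb_rest#*/}           # drop the component just taken
        fi
        [ -z "$fb_part" ] && continue       # skip empty parts from "//" or a trailing "/"
        fb_current="$fb_current/$fb_part"
        if [ ! -x "$fb_current" ]; then
            echo "$fb_current"
            return 1
        fi
    done
    return 0
}

# Example use (placeholder path):
#   first_blocked_dir /work/somepi_lab/username2/shared_results
```

Walking downward matters because a directory below a blocked layer cannot be examined at all, so the first failure reported is the one to fix.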

I have access to two group storage allocations. How do I move data between them?

Different group storage allocations (e.g. /work/somepi1_lab and /work/somepi1_somepi2_jointproject) are implemented (behind the scenes) as different virtual file systems mounted on ARC. This means that the standard way that mv works will fail between group allocations. Instead, you will need to copy the data between directories (where you have write permissions) and then remove the data from the source if you don't want it to remain behind. This can be done with cp -R /work/somepi1_lab/source /work/somepi1_somepi2_jointproject/target, optionally followed by rm -R /work/somepi1_lab/source. The combined operation can also be done with rsync --remove-source-files, which deletes the transferred files from the source but leaves the emptied source directories in place.
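The copy-then-remove sequence can be wrapped so that the source is deleted only when the copy has succeeded. This is a minimal sketch under stated assumptions: the function name move_between_allocations is hypothetical, and refusing to overwrite an existing target is a cautious choice of this sketch, not an ARC requirement.

```shell
# Copy a directory tree to a new location and remove the source only if the
# copy succeeded. Refuses to run if the target already exists.
# "move_between_allocations" is a hypothetical helper name.
move_between_allocations() {
    mv_src=$1
    mv_target=$2
    if [ -e "$mv_target" ]; then
        echo "target $mv_target already exists; refusing to overwrite" >&2
        return 1
    fi
    # rm runs only when cp exits successfully.
    cp -R -- "$mv_src" "$mv_target" && rm -R -- "$mv_src"
}

# Example use with the paths from the paragraph above:
#   move_between_allocations /work/somepi1_lab/source /work/somepi1_somepi2_jointproject/target
```

The sketch deliberately uses plain cp -R rather than a mode that preserves ownership, so that the copied files pick up the group of the destination directory when its set-group-id bit is set.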

None of my colleagues can read files that I create in my work or bulk directory. What is going on?

If the set-group-id bit is not set on the directory, new files will not be created with the right unix group by default. For a file to be readable by your colleagues, you will need the group and group permissions to both be set correctly. For example,

$ ls -lh /work/somepi_lab | grep user2_private
drwxr-x--- 12 username2 username2  4096 Feb  9 1:13   user2_private

will not be readable by your colleagues because the group doesn't include them.


$ ls -lh /work/somepi_lab | grep user2_failshared
drwx------ 3  username2 somepi_lab 2.4G Feb  9 1:16   user2_failshared

will not be readable by your colleagues because the group permissions don't allow them to read it.


$ ls -lh /work/somepi_lab | grep user2_shared
drwxr-s--- 3  username2 somepi_lab 2.4G Feb  9 1:16   user2_shared
drwxr-x--- 3  username2 somepi_lab 2.4G Feb  9 1:16   user2_shared2

will both be readable by your colleagues.
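A directory in one of the failing states above can be repaired by setting the group, the group permissions, and the set-group-id bit together. The sketch below shows one way to do that; the function name share_with_group is hypothetical, and the path and group in the usage comment are the placeholders used on this page.

```shell
# Make a directory tree readable by a unix group and set the set-group-id bit
# on its directories so that new files created inside inherit the group.
# "share_with_group" is a hypothetical helper name.
share_with_group() {
    share_dir=$1
    share_group=$2
    # 1. Assign the group to everything in the tree.
    # 2. g+rX: group read everywhere, plus execute on directories
    #    (and on files that are already executable).
    # 3. g+s on directories only, so new entries inherit the group.
    chgrp -R "$share_group" "$share_dir" &&
    chmod -R g+rX "$share_dir" &&
    find "$share_dir" -type d -exec chmod g+s {} +
}

# Example use (placeholder path and group):
#   share_with_group /work/somepi_lab/user2_failshared somepi_lab
```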

How do I share data with another colleague on ARC without adding them to the unix group for my allocation?

This question has a number of possible answers. In principle, any data transfer strategy to or from a workstation can be used to get data to another user. However, we will focus here on a solution that doesn't ever go outside of ARC.

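One strategy is to have the current data owner temporarily open a single area of their group allocation to access by anyone. As with any other directory, the person receiving the data also needs the execute permission on every directory above the shared area. For example: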

$ chmod -R o+rx /work/somepi_lab/data/external_sharing 
... wait for the person that is being shared with to cp -R /work/somepi_lab/data/external_sharing ...
$ chmod -R o-rx /work/somepi_lab/data/external_sharing