Group Storage Allocation FAQ

From RCSWiki

Revision as of 04:07, 15 October 2021

This page provides common questions and answers about the use of /work and /bulk storage on ARC.

General Information

Work and Bulk storage mostly work like any other directories that you have access to on ARC (e.g. your home directory). You can use the standard Linux file system commands within them: ls, cd, cp, mv, rm. You can also refer to them directly by their full path from any node in the cluster. As long as the permissions are set correctly, you can treat these spaces the same as your home directory. Most of the complexity of using Work and Bulk storage on ARC comes from the handling of Linux permissions, which are mostly inconsequential in your home directory. For the examples in the rest of this document we will assume that your group allocation is named "somepi_lab".

Frequently Asked Questions

I can't access my advisor's (or other colleague's) work or bulk directory. Why not?

To access any work or bulk directory on ARC you must belong to the unix group associated with it. The owner of the allocation (or their delegate) can request this for you by emailing support@hpc.ucalgary.ca and asking that you be added to the unix group (the request should include the group name). This can be done at the same time that your ARC account is requested. Once you have been added to the unix group for the group allocation, you may still not be able to access all subdirectories in it, as some groups allow members to keep some data private from other members of the group. The permissions mechanism for this is explained in Linux Permissions.
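Before emailing support, it can help to check which unix groups your account already belongs to. The following is a minimal sketch, assuming the allocation group is named somepi_lab as in the rest of this page; the helper name in_group is hypothetical and not an ARC-provided command.

```shell
# Hypothetical helper: succeeds if the current user belongs to the named unix group.
in_group() {
    # id -nG prints the names of all of the current user's groups on one line.
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group somepi_lab; then
    echo "already a member of somepi_lab"
else
    echo "not a member of somepi_lab yet"
fi
```

Group membership is read when you log in, so after being added to a group you will need to start a new login session before the change shows up.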

How do I access my work or bulk directory?

Accessing your work or bulk directory is much like accessing home directories. First, you need to connect to ARC. From there, you will need to navigate to your allocation. If it is a work allocation, this would look something like

[username@arc ~]$ cd /work/somepi_lab
[username@arc somepi_lab]$ ls -lh
total 4.7G
-rw-r--r-- 1 username somepi_lab 2.4G Feb  3 10:13 A.csv
-rw-r--r-- 1 otheruser somepi_lab 2.4G Feb  3 10:16 B.csv
[username@arc somepi_lab]$ cp B.csv ~/myData

Here we have changed our current working directory to the Work directory, examined the contents of the directory, and copied a file (created by another user in the group) back to our home directory. This is only possible if we belong to the group somepi_lab. We don't have to copy files back to a home directory to work on them. More likely, we will have a subdirectory as a personal workspace under /work/somepi_lab.

How do I reference a work or bulk directory from a job running on ARC?

A work or bulk directory can be referenced from a job script just like your home directory. The work and bulk directories are accessible from every compute node and nothing special needs to be done to write to them (beyond managing permissions). A job script could be something like

#!/bin/bash
#SBATCH --partition=single 
#SBATCH --time=2:0:0 
#SBATCH --nodes=1 
#SBATCH --ntasks=1 
#SBATCH --cpus-per-task=2 
#SBATCH --mem=1000

export PATH=/work/somepi_lab/software/anaconda3/bin:$PATH
cd /work/somepi_lab/username/examples/1

python ./scripts/matmul_test.py

Here both absolute and relative path references to the work directory are used without issue from within a job.
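Because permissions are the one thing that does need managing, a job script can check up front that its working directory is usable rather than failing partway through. This is only a sketch: the helper name can_write_dir is hypothetical, and the path is the assumed example path from the job script above.

```shell
# Hypothetical helper: succeeds only if the path is a directory we can enter and write to.
can_write_dir() {
    [ -d "$1" ] && [ -x "$1" ] && [ -w "$1" ]
}

# Assumed path, matching the example job script; adjust to your own workspace.
workdir=/work/somepi_lab/username/examples/1
if can_write_dir "$workdir"; then
    echo "ok: $workdir is writable"
else
    echo "cannot write to $workdir; check the directory permissions" >&2
fi
```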

How do I transfer data to a work or bulk directory directly from a personal workstation?

Data transfers to your group allocation can be done in the same manner as data transfers to your home directory. Please review the article on data transfers: How to transfer data. The only difference is that you need to give the full path to the work or bulk directory explicitly; you can't rely on the ~ shortcut or on a relative path, which is resolved starting from your home directory. For example, a transfer to a data directory in your home directory might be:

desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:"~/data"

Whereas in your work directory it would look like:

desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:"/work/somepi_lab/data"

How do I transfer data to a work or bulk directory from my home directory?

Files and directories can be copied and moved between home and work/bulk in the usual way (with cp -R and mv respectively). However, you must have appropriate read, write and execute permissions in the subdirectory of the group allocation.

How do Linux permissions work for sharing data in a work or bulk directory?

When we examine a group allocation directory with ls -l, we will get a lot of details that we may not usually pay attention to for files in our home directory.

$ cd /work/somepi_lab
$ ls -lh
total 2.4G
-rw-rw-r-- 1  username1 somepi_lab 2.4G Feb  3 10:13  A.csv
drwxr-x--- 2  username1 somepi_lab 4096 Feb  3 10:16  username1_files
drwxr-x--- 12 username2 username2  4096 Feb  9 1:13   user2
drwxr-S--- 3  username2 somepi_lab 2.4G Feb  9 1:16   user2_shared

Understanding this output and the implications for group sharing requires some careful discussion of ideas from Linux permissions. There are three parts to permissions for every directory and regular file: user, group, and mode. The first piece of information printed on each line is the mode and looks like -rw-rw-r--. It is exactly 10 characters long and every character matters for permissions. The second piece of information is the number of links and we won't need it for our discussion. The third piece of information is called the user (or owner) of the file. The fourth piece of information is called the group of the file. The fifth, sixth, and seventh pieces of information are the size, the last modification date, and the file or directory name.

We will begin by explaining the user and group and then we will use these to explain how to make sense of the mode string. The user of the file or directory is the username of a user who owns the file or directory. This user can change permissions on the file or directory (and usually has the most privileges for reading and writing). The group of the file or directory is a name for a collection of users that (often) have special privileges for accessing the file or directory (but typically less than the owner). If a particular user is trying to access the file, the system will ask what their relationship is to the file. If they are the owner then they have user permissions. If they are not the owner but belong to the file's group then they have group permissions. If they are not the owner, and do not belong to the file's group, then their relationship to the file is other.
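The decision described above can be sketched as a small shell function. This is an illustration rather than what the kernel literally runs: the helper name relationship is hypothetical, and it assumes GNU stat (as found on Linux systems) and compares numeric user and group IDs.

```shell
# Hypothetical helper: prints user, group, or other for the current user and a given path.
# Assumes GNU stat, where %u is the owner's numeric ID and %g is the group's numeric ID.
relationship() {
    owner_uid=$(stat -c '%u' -- "$1")
    group_gid=$(stat -c '%g' -- "$1")
    if [ "$owner_uid" = "$(id -u)" ]; then
        echo user
    elif id -G | tr ' ' '\n' | grep -qxF -- "$group_gid"; then
        echo group
    else
        echo other
    fi
}

# A file we have just created is owned by us, so this prints: user
scratch_file=$(mktemp)
relationship "$scratch_file"
rm -f "$scratch_file"
```

Note that the owner check comes first: if you own a file, the user permissions apply to you even when the group permissions are more generous.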

We are now able to talk about what the mode tells us. The mode can be read as consisting of 4 parts:

-rw-rw-r--
drwxr-x---
(file type)(user permissions)(group permissions)(other permissions)
(    -    )(       rw-      )(        rw-      )(      r--        )
(    d    )(       rwx      )(        r-x      )(      ---        )

The file type of a file is - if it is a regular file, d if it is a directory, and l if it is a symbolic link to another file or directory.

Each set of permissions consists of three characters for three types of permissions: (r)ead, (w)rite, and e(x)ecute. For regular files, read permission allows users to access data in the file, write permission allows users to change the contents of the file, and execute permission allows users to run it like a piece of software. For directories, the combination of read and execute permissions allows users to look in the directory, while the combination of write and execute allows adding or removing files and directories inside it. Broadly speaking, execute is required to do anything with directories.

If the permissions string contains a - where "r", "w", or "x" would be, then the respective users do not have that permission. If the character is the letter then they do have that permission. There are two more important options called set-user-id and set-group-id. In the user and group permissions respectively they are indicated by s or S in place of the execute character: s is used if execute is also set and S if it is not. These enable special behaviours and should not appear on most files. However, setting the set-group-id bit on a directory causes files and directories created under it to inherit the directory's group. Consequently, you are likely to see this in at least some parts of your group allocation directory.
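The effect of the set-group-id bit on a directory can be tried safely in a scratch directory. The sketch below assumes GNU stat and uses a temporary directory rather than a real allocation; the names shared, new_file, and new_dir are made up for the illustration.

```shell
# Sketch on a scratch directory; assumes GNU stat.
demo=$(mktemp -d)
mkdir "$demo/shared"
chmod g+s "$demo/shared"          # set the set-group-id bit on the directory
touch "$demo/shared/new_file"     # files created inside inherit the directory's group
mkdir "$demo/shared/new_dir"      # on Linux, new subdirectories also inherit the bit itself

# Show mode, group, and name for each; look for s or S in the group execute position.
stat -c '%A %G %n' "$demo/shared" "$demo/shared/new_file" "$demo/shared/new_dir"
```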

In our example above, the first record is a regular file: its owner can read and write it but not execute it, members of its group can likewise read and write but not execute, and everyone else can only read it. The second record is a directory: the owner can read, write, and execute, members of the group can read and execute, and no one else can do anything.
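The four-part reading of the mode can be reproduced mechanically. The helper below is a hypothetical illustration (split_mode is not a standard command) that cuts a 10-character mode string at the same positions used in the breakdown above.

```shell
# Hypothetical helper: splits a 10-character mode string into its four parts.
split_mode() {
    printf 'type=%s user=%s group=%s other=%s\n' \
        "$(printf '%s' "$1" | cut -c1)" \
        "$(printf '%s' "$1" | cut -c2-4)" \
        "$(printf '%s' "$1" | cut -c5-7)" \
        "$(printf '%s' "$1" | cut -c8-10)"
}

split_mode '-rw-rw-r--'   # prints: type=- user=rw- group=rw- other=r--
split_mode 'drwxr-x---'   # prints: type=d user=rwx group=r-x other=---
```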

Returning to our original example,

-rw-rw-r-- 1 username1 somepi_lab 2.4G Feb  3 10:13 A.csv

The regular file, A.csv, is owned by username1 and has the group somepi_lab. As the owner, username1 can change its permissions and can also read and write the file. Likewise, any member of somepi_lab can read and write the file. Finally, anyone on ARC at all can read the file if they know its name and location, provided they are also allowed to traverse the directories above it.

drwxr-x--- 2  username1 somepi_lab 4096 Feb  3 10:16  username1_files

The directory, username1_files, is owned by username1 and has the group somepi_lab. As the owner, username1 can change its permissions, inspect its contents, and create and delete files in it. Any member of somepi_lab can inspect the contents of the directory but not create or delete files or directories in it. Note that the set-group-id bit is not set, so files created in the directory will not automatically inherit its group. No one else can look in the directory or create or delete files or directories in it.

drwxr-x--- 12 username2 username2  4096 Feb  9 1:13   user2

The directory, user2, is owned by username2 and has the group username2 (a group exclusive to username2). As the owner, username2 can change its permissions, inspect its contents, and create and delete files in it. Any member of the group username2 can inspect the contents of the directory but not create or delete files or directories in it. No member of somepi_lab would be part of this group, so the group permissions are ultimately immaterial. No one else can look in the directory or create or delete files or directories in it. In effect, this is a private directory for the user username2.

drwxr-S--- 3  username2 somepi_lab 2.4G Feb  9 1:16   user2_shared

The directory, user2_shared, is owned by username2 and has the group somepi_lab. As the owner, username2 can change its permissions, inspect its contents, and create and delete files in it. Members of somepi_lab have read permission but not execute permission (the S in the group execute position means that set-group-id is set while execute is not). They can therefore list the names of the immediate contents of the directory, but they cannot enter it, open files in it, create or delete anything in it, or inspect the contents of directories beneath it. Because the set-group-id bit is set, files created in the directory will inherit the group somepi_lab. No one else can look in the directory or create or delete files or directories in it.

Finally, to modify this directory so that it is completely open to the group, username2 could run a change mode command (chmod):

[username2@arc somepi_lab]$ chmod g+wx user2_shared
[username2@arc somepi_lab]$ ls -l
...
drwxrws--- 3  username2 somepi_lab 2.4G Feb  9 1:16   user2_shared

Here the command adds write and execute permissions to the group permissions. Adding the -R option (chmod -R) changes the mode recursively on the directory and on every file and directory anywhere under it.
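The same change can be rehearsed on a scratch directory before touching shared data. This sketch assumes GNU chmod and stat; the numeric mode 2740 is used only to recreate the drwxr-S--- starting point from the listing.

```shell
# Rehearse the change on a scratch copy; assumes GNU chmod and stat.
demo=$(mktemp -d)
mkdir "$demo/user2_shared"
chmod 2740 "$demo/user2_shared"      # recreate the starting mode from the listing
stat -c '%A' "$demo/user2_shared"    # prints: drwxr-S---
chmod g+wx "$demo/user2_shared"      # add group write and execute
stat -c '%A' "$demo/user2_shared"    # prints: drwxrws---
```

The S becomes s because the group now has execute permission in addition to the set-group-id bit.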

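To change the group of a file or directory, the owner can use the chown command. An ordinary user can only change the group to one they belong to; changing the owner itself generally requires administrator privileges. In the example below the owner stays the same and only the group changes: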

[username2@arc somepi_lab]$ chown -R username2:somepi_lab user2
[username2@arc somepi_lab]$ ls -l
...
drwxr-x--- 12 username2 somepi_lab  4096 Feb  9 1:13   user2
...
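When only the group needs to change, chgrp is an equivalent alternative to chown. The sketch below runs on a scratch directory and, as an assumption for the sake of a runnable example, uses the current user's primary group as a stand-in for somepi_lab.

```shell
# Sketch on a scratch directory; assumes GNU stat.
demo=$(mktemp -d)
mkdir -p "$demo/user2/results"
target_group=$(id -g)                    # stand-in for somepi_lab: a group we belong to
chgrp -R "$target_group" "$demo/user2"   # -R applies the change to everything underneath
stat -c '%g %n' "$demo/user2" "$demo/user2/results"
```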

My colleague has opened up a directory for me to access. Why can't I use ls to look inside it?

I have access to two group storage allocations. How do I move data between them?

None of my colleagues can read files that I create in my work or bulk directory. What is going on?

How do I share data with another colleague on ARC without adding them to the unix group for my allocation?