Difference between revisions of "Group Storage Allocation FAQ"

From RCSWiki

Revision as of 22:17, 14 October 2021

This page provides common questions and answers about the use of /work and /bulk storage on ARC.

General Information

Work and Bulk storage mostly work like any other directories that you have access to on ARC (e.g. your home directory). You can use the standard Linux file system commands within them, such as ls, cd, cp, mv, and rm. You can also refer to them directly by their full path from any node in the cluster. As long as the permissions are set correctly, you can treat these spaces the same way as your home directory. Most of the complexity of using Work and Bulk storage on ARC comes from the handling of Linux permissions, which are mostly inconsequential in your home directory. For the examples in the rest of this document we will assume that your group allocation is named "somepi_lab".

Frequently Asked Questions

I can't access my advisor's (or other colleague's) work or bulk directory. Why not?

To access any work or bulk directory on ARC you must belong to the unix group associated with it. The owner of the allocation (or their delegate) can request this for you by emailing support@hpc.ucalgary.ca and asking that you be added to the unix group; the email should include the group name. This can be done at the same time that your ARC account is requested. Once you have been added to the unix group for the group allocation, you may still not be able to access every subdirectory in it, as some groups allow members to keep some data private from other members of the group. The permissions mechanism for this is explained in Linux Permissions.
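
To check whether your account already belongs to the unix group, you can list your groups from a login node. The following is a minimal sketch, assuming the group is named somepi_lab as in the rest of this page; the helper name in_group is hypothetical. Note that a newly granted group membership normally only shows up in login sessions started after the change.

```shell
# Hypothetical helper: succeeds if the current user belongs to the named unix group.
in_group() {
    # id -nG prints the names of all groups of the current user, separated by spaces.
    # grep -x matches whole lines only, so "lab" does not match "somepi_lab".
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# somepi_lab is the example group name used on this page.
if in_group somepi_lab; then
    echo "member of somepi_lab"
else
    echo "not a member of somepi_lab"
fi
```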

How do I access my work or bulk directory?

Accessing your work or bulk directory is much like accessing your home directory. First, you need to connect to ARC. From there, you will need to navigate to your allocation. If it is a work allocation, this would look something like:

[username@arc ~]$ cd /work/somepi_lab
[username@arc somepi_lab]$ ls -lh
total 4.7G
-rw-r--r-- 1 username somepi_lab 2.4G Feb  3 10:13 A.csv
-rw-r--r-- 1 otheruser somepi_lab 2.4G Feb  3 10:16 B.csv
[username@arc somepi_lab]$ cp B.csv ~/myData

Here we have changed our current working directory to the Work directory, examined the contents of the directory, and copied a file (created by another user in the group) back to our home directory. This is only possible if we belong to the group somepi_lab. We don't have to copy files back to a home directory to work on them. More likely, we will have a subdirectory as a personal workspace under /work/somepi_lab.

How do I reference a work or bulk directory from a job running on ARC?

A work or bulk directory can be referenced from a job script just like your home directory. The work and bulk directories are accessible from every compute node and nothing special needs to be done to write to them (beyond managing permissions). A job script could look something like:

#!/bin/bash
#SBATCH --partition=single 
#SBATCH --time=2:0:0 
#SBATCH --nodes=1 
#SBATCH --ntasks=1 
#SBATCH --cpus-per-task=2 
#SBATCH --mem=1000

export PATH=/work/somepi_lab/software/anaconda3/bin:$PATH
cd /work/somepi_lab/username/examples/1

python ./scripts/matmul_test.py

Here both absolute and relative path references to the work directory are used without issue from within a job.

How do I transfer data to a work or bulk directory directly from a personal workstation?

Data transfers to your group allocation can be done in the same manner as data transfers to your home directory. Please review the article on data transfers: How to transfer data. The only difference is that you need to give the full path to the work or bulk directory explicitly; you can't rely on the ~ shortcut, or on a relative path that is resolved starting from your home directory. For example, a transfer to a data directory in your home directory might be:

desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:"~/data"

Whereas in your work directory it would look like:

desktop$ rsync -axv my_data1 my_data2 my_data3 username@arc-dtn.ucalgary.ca:"/work/somepi_lab/data"

How do I transfer data to a work or bulk directory from my home directory?

Files and directories can be copied and moved between home and work/bulk in the usual way (with cp -R and mv respectively). However, you must have appropriate read, write and execute permissions on the subdirectory of the group allocation that you are copying to or from.
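
As an illustration, the permission check can be done before copying. This is a sketch rather than a required procedure: the helper name copy_into and the example paths are assumptions based on the naming used on this page.

```shell
# Hypothetical helper: copy a directory tree ($1) into a destination directory ($2),
# but only if the destination exists and we can both write to it and enter it.
copy_into() {
    if [ -d "$2" ] && [ -w "$2" ] && [ -x "$2" ]; then
        cp -R -- "$1" "$2"/
    else
        echo "cannot write to $2: check its owner, group, and mode with ls -ld" >&2
        return 1
    fi
}

# Example call (paths are assumptions):
# copy_into ~/myData /work/somepi_lab/username
```

If the helper reports a problem, ls -ld on the destination shows which owner, group, and mode are in effect.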

How do Linux permissions work for sharing data in a work or bulk directory?

When we examine a group allocation directory with ls -l, we will get a lot of details that we may not usually pay attention to for files in our home directory.

$ cd /work/somepi_lab
$ ls -lh
total 2.4G
-rw-rw-r-- 1 username somepi_lab 2.4G Feb  3 10:13 A.csv
drwxr-x--- 2 username somepi_lab 4096 Feb  3 10:16 username_files
drwxr-x--- 12 user2 user2 4096 Feb  9 1:13 user2
drwxr-S--- 3 user2 somepi_lab 2.4G Feb  9 1:16 user2_shared

Understanding this output and its implications for group sharing requires some careful discussion of ideas from Linux permissions. There are three parts to the permissions of every directory and regular file: owner, group, and mode. The first piece of information printed on each line is the mode, which looks like -rw-rw-r--. It is exactly 10 characters long, and every character matters for permissions. The second piece of information is the number of links, which we won't need for our discussion. The third piece of information is the username of the owner of the file. The fourth piece of information is the name of the group associated with the file. The fifth, sixth, and seventh pieces of information are the size, the date of last modification, and the file or directory name.
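
In the standard layout of the 10-character mode string, the first character is the file type (- for a regular file, d for a directory) and the remaining nine characters form three triplets of permissions for the owner, the group, and everyone else. A small sketch that splits a mode string this way; the helper name describe_mode is hypothetical:

```shell
# Hypothetical helper: split a 10-character mode string into its four parts.
describe_mode() {
    mode=$1
    if [ "${#mode}" -ne 10 ]; then
        echo "expected a 10-character mode string" >&2
        return 1
    fi
    # Character 1 is the file type; characters 2-4, 5-7 and 8-10 are the
    # permission triplets for the owner, the group, and everyone else.
    type_char=$(printf '%s' "$mode" | cut -c1)
    owner=$(printf '%s' "$mode" | cut -c2-4)
    group=$(printf '%s' "$mode" | cut -c5-7)
    other=$(printf '%s' "$mode" | cut -c8-10)
    echo "type=$type_char owner=$owner group=$group other=$other"
}

describe_mode "-rw-rw-r--"   # prints: type=- owner=rw- group=rw- other=r--
describe_mode "drwxr-x---"   # prints: type=d owner=rwx group=r-x other=---
```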

We will begin by explaining the owner and group and then we will use these to explain how to make sense of the mode string. The owner is the user who owns the file or directory and can change permissions on it.
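
To see the owner's ability to change permissions in practice, the commands below can be tried on a scratch file. This is a sketch that assumes GNU coreutils (the stat -c option); the scratch file stands in for a file in your allocation.

```shell
# Create a scratch file that we own, so the commands can be tried safely.
scratch=$(mktemp)

# stat -c (GNU coreutils) prints the owner (%U), group (%G) and mode string (%A).
stat -c '%U %G %A' "$scratch"

# As the owner, grant the group read access and remove all access for everyone else.
chmod u=rw,g=r,o= "$scratch"
stat -c '%A' "$scratch"    # prints -rw-r-----

rm -f "$scratch"
```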

My colleague has opened up a directory for me to access. Why can't I use ls to look inside it?

I have access to two group storage allocations. How do I move data between them?

None of my colleagues can read files that I create in my work or bulk directory. What is going on?

How do I share data with another colleague on ARC without adding them to the unix group for my allocation?