Transferring Data from CHGI

Revision as of 21:28, 28 October 2020

ARC-DTN and CHGI NFS mounts

Because of the large datasets currently stored at the Centre for Health Genomics and Informatics (CHGI), we have set up a dedicated 10 Gbit fibre connection between the ARC DTN (Data Transfer Node) and CHGI to help users transfer files quickly as part of the CHGI Transition.

Users who need to migrate their data from CHGI to ARC can do so through the read-only NFS mounts that have been set up on the ARC DTN. The NFS mounts automatically use the dedicated 10 Gbit fibre connection to maximize your transfer speed. Access to the NFS filesystems is restricted to authorized CHGI group members. Please contact support@hpc.ucalgary.ca if you have difficulty accessing your files.

Most filesystems at CHGI are available under the /external mount point on ARC DTN. Use the following tables to determine your filesystem path on ARC DTN.

GPFS

ARC DTN path                    CHGI path
/external/chgihome              /gpfs/home
/external/gpfs/achri_data       /gpfs/achri_data
/external/gpfs/achri_galaxy     /gpfs/achri_galaxy
/external/gpfs/cbousman         /gpfs/cbousman
/external/gpfs/charb_data       /gpfs/charb_data
/external/gpfs/common           /gpfs/common
/external/gpfs/ebg_data         /gpfs/ebg_data
/external/gpfs/ebg_gmb          /gpfs/ebg_gmb
/external/gpfs/ebg_projects     /gpfs/ebg_projects
/external/gpfs/ebg_web          /gpfs/ebg_web
/external/gpfs/ebg_work         /gpfs/ebg_work
/external/gpfs/gallo            /gpfs/gallo
/external/gpfs/qlong            /gpfs/qlong
/external/gpfs/snyder_irida     /gpfs/snyder_irida
/external/gpfs/snyder_work      /gpfs/snyder_work
/external/gpfs/vetmed_data      /gpfs/vetmed_data
/external/gpfs/vetmed_stage     /gpfs/vetmed_stage

Tiered

ARC DTN path                    CHGI path
/external/tiered/achri_data     /tiered/achri_data
/external/tiered/chgi_data      /tiered/chgi_data
/external/tiered/ebg_mic        /tiered/ebg_mic
/external/tiered/ewang_scratch  /tiered/ewang_scratch
/external/tiered/ewang          /tiered/ewang
/external/tiered/jwasmuth       /tiered/jwasmuth
/external/tiered/kkurek         /tiered/kkurek
/external/tiered/morph          /tiered/morph
/external/tiered/mtgraovac      /tiered/mtgraovac
/external/tiered/parnold        /tiered/parnold
/external/tiered/robbins        /tiered/robbins
/external/tiered/smorrissy      /tiered/smorrissy
/external/tiered/snyder_data    /tiered/snyder_data
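
The mapping in the tables above follows a simple prefix rule, which can be sketched as a small shell function. This is only an illustration: chgi_to_dtn is a hypothetical helper name, and the function applies the prefix rule without checking that the filesystem is actually one of those listed.

```shell
#!/bin/sh
# chgi_to_dtn CHGI_PATH
# Hypothetical helper: print the ARC DTN path for a CHGI path, following
# the prefix rule in the GPFS and Tiered tables. It does not verify that
# the filesystem is exported on ARC DTN.
chgi_to_dtn() {
    case "$1" in
        /gpfs/home|/gpfs/home/*)
            # CHGI home directories are mounted at /external/chgihome
            printf '/external/chgihome%s\n' "${1#/gpfs/home}"
            ;;
        /gpfs/*|/tiered/*)
            # All other listed filesystems keep their path under /external
            printf '/external%s\n' "$1"
            ;;
        *)
            echo "no known mapping for: $1" >&2
            return 1
            ;;
    esac
}

# Example use:
#   chgi_to_dtn /gpfs/ebg_data/run1
```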

When transferring your files from CHGI to ARC, please remember that:

  • Your ARC home directory has a 500 GB quota. Your CHGI home directory may hold significantly more data than this. Please transfer large datasets to your group's work directory instead.
  • Access over NFS is governed by the CHGI group permissions. Your files at CHGI must be group-readable in order to be accessible via the NFS mount.
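
Both points above can be checked before starting a transfer. The following is a minimal sketch with hypothetical helper names (not_group_readable, size_kb). It assumes you run the permission check somewhere you can already traverse the whole tree, for example on the CHGI side; that access is an assumption, not something this page confirms.

```shell
#!/bin/sh
# not_group_readable DIR
# Print every file or directory under DIR whose mode lacks the group-read
# bit. Empty output means nothing is blocked for that reason.
not_group_readable() {
    find "$1" ! -perm -g+r -print
}

# size_kb DIR
# Print the total size of DIR in kilobytes, for comparison with the
# 500 GB home quota (500 GB is roughly 500000000 KB).
size_kb() {
    du -sk "$1" | cut -f1
}

# Example use:
#   not_group_readable /gpfs/directoryA
#   size_kb /gpfs/directoryA
# A possible fix for files you own (X adds execute only to directories
# and to files that are already executable):
#   chmod -R g+rX /gpfs/directoryA
```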

Starting a File Transfer

For large data transfers, we recommend using rsync within a screen or tmux session. This ensures that your transfers continue to run even after you disconnect from ARC DTN.

Transferring files with rsync within screen

rsync is a file transfer utility that copies files from one location to another. Since processes spawned by your session typically terminate when you log out, you will need a terminal multiplexer (such as screen or tmux) to keep your processes running after you log out. This section shows how to run rsync within a screen session so that your file transfers remain uninterrupted when you log out.

Before initiating a file transfer, ensure that you have the proper read permissions to your CHGI files:

  1. Log in to the ARC DTN via SSH at arc-dtn.ucalgary.ca.
  2. Ensure that you have read permissions to the files you are interested in. Verify this by navigating to the filesystem based on the mapping shown above and trying to list or read your files.
    • If you do not have read permission or have any difficulties, please contact us at support@hpc.ucalgary.ca.
    • If you have read permissions, continue with instructions below.
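
The read check in step 2 can be done with a short shell function. This is a sketch only: check_read is a hypothetical helper name, and /external/gpfs/directoryA in the usage comment is the placeholder path used elsewhere on this page.

```shell
#!/bin/sh
# check_read PATH
# Succeed if the current user can read PATH. For a directory, search
# (execute) permission is also required so that it can be listed.
check_read() {
    if [ -d "$1" ]; then
        [ -r "$1" ] && [ -x "$1" ] && ls "$1" >/dev/null 2>&1
    else
        # Reading a single byte confirms the file can actually be opened
        [ -r "$1" ] && head -c 1 "$1" >/dev/null 2>&1
    fi
}

# Example use:
#   if check_read /external/gpfs/directoryA; then
#       echo "readable, continue with the transfer"
#   else
#       echo "not readable, contact support@hpc.ucalgary.ca"
#   fi
```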

Log in to ARC DTN via SSH. If you are unable to connect, ensure that you are connected to the University of Calgary General VPN. Once connected, start a new screen session:

# Connect to arc-dtn.ucalgary.ca.
$ ssh user@arc-dtn.ucalgary.ca

# Start a new screen session named 'transfer'
$ screen -S transfer

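Once within the screen session, you may initiate a file transfer with rsync. If your data at CHGI was originally located at /gpfs/directoryA and you wish to migrate the data to your new work directory at /work/my-group, run the following rsync command. The -a option preserves permissions and timestamps, -x keeps rsync from crossing filesystem boundaries, and -v prints each file as it is transferred. Because the source path has no trailing slash, the data ends up in /work/my-group/directoryA.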

$ rsync -axv /external/gpfs/directoryA  /work/my-group/
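
It can be useful to preview the transfer first and to re-check it afterwards. The sketch below wraps the command above in three hypothetical helper functions; the names are illustrative only. It relies on the standard rsync options -n (--dry-run) and -i (--itemize-changes).

```shell
#!/bin/sh
# preview_transfer SRC DEST
# List what rsync would copy without copying anything (-n is --dry-run).
preview_transfer() {
    rsync -axvn "$1" "$2"
}

# run_transfer SRC DEST
# Perform the copy with the same flags as the command above.
run_transfer() {
    rsync -axv "$1" "$2"
}

# pending_changes SRC DEST
# Dry run that prints one itemized line per item rsync would still update.
# Empty output means rsync finds nothing left to copy.
pending_changes() {
    rsync -axni "$1" "$2"
}

# Example use:
#   preview_transfer /external/gpfs/directoryA /work/my-group/
#   run_transfer     /external/gpfs/directoryA /work/my-group/
#   pending_changes  /external/gpfs/directoryA /work/my-group/
```

If a transfer is interrupted, running the same rsync command again skips files that already match in size and modification time.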

While the transfer is running, you may detach from the screen session by pressing Ctrl-a followed by d. You may later reattach to this screen session by running:

# List all available screen sessions
$ screen -ls

# Reconnect to the screen session named 'transfer'
$ screen -r transfer

Once your file transfer is complete, you may end the screen session by running exit inside it.
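
This page mentions tmux as an alternative to screen. A rough equivalent of the workflow above is sketched here, assuming tmux is installed on ARC DTN, which this page does not confirm. The function names are hypothetical wrappers around standard tmux subcommands.

```shell
#!/bin/sh
# start_session NAME COMMAND
# Start a detached tmux session called NAME that runs COMMAND.
start_session() {
    tmux new-session -d -s "$1" "$2"
}

# list_sessions
# List all tmux sessions (the counterpart of 'screen -ls').
list_sessions() {
    tmux list-sessions
}

# attach_session NAME
# Reattach to the session (the counterpart of 'screen -r NAME').
# Detach again with Ctrl-b followed by d.
attach_session() {
    tmux attach-session -t "$1"
}

# Example use:
#   start_session transfer "rsync -axv /external/gpfs/directoryA /work/my-group/"
#   list_sessions
#   attach_session transfer
```

A session started this way ends on its own when the command it runs finishes.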